CCS339 Text and Speech Analysis Lab Manual

EXP NO:1
Create Regular Expressions in Python for Detecting Word Patterns and Tokenizing Text
Date:
AIM:
To create Regular expressions in Python for detecting word patterns and tokenizing text.
SOFTWARE SPECIFICATIONS:
• Anaconda Navigator
• Jupyter Notebook
• Google Colab
HARDWARE SPECIFICATIONS:
• Windows 10/11
• RAM - 16 GB
• Hard-disk - 1 TB
• Processor - Intel i5/i7
ALGORITHM:
• Import the re module.
• Define a function detect_word_patterns that compiles the word pattern \b\w+\b and returns all words found in the text.
• Define a function tokenize_text that compiles a pattern matching words, whitespace and punctuation, and returns all tokens found in the text.
• Define a main function with an example text.
• Execute the main function if the script is being run directly.
• Display the detected words and tokens in the example text.
PROGRAM:
import re

def detect_word_patterns(text):
    # Regular expression pattern for detecting words
    word_pattern = re.compile(r'\b\w+\b')
    # Find all matches for the word pattern in the text
    words = word_pattern.findall(text)
    return words

def tokenize_text(text):
    # Regular expression pattern for tokenizing text
    token_pattern = re.compile(r'\b\w+\b|\s|[^\w\s]')
    # Find all matches for the token pattern in the text
    tokens = token_pattern.findall(text)
    return tokens

def main():
    # Example text
    text = ("Rajalakshmi Institute of Technology was established in 2008. "
            "RIT is accredited with highest grade of A++ by NAAC. "
            "RIT is affiliated with Anna University Chennai. ")
    # Detect word patterns
    words = detect_word_patterns(text)
    print("Words:", words)
    # Tokenize text
    tokens = tokenize_text(text)
    print("Tokens:", tokens)

if __name__ == "__main__":
    main()
OUTPUT:
Words: ['Rajalakshmi', 'Institute', 'of', 'Technology', 'was', 'established', 'in', '2008', 'RIT', 'is',
'accredited', 'with', 'highest', 'grade', 'of', 'A', 'by', 'NAAC', 'RIT', 'is', 'affiliated', 'with', 'Anna',
'University', 'Chennai']
Tokens: ['Rajalakshmi', ' ', 'Institute', ' ', 'of', ' ', 'Technology', ' ', 'was', ' ', 'established', ' ', 'in', ' ',
'2008', '.', ' ', 'RIT', ' ', 'is', ' ', 'accredited', ' ', 'with', ' ', 'highest', ' ', 'grade', ' ', 'of', ' ', 'A', '+', '+', ' ',
'by', ' ', 'NAAC', '.', ' ', 'RIT', ' ', 'is', ' ', 'affiliated', ' ', 'with', ' ', 'Anna', ' ', 'University', ' ',
'Chennai', '.', ' ']
RESULT:
Thus, the Python program for creating regular expressions to detect word patterns and tokenize text was executed successfully and the output was verified.
EXP NO: 2a
Searching Text
Date:
AIM:
To create a Python program for searching text using NLTK.
SOFTWARE SPECIFICATIONS:
• Anaconda Navigator
• Jupyter Notebook
• Google Colab
HARDWARE SPECIFICATIONS:
• Windows 10/11
• RAM - 16 GB
• Hard-disk - 1 TB
• Processor - Intel i5/i7
ALGORITHM:
• Import the NLTK module (nltk) and install it if not already installed.
• Define the example text.
• Use a regular expression (Python's re module) to find all occurrences of the word "text" in the example text.
• Tokenize the text into words using NLTK's word_tokenize function and convert
them to lowercase.
• Create an NLTK Text object from the tokenized text.
• Use the similar method of the Text object to find similar words to "text".
• Print the occurrences of "text" and the similar words.
PROGRAM:
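Note: the original program listing is not reproduced in this copy of the manual. The following is a minimal sketch consistent with the algorithm above and the recorded output; the example sentence is an assumption.

import re
import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
from nltk.text import Text

# Example text (assumed for illustration)
text = "This is a sample text. Searching text with NLTK is easy."

# Find all occurrences of the word "text" using a regular expression
occurrences = re.findall(r'text', text.lower())
print(occurrences)

# Build an NLTK Text object and look for words used in similar contexts
tokens = word_tokenize(text.lower())
text_obj = Text(tokens)
print(text_obj.similar('text'))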
OUTPUT:
lib\site-packages (from nltk) (1.2.0)
[nltk_data] C:\Users\User\AppData\Roaming\nltk_data...
['text', 'text']
None
RESULT:
The searching text program is executed successfully and the output was verified.
EXP NO:2b
Counting Vocabulary
Date:
AIM:
To create a Python program for Counting Vocabulary.
SOFTWARE SPECIFICATIONS:
• Anaconda Navigator
• Jupyter Notebook
• Google Colab
HARDWARE SPECIFICATIONS:
• Windows 10/11
• RAM - 16 GB
• Hard-disk - 1 TB
• Processor - Intel i5/i7
ALGORITHM:
• Import NLTK and download the necessary resources (e.g., punkt tokenizer).
• Import the word_tokenize function and define the example text.
• Tokenize the text into words using word_tokenize and convert them to lowercase.
• Build the vocabulary as the set of unique tokens.
• Print the length of the vocabulary set.
• Display the number of unique words in the example text.
PROGRAM:
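Note: the original program listing is not reproduced in this copy of the manual. Below is a minimal sketch consistent with the algorithm above; the example text is an assumption (its ten unique tokens are consistent with the recorded output of 10).

import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize

# Example text (assumed for illustration)
text = "This is a sample text with some repeated words."

# Tokenize and lowercase the text
tokens = word_tokenize(text.lower())

# The vocabulary is the set of unique tokens
vocabulary = set(tokens)
print(len(vocabulary))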
OUTPUT:
10
RESULT:
The Counting Vocabulary program is executed successfully and the output was verified.
EXP NO:2c
Frequency Distribution
Date:
AIM:
To create a python program for Frequency Distribution.
SOFTWARE SPECIFICATIONS:
• Anaconda Navigator
• Jupyter Notebook
• Google Colab
HARDWARE SPECIFICATIONS:
• Windows 10/11
• RAM - 16 GB
• Hard-disk - 1 TB
• Processor - Intel i5/i7
ALGORITHM:
• Import NLTK and download the necessary resources (e.g., punkt tokenizer).
• Import the FreqDist class from NLTK's probability module and the word_tokenize
function from the tokenize module.
• Define the example text.
• Tokenize the text into words using the word_tokenize function and convert them to
lowercase.
• Create a frequency distribution (FreqDist) for the tokens.
• Print the most common n words, where n is the desired number of most common words.
• Plot the frequency distribution of the tokens.
PROGRAM:
import nltk
nltk.download('punkt')
from nltk.probability import FreqDist
from nltk.tokenize import word_tokenize
text = "This is a sample text with some repeated words."
tokens = word_tokenize(text.lower())
# Create a frequency distribution for the words
fdist = FreqDist(tokens)
# Print the most frequent words
print(fdist.most_common(3))  # e.g. [('this', 1), ('is', 1), ('a', 1)]
# Plot the frequency distribution
fdist.plot(cumulative=False)
OUTPUT:
[nltk_data] Downloading package punkt to /root/nltk_data...
RESULT:
The frequency distribution program is executed successfully and the output was verified.
EXP NO:2d
Collocations
Date:
AIM:
To create a Python program for finding collocations.
SOFTWARE SPECIFICATIONS:
• Anaconda Navigator
• Jupyter Notebook
• Google Colab
HARDWARE SPECIFICATIONS:
• Windows 10/11
• RAM - 16 GB
• Hard-disk - 1 TB
• Processor - Intel i5/i7
ALGORITHM:
• Import NLTK and download the necessary resources (e.g., punkt tokenizer).
• Import BigramCollocationFinder and BigramAssocMeasures from NLTK's collocations module.
• Define the example text and tokenize it into words using word_tokenize.
• Create a Bigram Collocation Finder from the tokenized words.
• Use the nbest method to find the top n bigrams with the highest Pointwise Mutual Information (PMI), where n is the desired number of top bigrams.
• Print the top n bigrams with the highest PMI.
PROGRAM:
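Note: the original program listing is not reproduced in this copy of the manual. The following is a minimal sketch consistent with the algorithm above; the example sentence is an assumption, so its top bigrams may differ slightly from the recorded output.

import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
from nltk.collocations import BigramCollocationFinder, BigramAssocMeasures

# Example text (assumed for illustration)
text = "Natural language processing is an exciting field with many applications."
tokens = word_tokenize(text.lower())

# Create a Bigram Collocation Finder from the tokenized words
finder = BigramCollocationFinder.from_words(tokens)
bigram_measures = BigramAssocMeasures()

# Top 5 bigrams ranked by Pointwise Mutual Information (PMI)
top_bigrams = finder.nbest(bigram_measures.pmi, 5)
print(top_bigrams)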
OUTPUT:
[('an', 'exciting'), ('applications', '.'), ('exciting', 'field'), ('field', 'with'), ('is', 'an')]
RESULT:
The Collocations program executed successfully, and the output was verified.
EXP NO:2e
Bigrams
Date:
AIM:
To create a python program for Bigrams.
SOFTWARE SPECIFICATIONS:
• Anaconda Navigator
• Jupyter Notebook
• Google Colab
HARDWARE SPECIFICATIONS:
• Windows 10/11
• RAM - 16 GB
• Hard-disk - 1 TB
• Processor - Intel i5/i7
ALGORITHM:
• Import NLTK and download the necessary resources (e.g., punkt tokenizer).
• Import the ngrams function from NLTK's util module and the word_tokenize function
from the tokenize module.
• Define the example text.
• Tokenize the text into words using the word_tokenize function and convert them to
lowercase.
• Generate bigrams (sequences of two consecutive words) from the tokenized words
using the ngrams function.
• Convert the bigrams generator into a list to display the bigrams.
• Print the list of generated bigrams.
PROGRAM:
import nltk
nltk.download('punkt')
from nltk.util import ngrams
from nltk.tokenize import word_tokenize
text = "This is a sample text to demonstrate bigrams."
tokens = word_tokenize(text.lower())
# Generate bigrams (sequences of two consecutive words)
bigrams = ngrams(tokens, 2)
print(list(bigrams))
OUTPUT:
[('this', 'is'), ('is', 'a'), ('a', 'sample'), ('sample', 'text'), ('text', 'to'), ('to', 'demonstrate'),
('demonstrate', 'bigrams'), ('bigrams', '.')]
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
RESULT:
The Bigrams program is executed successfully and the output was verified.
EXP NO:3
Accessing Text Corpora using NLTK in Python
Date:
AIM:
To create a python program for Accessing Text Corpora using NLTK in Python
SOFTWARE SPECIFICATIONS:
• Anaconda Navigator
• Jupyter Notebook
• Google Colab
HARDWARE SPECIFICATIONS:
• Windows 10/11
• RAM - 16 GB
• Hard-disk - 1 TB
• Processor - Intel i5/i7
ALGORITHM:
• Import NLTK and download the Gutenberg corpus.
• Define a function access_gutenberg_corpus to access the Gutenberg corpus.
• List available files in the Gutenberg corpus using the fileids method.
• Access and print the text of a specific document in the corpus (in this case, "shakespeare-hamlet.txt").
• Call the access_gutenberg_corpus function from the main function.
PROGRAM:
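Note: the original program listing is not fully reproduced in this copy of the manual. The following is a minimal sketch consistent with the algorithm above and the recorded output; printing only the opening portion of the document is an assumption.

import nltk
nltk.download('gutenberg')
from nltk.corpus import gutenberg

def access_gutenberg_corpus():
    # List available files in the Gutenberg corpus
    print(gutenberg.fileids())
    # Access and print (part of) the text of "shakespeare-hamlet.txt"
    hamlet_text = gutenberg.raw('shakespeare-hamlet.txt')
    print(hamlet_text[:1000] + "...")

def main():
    access_gutenberg_corpus()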
if __name__ == "__main__":
    main()
OUTPUT:
Cornelius, courtier.
Rosencrantz, courtier.
Guildenstern, courtier.
Osric, courtier.
A Gentleman, courtier.
A Priest.
Marcellus, officer.
Bernardo, off...
[nltk_data] Downloading package gutenberg to /root/nltk_data...
[nltk_data] Package gutenberg is already up-to-date!
RESULT:
The Accessing Text Corpora using NLTK in Python program is executed successfully and the output was verified.
EXP NO:4
Write a Function that Finds the 50 Most Frequently Occurring Words of a Text
Date:
AIM:
To create a Python program with a function that finds the 50 most frequently occurring words of a text.
SOFTWARE SPECIFICATIONS:
• Anaconda Navigator
• Jupyter Notebook
• Google Colab
HARDWARE SPECIFICATIONS:
• Windows 10/11
• RAM - 16 GB
• Hard-disk - 1 TB
• Processor - Intel i5/i7
ALGORITHM:
• Import NLTK and download the necessary resources (e.g., punkt tokenizer and stopwords).
• Import the necessary modules (stopwords, FreqDist) from NLTK.
• Define the find_frequent_words function to find the most frequent words in the text.
• Load English stopwords using stopwords.words('english').
• Tokenize the text into words using nltk.word_tokenize and convert them to lowercase.
• Filter out stopwords from the tokenized words, create a frequency distribution using
FreqDist, and return the most common words.
PROGRAM:
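Note: the definition of find_frequent_words is not reproduced in this copy of the manual. The following is a minimal sketch consistent with the algorithm above and the recorded output; the default limit of 50 words follows the title of the experiment.

import nltk
nltk.download('punkt')
nltk.download('stopwords')
from nltk.corpus import stopwords
from nltk.probability import FreqDist

def find_frequent_words(text, n=50):
    # Load English stopwords
    stop_words = set(stopwords.words('english'))
    # Tokenize and lowercase the text
    tokens = nltk.word_tokenize(text.lower())
    # Filter out stopwords and build a frequency distribution
    filtered_words = [word for word in tokens if word not in stop_words]
    fdist = FreqDist(filtered_words)
    # Return the n most common words
    return fdist.most_common(n)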
# Example usage:
text = "This is a sample text with some common words and some less common words."
frequent_words = find_frequent_words(text)
print(frequent_words)
OUTPUT:
[('common', 2), ('words', 2), ('sample', 1), ('text', 1), ('less', 1), ('.', 1)]
[nltk_data] Downloading package stopwords to /root/nltk_data...
[nltk_data] Unzipping corpora/stopwords.zip.
RESULT:
The program with a function that finds the 50 most frequently occurring words of a text is executed successfully and the output was verified.
EXP NO:5
Implement the Word2Vec Model
Date:
AIM:
To create a Python program to implement the Word2Vec model.
SOFTWARE SPECIFICATIONS:
• Anaconda Navigator
• Jupyter Notebook
• Google Colab
HARDWARE SPECIFICATIONS:
• Windows 10/11
• RAM - 16 GB
• Hard-disk - 1 TB
• Processor - Intel i5/i7
ALGORITHM:
• Import NLTK, download the punkt tokenizer, and import word_tokenize.
• Import the Word2Vec class from gensim.models.
• Define the sample sentences.
• Tokenize the sentences into words using word_tokenize, and convert them to lowercase.
• Set up and train the Word2Vec model with specified parameters (vector_size=100,
window=5, min_count=1, workers=4).
• Save the trained Word2Vec model to a file named "word2vec_model_sentences.bin", and
load the saved model.
PROGRAM:
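Note: the first part of the program listing is not reproduced in this copy of the manual. The following is a minimal sketch consistent with the algorithm above; the sample sentences are assumptions, while the training parameters are taken from the algorithm.

import nltk
nltk.download('punkt')
from nltk.tokenize import word_tokenize
from gensim.models import Word2Vec

# Sample sentences (assumed for illustration)
sentences = [
    "Rajalakshmi Institute of Technology offers engineering programmes.",
    "Word2Vec learns vector representations of words from text.",
    "Engineering students study text and speech analysis.",
]

# Tokenize the sentences into lowercase words
tokenized_sentences = [word_tokenize(sentence.lower()) for sentence in sentences]

# Set up and train the Word2Vec model
model = Word2Vec(tokenized_sentences, vector_size=100, window=5, min_count=1, workers=4)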
# Save the trained model to a file
model.save("word2vec_model_sentences.bin")
loaded_model = Word2Vec.load("word2vec_model_sentences.bin")
word_embedding = loaded_model.wv['engineering']
print(word_embedding)
OUTPUT:
-6.7476230e-04 2.9767421e-03 -6.1080824e-03 1.6988355e-03
-6.9264821e-03 -8.6941104e-03 -5.9012529e-03 -8.9566577e-03
7.2771502e-03 -5.7719820e-03 8.2766823e-03 -7.2425883e-03
3.4219360e-03 9.6746497e-03 -7.7848872e-03 -9.9454839e-03
-4.3290602e-03 -2.6821969e-03 -2.7132613e-04 -8.8319331e-03
-8.6167511e-03 2.7997096e-03 -8.2072075e-03 -9.0692798e-03
-2.3409016e-03 -8.6309426e-03 -7.0565986e-03 -8.4008174e-03
-3.0119700e-04 -4.5645908e-03 6.6272104e-03 1.5276786e-03
-3.3420518e-03 6.1100693e-03 -6.0128779e-03 -4.6551023e-03
-7.2083715e-03 -4.3364055e-03 -1.8094820e-03 6.4903200e-03
-2.7698609e-03 4.9190638e-03 6.9043743e-03 -7.4632545e-03
4.5653125e-03 6.1272969e-03 -2.9546837e-03 6.6242618e-03
6.1250199e-03 -6.4425734e-03 -6.7656934e-03 2.5390687e-03
-1.6231104e-03 -6.0651163e-03 9.4992034e-03 -5.1304861e-03
-6.5529565e-03 -1.1961181e-04 -2.7010120e-03 4.4384925e-04
-3.5381056e-03 -4.1872010e-04 -7.0809841e-04 8.2218763e-04
8.1943199e-03 -5.7367464e-03 -1.6597145e-03 5.5715367e-03]
[nltk_data] Downloading package punkt to /root/nltk_data...
[nltk_data] Package punkt is already up-to-date!
RESULT:
The implementation of the Word2Vec model program is executed successfully and the output was verified.
EXP NO:6
Use a Transformer for Implementing Classification
Date:
AIM:
To create a Python program that uses a transformer for implementing classification.
SOFTWARE SPECIFICATIONS:
• Anaconda Navigator
• Jupyter Notebook
• Google Colab
HARDWARE SPECIFICATIONS:
• Windows 10/11
• RAM - 16 GB
• Hard-disk - 1 TB
• Processor - Intel i5/i7
ALGORITHM:
• Import the required libraries (torch, transformers, scikit-learn, tqdm) and define the sample texts with their labels.
• Split the data into training and testing sets using train_test_split from
sklearn.model_selection.
• Load the pre-trained BERT model and tokenizer using BertTokenizer.from_pretrained
and BertForSequenceClassification.from_pretrained from transformers.
• Tokenize and encode the training and testing data using the BERT tokenizer.
• Define the optimizer, loss function, and set up the training loop using PyTorch.
PROGRAM:
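Note: the first part of the program listing is not reproduced in this copy of the manual. The following is a minimal sketch of the missing setup, consistent with the algorithm above; the sample texts, labels and split ratio are assumptions.

import torch
from torch.utils.data import TensorDataset, DataLoader
from torch.optim import AdamW
from transformers import BertTokenizer, BertForSequenceClassification
from sklearn.model_selection import train_test_split
from sklearn.metrics import accuracy_score
from tqdm import tqdm

# Sample data (assumed for illustration)
texts = ["I love this movie", "This film was terrible", "Great acting and story", "Worst movie ever"]
labels = [1, 0, 1, 0]

# Split the data into training and testing sets
train_texts, test_texts, train_labels, test_labels = train_test_split(texts, labels, test_size=0.25, random_state=42)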
# Load pre-trained BERT model and tokenizer
tokenizer = BertTokenizer.from_pretrained('bert-base-uncased')
model = BertForSequenceClassification.from_pretrained('bert-base-uncased', num_labels=2)
# Tokenize and encode the training data
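# The encoding, data-loader and optimizer setup below is a hedged reconstruction
# of the portion of the listing that is missing from this copy of the manual;
# the batch size, learning rate and number of epochs are assumptions.
train_encodings = tokenizer(train_texts, truncation=True, padding=True, return_tensors='pt')
test_encodings = tokenizer(test_texts, truncation=True, padding=True, return_tensors='pt')
train_dataset = TensorDataset(train_encodings['input_ids'], train_encodings['attention_mask'], torch.tensor(train_labels))
test_dataset = TensorDataset(test_encodings['input_ids'], test_encodings['attention_mask'], torch.tensor(test_labels))
train_dataloader = DataLoader(train_dataset, batch_size=2, shuffle=True)
test_dataloader = DataLoader(test_dataset, batch_size=2)
device = torch.device('cuda' if torch.cuda.is_available() else 'cpu')
model.to(device)
optimizer = AdamW(model.parameters(), lr=2e-5)
# Training loop
for epoch in range(3):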
    model.train()
    for batch in tqdm(train_dataloader, desc=f"Epoch {epoch + 1}"):
        input_ids, attention_mask, labels = batch
        input_ids, attention_mask, labels = input_ids.to(device), attention_mask.to(device), labels.to(device)
        optimizer.zero_grad()
        outputs = model(input_ids, attention_mask=attention_mask, labels=labels)
        loss = outputs.loss
        loss.backward()
        optimizer.step()
# Evaluation
model.eval()
predictions = []
true_labels = []
with torch.no_grad():
    for batch in tqdm(test_dataloader, desc="Evaluating"):
        input_ids, attention_mask, labels = batch
        input_ids, attention_mask, labels = input_ids.to(device), attention_mask.to(device), labels.to(device)
        outputs = model(input_ids, attention_mask=attention_mask)
        logits = outputs.logits
        predicted_labels = torch.argmax(logits, dim=1).cpu().numpy()
        predictions.extend(predicted_labels)
        true_labels.extend(labels.cpu().numpy())
# Calculate accuracy
accuracy = accuracy_score(true_labels, predictions)
print(f"Accuracy: {accuracy * 100:.2f}%")
OUTPUT:
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 731.7/731.7 MB 987.8 kB/s eta
0:00:00
Collecting nvidia-cublas-cu12==12.1.3.1 (from torch)
Downloading nvidia_cublas_cu12-12.1.3.1-py3-none-manylinux1_x86_64.whl (410.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 410.6/410.6 MB 1.2 MB/s eta
0:00:00
Collecting nvidia-cufft-cu12==11.0.2.54 (from torch)
Downloading nvidia_cufft_cu12-11.0.2.54-py3-none-manylinux1_x86_64.whl (121.6 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 121.6/121.6 MB 8.3 MB/s eta
0:00:00
Collecting nvidia-curand-cu12==10.3.2.106 (from torch)
Downloading nvidia_curand_cu12-10.3.2.106-py3-none-manylinux1_x86_64.whl (56.5 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 56.5/56.5 MB 11.4 MB/s eta 0:00:00
Collecting nvidia-cusolver-cu12==11.4.5.107 (from torch)
Downloading nvidia_cusolver_cu12-11.4.5.107-py3-none-manylinux1_x86_64.whl (124.2 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 124.2/124.2 MB 6.2 MB/s eta
0:00:00
Collecting nvidia-cusparse-cu12==12.1.0.106 (from torch)
Downloading nvidia_cusparse_cu12-12.1.0.106-py3-none-manylinux1_x86_64.whl (196.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 196.0/196.0 MB 2.4 MB/s eta
0:00:00
Collecting nvidia-nccl-cu12==2.19.3 (from torch)
Downloading nvidia_nccl_cu12-2.19.3-py3-none-manylinux1_x86_64.whl (166.0 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 166.0/166.0 MB 2.3 MB/s eta
0:00:00
Collecting nvidia-nvtx-cu12==12.1.105 (from torch)
Downloading nvidia_nvtx_cu12-12.1.105-py3-none-manylinux1_x86_64.whl (99 kB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 99.1/99.1 kB 11.1 MB/s eta 0:00:00
Requirement already satisfied: triton==2.2.0 in /usr/local/lib/python3.10/dist-packages (from torch)
(2.2.0)
Collecting nvidia-nvjitlink-cu12 (from nvidia-cusolver-cu12==11.4.5.107->torch)
Downloading nvidia_nvjitlink_cu12-12.4.127-py3-none-manylinux2014_x86_64.whl (21.1 MB)
━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━ 21.1/21.1 MB 45.3 MB/s eta 0:00:00
Requirement already satisfied: MarkupSafe>=2.0 in /usr/local/lib/python3.10/dist-packages (from
jinja2->torch) (2.1.5)
Requirement already satisfied: mpmath>=0.19 in /usr/local/lib/python3.10/dist-packages (from
sympy->torch) (1.3.0)
Installing collected packages: nvidia-nvtx-cu12, nvidia-nvjitlink-cu12, nvidia-nccl-cu12, nvidia-
curand-cu12, nvidia-cufft-cu12, nvidia-cuda-runtime-cu12, nvidia-cuda-nvrtc-cu12, nvidia-cuda-
cupti-cu12, nvidia-cublas-cu12, nvidia-cusparse-cu12, nvidia-cudnn-cu12, nvidia-cusolver-cu12
Successfully installed nvidia-cublas-cu12-12.1.3.1 nvidia-cuda-cupti-cu12-12.1.105 nvidia-cuda-
nvrtc-cu12-12.1.105 nvidia-cuda-runtime-cu12-12.1.105 nvidia-cudnn-cu12-8.9.2.26 nvidia-cufft-
cu12-11.0.2.54 nvidia-curand-cu12-10.3.2.106 nvidia-cusolver-cu12-11.4.5.107 nvidia-cusparse-
cu12-12.1.0.106 nvidia-nccl-cu12-2.19.3 nvidia-nvjitlink-cu12-12.4.127 nvidia-nvtx-cu12-12.1.105
Requirement already satisfied: transformers in /usr/local/lib/python3.10/dist-packages (4.38.2)
Requirement already satisfied: filelock in /usr/local/lib/python3.10/dist-packages (from
transformers) (3.13.3)
Requirement already satisfied: huggingface-hub<1.0,>=0.19.3 in /usr/local/lib/python3.10/dist-
packages (from transformers) (0.20.3)
Requirement already satisfied: numpy>=1.17 in /usr/local/lib/python3.10/dist-packages (from
transformers) (1.25.2)
Requirement already satisfied: packaging>=20.0 in /usr/local/lib/python3.10/dist-packages (from
transformers) (24.0)
Requirement already satisfied: pyyaml>=5.1 in /usr/local/lib/python3.10/dist-packages (from
transformers) (6.0.1)
Requirement already satisfied: regex!=2019.12.17 in /usr/local/lib/python3.10/dist-packages (from
transformers) (2023.12.25)
Requirement already satisfied: requests in /usr/local/lib/python3.10/dist-packages (from
transformers) (2.31.0)
Requirement already satisfied: tokenizers<0.19,>=0.14 in /usr/local/lib/python3.10/dist-packages
(from transformers) (0.15.2)
Requirement already satisfied: safetensors>=0.4.1 in /usr/local/lib/python3.10/dist-packages (from
transformers) (0.4.2)
Requirement already satisfied: tqdm>=4.27 in /usr/local/lib/python3.10/dist-packages (from
transformers) (4.66.2)
Requirement already satisfied: fsspec>=2023.5.0 in /usr/local/lib/python3.10/dist-packages (from
huggingface-hub<1.0,>=0.19.3->transformers) (2023.6.0)
Requirement already satisfied: typing-extensions>=3.7.4.3 in /usr/local/lib/python3.10/dist-packages
(from huggingface-hub<1.0,>=0.19.3->transformers) (4.10.0)
Requirement already satisfied: charset-normalizer<4,>=2 in /usr/local/lib/python3.10/dist-packages
(from requests->transformers) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in /usr/local/lib/python3.10/dist-packages (from
requests->transformers) (3.6)
Requirement already satisfied: urllib3<3,>=1.21.1 in /usr/local/lib/python3.10/dist-packages (from
requests->transformers) (2.0.7)
Requirement already satisfied: certifi>=2017.4.17 in /usr/local/lib/python3.10/dist-packages (from
requests->transformers) (2024.2.2)
Collecting sklearn
Downloading sklearn-0.0.post12.tar.gz (2.6 kB)
error: subprocess-exited-with-error
note: This error originates from a subprocess, and is likely not a problem with pip.
Preparing metadata (setup.py) ... error
error: metadata-generation-failed
╰─> See above for output.
note: This is an issue with the package mentioned above, not pip.
hint: See above for details.
Requirement already satisfied: tqdm in /usr/local/lib/python3.10/dist-packages (4.66.2)
/usr/local/lib/python3.10/dist-packages/huggingface_hub/utils/_token.py:88: UserWarning:
The secret `HF_TOKEN` does not exist in your Colab secrets.
To authenticate with the Hugging Face Hub, create a token in your settings tab
(https://huggingface.co/settings/tokens), set it as secret in your Google Colab and restart your
session.
You will be able to reuse this secret in all of your notebooks.
Please note that authentication is recommended but still optional to access public models or datasets.
warnings.warn(
tokenizer_config.json: 100%
48.0/48.0 [00:00<00:00, 1.74kB/s]
vocab.txt: 100%
232k/232k [00:00<00:00, 6.39MB/s]
tokenizer.json: 100%
466k/466k [00:00<00:00, 19.9MB/s]
config.json: 100%
570/570 [00:00<00:00, 21.2kB/s]
model.safetensors: 100%
440M/440M [00:01<00:00, 240MB/s]
Some weights of BertForSequenceClassification were not initialized from the model checkpoint at
bert-base-uncased and are newly initialized: ['classifier.bias', 'classifier.weight']
You should probably TRAIN this model on a down-stream task to be able to use it for predictions
and inference.
Epoch 1: 100%|██████████| 2/2 [00:04<00:00, 2.31s/it]
Epoch 2: 100%|██████████| 2/2 [00:02<00:00, 1.38s/it]
Epoch 3: 100%|██████████| 2/2 [00:02<00:00, 1.40s/it]
Evaluating: 100%|██████████| 1/1 [00:00<00:00, 7.83it/s]Accuracy: 0.00%
RESULT:
The transformer-based classification program is executed successfully and the output was verified.
EXP NO:7 Design a Chatbot with a Simple Dialog System
Date:
AIM:
To create a Python program to design a chatbot with a simple dialog system.
SOFTWARE SPECIFICATIONS:
• Anaconda Navigator
• Jupyter Notebook
• Google Colab
HARDWARE SPECIFICATIONS:
• Windows 10/11
• RAM - 16 GB
• Hard-disk - 1 TB
• Processor - Intel i5/i7
ALGORITHM:
• Define a SimpleChatbot class whose constructor initializes lists of greetings and goodbyes and a dictionary of predefined responses.
• Define a method get_response to generate responses based on user input.
• Define a main function to interact with the chatbot.
• Create an instance of the SimpleChatbot class.
• Start a conversation loop where the user can input questions or statements.
• Provide appropriate responses based on user input, including handling greetings,
goodbyes, and predefined queries.
PROGRAM:
import random

class SimpleChatbot:
    def __init__(self):
        self.greetings = ['hello', 'hi', 'hey', 'greetings', 'howdy']
        self.goodbyes = ['bye', 'goodbye', 'see you', 'farewell']
        self.responses = {
            'tell me a joke': 'Why did the chicken cross the road? To get to the other side!',
            'how are you': 'I am just a computer program, but thanks for asking!',
            'default': 'I\'m sorry, I don\'t understand that. Can you ask me something else?'
        }

    def get_response(self, user_input):
        user_input = user_input.lower()
        if any(greeting in user_input for greeting in self.greetings):
            return 'Hello! How can I help you today?'
        elif any(goodbye in user_input for goodbye in self.goodbyes):
            return 'Goodbye! Have a great day.'
        else:
            for key in self.responses:
                if key in user_input:
                    return self.responses[key]
            return self.responses['default']

def main():
    chatbot = SimpleChatbot()
    print("Simple Chatbot: Hello! Ask me anything or say goodbye to end the conversation.")
    while True:
        user_input = input("You: ")
        if user_input.lower() in ['bye', 'goodbye', 'exit']:
            print("Simple Chatbot: Goodbye! Have a great day.")
            break
        response = chatbot.get_response(user_input)
        print("Simple Chatbot:", response)

if __name__ == "__main__":
    main()
OUTPUT:
RESULT:
The design of a chatbot with a simple dialog system program is executed successfully and the output was verified.
EXP NO:8
Convert Text to Speech and Find Accuracy
Date:
AIM:
To create a Python program for converting text to speech and finding the accuracy.
SOFTWARE SPECIFICATIONS:
• Anaconda Navigator
• Jupyter Notebook
• Google Colab
HARDWARE SPECIFICATIONS:
• Windows 10/11
• RAM - 16 GB
• Hard-disk - 1 TB
• Processor - Intel i5/i7
ALGORITHM:
• Define a function text_to_speech to convert text to speech using gTTS and save the audio to a file.
• Define a function speech_to_text to recognize speech using the microphone and Google's
speech recognition service.
• Define a function evaluate_accuracy to compare the original text with the recognized
text and calculate accuracy.
• Execute the text-to-speech conversion for the original text.
• Use the microphone to capture speech input, recognize it, and evaluate the
accuracy of the recognition.
PROGRAM:
import speech_recognition as sr
from gtts import gTTS
import os
def text_to_speech(text, language='en'):
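    # Hedged reconstruction of the body of text_to_speech and the beginning of
    # speech_to_text, which are missing from this copy of the manual; the
    # output file name "output.mp3" is an assumption.
    tts = gTTS(text=text, lang=language)
    tts.save("output.mp3")
    print("Text converted to speech and saved as output.mp3")

def speech_to_text():
    recognizer = sr.Recognizer()
    with sr.Microphone() as source:
        print("Please speak now...")
        audio = recognizer.listen(source)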
    try:
        text = recognizer.recognize_google(audio)
        return text
    except sr.UnknownValueError:
        print("Sorry, could not understand audio.")
        return None
    except sr.RequestError as e:
        print(f"Could not request results from Google Speech Recognition service; {e}")
        return None
def evaluate_accuracy(original_text, recognized_text):
    if recognized_text:
        print(f"Original Text: {original_text}")
        print(f"Recognized Text: {recognized_text}")
        original_words = set(original_text.lower().split())
        recognized_words = set(recognized_text.lower().split())
        common_words = original_words.intersection(recognized_words)
        accuracy = len(common_words) / len(original_words)
        print(f"Accuracy: {accuracy * 100:.2f}%")
    else:
        print("No text recognized. Accuracy cannot be calculated.")
if __name__ == "__main__":
    # Text to speech (the original text below is taken from the recorded output)
    original_text = "Hello, how are you today?"
    text_to_speech(original_text)
    # Speech to text
    recognized_text = speech_to_text()
    # Evaluate accuracy
    evaluate_accuracy(original_text, recognized_text)
OUTPUT:
Downloading click-8.1.7-py3-none-any.whl (97 kB)
0.0/97.9 kB ? eta -:--:--
------------------------- 61.4/97.9 kB 1.1 MB/s eta 0:00:01
- ----------------------------------- - 97.9/97.9 kB 933.0 kB/s eta 0:00:00
Requirement already satisfied: colorama in c:\users\user\appdata\local\programs\python\
python311\lib\site-packages (from click<8.2,>=7.1->gTTS) (0.4.6)
Requirement already satisfied: charset-normalizer<4,>=2 in c:\users\user\appdata\local\programs\
python\python311\lib\site-packages (from requests<3,>=2.27->gTTS) (3.3.2)
Requirement already satisfied: idna<4,>=2.5 in c:\users\user\appdata\local\programs\python\
python311\lib\site-packages (from requests<3,>=2.27->gTTS) (3.4)
Requirement already satisfied: urllib3<3,>=1.21.1 in c:\users\user\appdata\local\programs\python\
python311\lib\site-packages (from requests<3,>=2.27->gTTS) (2.1.0)
Requirement already satisfied: certifi>=2017.4.17 in c:\users\user\appdata\local\programs\python\
python311\lib\site-packages (from requests<3,>=2.27->gTTS) (2023.7.22)
Installing collected packages: click, gTTS
Successfully installed click-8.1.7 gTTS-2.5.0
Original Text: Hello, how are you today?
Recognized Text: hello how are you today
Accuracy: 60.00%
RESULT:
The conversion of text to speech and accuracy evaluation program is executed successfully and the output was verified.
EXP NO:9
Design a speech recognition system and find the error rate
Date:
AIM:
To create a Python program to design a speech recognition system and find the error rate.
SOFTWARE SPECIFICATIONS:
• Anaconda Navigator
• Jupyter Notebook
• Google Colab
HARDWARE SPECIFICATIONS:
• Windows 10/11
• RAM - 16 GB
• Hard-disk - 1 TB
• Processor - Intel i5/i7
ALGORITHM:
• Define a function recognize_speech to transcribe an audio file using Google's speech recognition service.
• Define a function calculate_word_error_rate to calculate the WER between a reference
text and recognized text.
• Simulate a reference text and provide the path to the audio file containing the recognized
speech.
• Use the recognize_speech function to get the recognized text from the audio file.
• Calculate the Word Error Rate (WER) between the reference text and the recognized
text using the calculate_word_error_rate function.
PROGRAM:
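Note: the first part of the program listing is not reproduced in this copy of the manual. The following is a minimal sketch of the missing recognize_speech function, consistent with the algorithm above; reading the speech from an audio file (rather than a microphone) follows the algorithm.

import speech_recognition as sr
import jiwer

def recognize_speech(audio_file_path):
    recognizer = sr.Recognizer()
    # Read the audio file containing the recognized speech
    with sr.AudioFile(audio_file_path) as source:
        audio = recognizer.record(source)
    try:
        # Transcribe the audio using Google's speech recognition service
        return recognizer.recognize_google(audio)
    except sr.UnknownValueError:
        print("Sorry, could not understand audio.")
        return ""
    except sr.RequestError as e:
        print(f"Could not request results from Google Speech Recognition service; {e}")
        return ""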
def calculate_word_error_rate(reference_text, recognized_text):
    wer = jiwer.wer(reference_text, recognized_text)
    return wer
if __name__ == "__main__":
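    # Hedged reconstruction of the missing driver code; the reference text and
    # the audio file path are assumptions, not the original values.
    reference_text = "hello how are you today"
    audio_path = "speech_sample.wav"
    recognized_text = recognize_speech(audio_path)
    wer = calculate_word_error_rate(reference_text, recognized_text)
    print(f"Word Error Rate: {wer * 100:.2f}%")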
OUTPUT:
RESULT:
The design of a speech recognition system and error rate calculation program is executed successfully and the output was verified.