Department of Computing
CS370: Artificial Intelligence
Class: BSCS-11C
Lab 01: Introduction to Python
Date: 15-09-2023
Lab Engineer: Ms Shakeela
Mahad Mohtashim
379889
BSCS-11-C
Task #1
Write a Python program that takes two strings as input and calculates the
Levenshtein/Edit distance between them.
Explanation:-
The Levenshtein/Edit distance is a measure of similarity between two strings/sequences.
Formally, it is the minimum number of single-character edits required to transform
one string into the other.
Single-character edits are:-
Insertion
Deletion
Substitution
Mathematically:-
The Levenshtein/Edit distance between two strings ‘a’ and ‘b’ is defined as:-
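For reference, the standard recursive definition (assumed here to be the formula intended at this point) can be written as follows, where i and j are the lengths of the prefixes of ‘a’ and ‘b’ being compared:

```latex
\[
\operatorname{lev}_{a,b}(i,j) =
\begin{cases}
  \max(i,j) & \text{if } \min(i,j) = 0, \\[4pt]
  \min
  \begin{cases}
    \operatorname{lev}_{a,b}(i-1,j) + 1 \\
    \operatorname{lev}_{a,b}(i,j-1) + 1 \\
    \operatorname{lev}_{a,b}(i-1,j-1) + 1_{(a_i \neq b_j)}
  \end{cases} & \text{otherwise.}
\end{cases}
\]
```

The three terms inside the minimum correspond to a deletion, an insertion, and a substitution (or a match, when the indicator is 0). The distance between the full strings is the value at i = |a|, j = |b|.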
For further understanding of the formula, you may read the blog below, which explains it in
great depth, or get back to me whenever you get stuck.
https://medium.com/@ethannam/understanding-the-levenshtein-distance-equation-for-beginners-c4285a5604f0
Note, however, that the blog does not explain how to count the individual edit operations
while calculating the overall Levenshtein distance.
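One possible way to count the operations, given here as a minimal sketch rather than the expected solution, is to fill the usual distance table and then walk back from its last cell, tallying which kind of step was taken. The function name `levenshtein_with_ops` and the tie-breaking order in the walk-back are assumptions made for illustration.

```python
def levenshtein_with_ops(source, target):
    """Return (distance, insertions, deletions, substitutions) for source -> target."""
    m, n = len(source), len(target)

    # dist[i][j] is the edit distance between source[:i] and target[:j].
    dist = [[0] * (n + 1) for _ in range(m + 1)]
    for i in range(m + 1):
        dist[i][0] = i  # delete all i characters
    for j in range(n + 1):
        dist[0][j] = j  # insert all j characters

    for i in range(1, m + 1):
        for j in range(1, n + 1):
            cost = 0 if source[i - 1] == target[j - 1] else 1
            dist[i][j] = min(
                dist[i - 1][j] + 1,         # deletion
                dist[i][j - 1] + 1,         # insertion
                dist[i - 1][j - 1] + cost,  # match or substitution
            )

    # Walk back from the bottom-right cell and tally the step behind each move.
    # Assumption: ties are resolved in the order match, deletion, insertion,
    # substitution. Other optimal paths may give different counts.
    insertions = deletions = substitutions = 0
    i, j = m, n
    while i > 0 or j > 0:
        if i > 0 and j > 0 and source[i - 1] == target[j - 1]:
            i, j = i - 1, j - 1
        elif i > 0 and dist[i][j] == dist[i - 1][j] + 1:
            deletions += 1
            i -= 1
        elif j > 0 and dist[i][j] == dist[i][j - 1] + 1:
            insertions += 1
            j -= 1
        else:
            substitutions += 1
            i, j = i - 1, j - 1
    return dist[m][n], insertions, deletions, substitutions


# Fixed example; for interactive use, replace the literals with input() calls.
distance, ins, dels, subs = levenshtein_with_ops("kitten", "sitting")
print(f"Levenshtein distance is {distance}")  # Levenshtein distance is 3
print(f"Insertions {ins}")                    # Insertions 1
print(f"Deletions {dels}")                    # Deletions 0
print(f"Substitutions {subs}")                # Substitutions 2
```

The total distance is the same for every optimal alignment, but the split into insertions, deletions and substitutions can depend on how ties are broken, so it is worth stating the tie-breaking rule in your own solution.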
The output of your program should look something like this:-
Task #2
Now modify the program above so that it takes two text files, named reference.txt and
hypothesis.txt, each containing a single-line, lowercase English sentence, and writes a file
result.txt containing the Levenshtein distance between the two files, as shown below. The
distance should be computed at the word level, not the character level.
**********reference.txt***************
this is some text and we would like to see if it has been identified correctly by speech recognition system
***************************************
**********hypothesis.txt*************
this is a text and we would like to check what has been identified by the speech recognition
***************************************
*********result.txt*******************
Levenshtein distance is 7
Insertions 1
Deletions 3
Substitutions 3
***************************************
Hint:-
In this case, we can treat words the way we treated characters in the previous task, right?
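Following the hint, a word-level version might look like the sketch below: each sentence is split into a list of words, and the same recurrence runs over list elements instead of characters. The names `word_edit_counts`, `format_result` and `compare_files` are hypothetical, and the tie-breaking rule (fewest substitutions among equally short edit scripts) is an assumption chosen so that the counts are well defined.

```python
from pathlib import Path


def word_edit_counts(ref_words, hyp_words):
    """Return (distance, insertions, deletions, substitutions) for ref -> hyp."""
    # Each cell is a tuple (distance, substitutions, insertions, deletions).
    # Tuples compare lexicographically, so min() picks the smallest distance
    # and, on ties, the fewest substitutions (an assumed tie-breaking rule).
    prev = [(j, 0, j, 0) for j in range(len(hyp_words) + 1)]
    for i, r in enumerate(ref_words, start=1):
        curr = [(i, 0, 0, i)]
        for j, h in enumerate(hyp_words, start=1):
            d, s, a, b = prev[j - 1]
            diagonal = (d, s, a, b) if r == h else (d + 1, s + 1, a, b)
            d, s, a, b = prev[j]
            deletion = (d + 1, s, a, b + 1)
            d, s, a, b = curr[j - 1]
            insertion = (d + 1, s, a + 1, b)
            curr.append(min(diagonal, deletion, insertion))
        prev = curr
    d, s, a, b = prev[-1]
    return d, a, b, s


def format_result(distance, insertions, deletions, substitutions):
    """Build the text of the result file in the layout shown in the handout."""
    return (
        f"Levenshtein distance is {distance}\n"
        f"Insertions {insertions}\n"
        f"Deletions {deletions}\n"
        f"Substitutions {substitutions}\n"
    )


def compare_files(ref_path, hyp_path, out_path):
    """Read two single-line sentence files and write the word-level result."""
    ref_words = Path(ref_path).read_text(encoding="utf-8").split()
    hyp_words = Path(hyp_path).read_text(encoding="utf-8").split()
    counts = word_edit_counts(ref_words, hyp_words)
    Path(out_path).write_text(format_result(*counts), encoding="utf-8")


# In-memory check with the two sentences from the handout.
ref = ("this is some text and we would like to see if it has been "
       "identified correctly by speech recognition system").split()
hyp = ("this is a text and we would like to check what has been "
       "identified by the speech recognition").split()
print(format_result(*word_edit_counts(ref, hyp)), end="")
# Levenshtein distance is 7
# Insertions 1
# Deletions 3
# Substitutions 3
```

With the files in place, `compare_files("reference.txt", "hypothesis.txt", "result.txt")` would produce the output file. Only one row of the table is kept at a time, since the counts travel with each cell and no walk-back is needed.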
Task #3
Now modify the program above so that it ignores 10 common words, as follows:-
Insertions and deletions involving these common words are ignored
Substitutions are ignored when both the original and the replacement word are among the 10 common words
List of 10 common words:
the, of, and, a, be, this, there, an, been, some
The file result2.txt should now look like this:-
*********result2.txt*******************
Levenshtein distance is 5
Insertions 0
Deletions 3
Substitutions 2
***************************************
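One way to read the rules of Task #3, sketched below under stated assumptions, is to give the ignored operations a cost of 0 inside the recurrence, so that they neither add to the distance nor show up in the counts. The names `COMMON_WORDS`, `filtered_edit_counts` and `_add` are hypothetical, and ties are again assumed to be broken in favour of fewer substitutions.

```python
COMMON_WORDS = {"the", "of", "and", "a", "be", "this", "there", "an", "been", "some"}


def _add(cell, slot, cost):
    """Return cell with cost added to the distance and to the count at slot."""
    updated = list(cell)
    updated[0] += cost
    updated[slot] += cost
    return tuple(updated)


def filtered_edit_counts(ref_words, hyp_words, ignored=COMMON_WORDS):
    """Return (distance, insertions, deletions, substitutions), skipping ignored edits."""

    def indel_cost(word):
        # Inserting or deleting a common word is free.
        return 0 if word in ignored else 1

    def sub_cost(r, h):
        # A match is free; so is swapping one common word for another.
        if r == h or (r in ignored and h in ignored):
            return 0
        return 1

    # Cell layout: (distance, substitutions, insertions, deletions).
    SUB, INS, DEL = 1, 2, 3
    prev = [(0, 0, 0, 0)]
    for h in hyp_words:
        prev.append(_add(prev[-1], INS, indel_cost(h)))
    for r in ref_words:
        curr = [_add(prev[0], DEL, indel_cost(r))]
        for j, h in enumerate(hyp_words, start=1):
            curr.append(min(
                _add(prev[j - 1], SUB, sub_cost(r, h)),
                _add(prev[j], DEL, indel_cost(r)),
                _add(curr[j - 1], INS, indel_cost(h)),
            ))
        prev = curr
    d, s, a, b = prev[-1]
    return d, a, b, s


# In-memory check with the two sentences from the handout.
ref = ("this is some text and we would like to see if it has been "
       "identified correctly by speech recognition system").split()
hyp = ("this is a text and we would like to check what has been "
       "identified by the speech recognition").split()
distance, ins, dels, subs = filtered_edit_counts(ref, hyp)
print(f"Levenshtein distance is {distance}")  # Levenshtein distance is 5
print(f"Insertions {ins}")                    # Insertions 0
print(f"Deletions {dels}")                    # Deletions 3
print(f"Substitutions {subs}")                # Substitutions 2
```

Because free operations change which alignment is cheapest, this can give a different answer from computing the Task #2 alignment first and discarding the ignored edits afterwards; for the example sentences both readings agree with the expected result2.txt.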
Submission Guidelines:-
Deliverables and Deadline:
Please add as per your convenience