Computational Linguistics Overview

What is computational linguistics (CL)?

Computational linguistics (CL) is the application of computer science to the analysis and comprehension of written and spoken language. As an interdisciplinary field, CL combines linguistics with computer science and artificial intelligence (AI) and is concerned with understanding language from a computational perspective. Computers that are linguistically competent help facilitate human interaction with machines and software.

Computational linguistics is used in tools such as instant machine translation, speech recognition systems, parsers, text-to-speech synthesizers, interactive voice response systems, search engines, text editors and language instruction materials.

The term computational linguistics is also closely linked to natural language processing (NLP), and these two terms are often used interchangeably.

Applications of computational linguistics


Most work in computational linguistics -- which has both theoretical and applied elements -- is aimed at improving the relationship between computers and human language. It involves building artifacts that can be used to process and produce language. Building such artifacts requires data scientists to analyze massive amounts of written and spoken language in both structured and unstructured formats.

Applications of CL typically include the following:

 Machine translation. This is the process of using AI to translate one human language to another.

 Document clustering. This is the process of automatically grouping similar texts together based on their content.

 Sentiment analysis. Sentiment analysis is an important approach to NLP that identifies the emotional tone behind a body of text.

 Chatbots. These software programs simulate human conversation or chatter through text or voice interactions.

 Information extraction. This is the creation of knowledge from structured and unstructured text.

 Natural language interfaces. These are computer-human interfaces where words, phrases or clauses act as user interface controls.

 Content filtering. This process blocks various language-based web content from reaching users.

 Text mining. Text mining is the process of extracting useful information from massive amounts of unstructured textual data. Tokenization, part-of-speech tagging, named entity recognition and sentiment analysis are used to accomplish this process.

Approaches and methods of computational linguistics

There have been many different approaches and methods of computational linguistics since its beginning in the 1950s. Examples of some CL approaches include the following:

 The corpus-based approach, which is based on language as it is actually used in practice.

 The comprehension approach, which enables the NLP engine to interpret naturally written commands in a simple rule-governed environment.

 The developmental approach, which adopts the language acquisition strategy of a child, acquiring language over time. The developmental process takes a statistical approach to studying language and doesn't take grammatical structure into account.
 The structural approach, which takes a theoretical approach to
the structure of a language. This approach uses large samples of
a language run through computational models to gain a better
understanding of the underlying language structures.

 The production approach, which focuses on a CL model producing text. This has been done in a number of ways, including the construction of algorithms that produce text based on example texts from humans. This approach can be broken down into the following two approaches:

 The text-based interactive approach uses text from a human to generate a response by an algorithm. A computer can recognize different patterns and reply based on user input and specified keywords.

 The speech-based interactive approach works similarly to the text-based approach, but user input is made through speech recognition. The user's speech input is captured as sound waves and interpreted as patterns by the CL system.

CL vs. NLP
Computational linguistics and natural language processing are similar concepts, as both fields require formal training in computer science, linguistics and machine learning (ML). Both use the same tools, such as ML and AI, to accomplish their goals, and many NLP tasks require an understanding or interpretation of language.

NLP plays an important role in creating language technologies, including chatbots, speech recognition systems and virtual assistants, such as Siri, Alexa and Cortana. Meanwhile, CL lends its expertise to topics such as preserving languages, analyzing historical documents and building dialogue systems, as well as machine translation tools such as Google Translate.
Levels/Stages of Natural Language Processing

The process of Natural Language Processing is divided into five major stages or phases, starting from basic word-level processing up to finding the complex meaning of sentences.

1. Morphological Analysis/Lexical Analysis

Morphological or Lexical Analysis deals with text at the individual word level. It looks for morphemes, the smallest meaningful units of a word. For example, irrationally can be broken into ir- (prefix), rational (root) and -ly (suffix). Lexical Analysis finds the relation between these morphemes and converts the word into its root form. A lexical analyzer also assigns the possible Part-Of-Speech (POS) tags to the word, taking into consideration the dictionary of the language.
For example, the word “book” can be used as a noun or a verb.
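The idea of splitting a word into morphemes can be sketched with a toy affix stripper. The prefix and suffix lists below are invented for illustration and are nowhere near a full inventory of English affixes:

```python
# A toy morphological analyzer: strips one known prefix and one known suffix.
# The affix lists are illustrative only, not a real morphological lexicon.
PREFIXES = ["ir", "un", "re"]
SUFFIXES = ["ly", "ness", "ing"]

def split_morphemes(word):
    """Return a (prefix, root, suffix) tuple; empty string where no affix is found."""
    prefix = next((p for p in PREFIXES if word.startswith(p)), "")
    rest = word[len(prefix):]
    suffix = next((s for s in SUFFIXES if rest.endswith(s)), "")
    root = rest[: len(rest) - len(suffix)] if suffix else rest
    return prefix, root, suffix

print(split_morphemes("irrationally"))  # ('ir', 'rational', 'ly')
```

A real morphological analyzer also has to handle spelling changes at morpheme boundaries (e.g. happy + -ness becoming happiness), which this sketch ignores.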

2. Syntax Analysis
Syntax Analysis ensures that a given piece of text has a valid structure. It tries to parse the sentence to check for correct grammar at the sentence level. Given the possible POS tags generated in the previous step, a syntax analyzer assigns POS tags based on the sentence structure.

For example:

Correct Syntax: Sun rises in the east.

Incorrect Syntax: Rise in sun the east.

3. Semantic Analysis
Consider the sentence: “The apple ate a banana”. Although the sentence is
syntactically correct, it doesn’t make sense because apples can’t eat. Semantic
analysis looks for meaning in the given sentence. It also deals with combining
words into phrases.

For example, “red apple” provides information regarding one object; hence we
treat it as a single phrase. Similarly, we can group names referring to the same
category, person, object or organisation. “Robert Hill” refers to the same person and not two separate names – “Robert” and “Hill”.

4. Discourse
Discourse deals with the effect of a previous sentence on the sentence under consideration. In the text, “Albert is a bright student. He spends most of the time in the library.”, discourse analysis resolves “he” to refer to “Albert”.

5. Pragmatics
The final stage of NLP, Pragmatics interprets the given text using information from the previous steps. For example, the sentence “Turn off the lights” is interpreted as an order or request to switch off the lights.
Tokenization in Natural Language Processing

To let machines understand natural language, we first need to divide the input text into smaller chunks. Breaking paragraphs into sentences and then into individual words can help machines interpret meanings easily. This is where the concept of tokenization comes into play in Natural Language Processing.

Tokenization
Tokenization is one of the most common tasks in text processing. It is the process of
separating a given text into smaller units called tokens.

An input text is a group of words that make up sentences. We need to break the text down in such a way that machines can understand it, and tokenization helps us achieve that.

It can be classified into two types:

1. Sentence Tokenization
Sentence tokenization is the process of dividing the text into its component sentences. The method is very simple. In layman's terms: split the sentences wherever there is an end-of-sentence punctuation mark. For example, the English language has three punctuation marks that indicate the end of a sentence: !, . and ?. Similarly, other languages have their own sentence-ending punctuation.

While we can manually break sentences on these punctuation marks, Python's NLTK library provides us with the necessary tools, so we need not worry about splitting sentences ourselves.
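The splitting rule described above can be approximated in a few lines of plain Python with a regular expression. This is only a rough sketch; NLTK's sent_tokenize handles abbreviations like “Dr.” and other edge cases far better:

```python
import re

def sent_tokenize(text):
    """Split on ., ! or ? followed by whitespace -- a rough approximation
    of sentence tokenization, without handling abbreviations."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]

print(sent_tokenize("It rained. Did you go out? Yes!"))
# ['It rained.', 'Did you go out?', 'Yes!']
```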
2. Word Tokenization
Word tokenization is the process of dividing a text into its component words. We can split the text at every space, but we also need to take care of punctuation marks. It is easier to deal with individual words than with a whole sentence, so we further tokenize sentences into words.
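A simple word tokenizer can be sketched with a regular expression that keeps punctuation marks as separate tokens. This is a simplification of what nltk.word_tokenize does:

```python
import re

def word_tokenize(sentence):
    """Split a sentence into word tokens, treating punctuation marks
    as tokens of their own rather than part of adjacent words."""
    return re.findall(r"\w+|[^\w\s]", sentence)

print(word_tokenize("Sun rises in the east."))
# ['Sun', 'rises', 'in', 'the', 'east', '.']
```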

Stopwords
Stop words are common words in a language. These are words that do not carry major information but are necessary for making a sentence complete. Some examples of stop words are “in”, “the”, “is” and “an”. We can often safely ignore these words without losing the meaning of the content.
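Removing stop words is a straightforward filter over the token list. The stopword set below is a small illustrative sample; NLTK ships a much fuller list via nltk.corpus.stopwords (which requires a one-time download):

```python
# A small, illustrative stopword set -- not a complete list for English.
STOPWORDS = {"in", "the", "is", "an", "a", "of", "to"}

def remove_stopwords(tokens):
    """Drop tokens that appear in the stopword set (case-insensitively)."""
    return [t for t in tokens if t.lower() not in STOPWORDS]

print(remove_stopwords(["Sun", "rises", "in", "the", "east"]))
# ['Sun', 'rises', 'east']
```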
Stemming
Stemming refers to the crude chopping of words to reduce them to their stems. A stemmer follows a set of pre-defined rules to remove affixes from inflected words. For example: connects, connected and connection can all be reduced to connect.
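This rule-based chopping can be illustrated with a toy stemmer. The suffix list here is invented for the sketch and is far cruder than a real algorithm such as Porter's, which applies its rules in ordered phases with extra conditions:

```python
# A toy rule-based stemmer: strip the first matching suffix, keeping at
# least three characters of stem. Not Porter's actual algorithm.
SUFFIX_RULES = ["ion", "ed", "ing", "s"]

def stem(word):
    for suffix in SUFFIX_RULES:
        if word.endswith(suffix) and len(word) - len(suffix) >= 3:
            return word[: -len(suffix)]
    return word

print([stem(w) for w in ["connects", "connected", "connection"]])
# ['connect', 'connect', 'connect']
```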

Porter’s Stemmer
There are multiple stemming algorithms to choose from, Porter's Stemmer being one of the most widely used. NLTK provides this algorithm as PorterStemmer, which can be used directly once NLTK is installed.
Lemmatization
Lemmatization is similar to stemming; however, a lemmatizer always returns a valid word. Stemming uses rules to cut the word, whereas a lemmatizer searches for the root word, called the lemma, in WordNet. Moreover, lemmatization takes care of converting a word into its base form; e.g., words like am, is and are will be converted to “be”.
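The lookup idea behind lemmatization can be sketched with a dictionary-based lemmatizer. A real lemmatizer such as NLTK's WordNetLemmatizer consults WordNet; here a tiny hand-made table stands in for that lexicon:

```python
# A dictionary-based lemmatizer sketch. The table below is a hand-made
# stand-in for WordNet, covering just a few irregular forms.
LEMMA_TABLE = {"am": "be", "is": "be", "are": "be",
               "better": "good", "feet": "foot"}

def lemmatize(word):
    """Return the lemma if it is in the table, else the word unchanged."""
    return LEMMA_TABLE.get(word, word)

print([lemmatize(w) for w in ["am", "is", "are"]])  # ['be', 'be', 'be']
```

Note how, unlike the stemmer above, the output is always a valid dictionary word, at the cost of needing a lexicon lookup.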

WordNetLemmatizer
Again, NLTK provides a WordNetLemmatizer that can be used off-the-shelf. However, it requires the POS tag of each word for correct results; for now, we can provide the POS tags manually.
Stemming vs Lemmatization
Now that we know what Stemming and Lemmatization are, one may ask why to use
Stemming at all if Lemmatization provides correct results?

A stemmer is very fast in comparison to a lemmatizer. Moreover, lemmatization requires POS tags to perform correctly. In our example, we manually provided the POS tags, but in a real application we would need to perform this POS tagging first. Each word is then looked up in WordNet for its base form. This increases the computation time and may not be optimal.

In some cases, it might be better to use a stemmer than to wait for lemmatization. However, if precision is important in an application, one can prefer lemmatization over stemming.
Part Of Speech Tagging – POS Tagging in NLP

As discussed in Stages of Natural Language Processing, Syntax Analysis deals with the arrangement of words to form a structure that makes grammatical sense. A sentence is syntactically correct when the Parts of Speech of the sentence follow the rules of grammar. To achieve this, the given sentence structure is compared against the common rules of the language.

Part of Speech
Part of Speech is the classification of words based on their role in the sentence. The
major POS tags are Nouns, Verbs, Adjectives, Adverbs. This category provides more
details about the word and its meaning in the context. A sentence consists of words
with a sensible Part of Speech structure.

For example: Book the flight!

This sentence contains a Verb (Book), a Determiner (the) and a Noun (flight).

Part Of Speech Tagging

POS tagging refers to the automatic assignment of a tag to each word in a given sentence. It converts a sentence into a list of (word, tag) pairs. Since this task involves considering the sentence structure, it cannot be done at the lexical level; a POS tagger considers surrounding words while assigning a tag.
For example, the previous sentence, “Book the flight”, will become a list of each word
with its corresponding POS tag – [(“Book”, “Verb”), (“the”, “Det”), (“flight”,
“Noun”)].

Similarly, “I like to read books” is represented as: [(“I”, “Pronoun”), (“like”, “Verb”), (“to”, “To”), (“read”, “Verb”), (“books”, “Noun”)]. Notice how the word Book appears in both sentences. However, in the first example it acts as a Verb, while it takes the role of a Noun in the latter.

Although we are using generic tag names here, in practice we refer to a tagset. The Penn Treebank tagset is the most widely used for English. Some examples from the Penn Treebank:

Part Of Speech     Tag
Noun (Singular)    NN
Noun (Plural)      NNS
Verb               VB
Determiner         DT

Examples of Penn Treebank tags

Difficulties in POS Tagging

Similar to most NLP problems, POS tagging suffers from ambiguity. In the sentences “Book the flight” and “I like to read books”, we see that book can act as a Verb or a Noun. Similarly, many words in the English dictionary have multiple possible POS tags.

 This (Pronoun) is a car
 This (Determiner) car is red
 You can go this (Adverb) far only.

These sentences use the word “this” in various contexts. So how can one assign the correct tag to each word?
POS Tagging Approaches
1. Rule-Based POS Tagging
This is one of the oldest approaches to POS tagging. It involves using a dictionary consisting of all the possible POS tags for a given word. If a word has more than one possible tag, hand-written rules are used to assign the correct tag based on the tags of surrounding words.

For example, if the word preceding a given word is an article, then the given word has to be a noun.

Consider the words: A Book

o Get all the possible POS tags for individual words: A – Article; Book
– Noun or Verb
o Use the rules to assign the correct POS tag: As per the possible tags,
“A” is an Article and we can assign it directly. But “Book” can be either
a Noun or a Verb. However, if we consider “A Book”, “A” is an article
and, following our rule above, “Book” has to be a Noun. Thus, we assign
the tag Noun to “Book”.

POS Tag: [(“A”, “Article”), (“Book”, “Noun”)]

Similarly, various rules are written or machine-learned for other cases. Using
these rules, it is possible to build a Rule-based POS tagger.
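The walkthrough above can be sketched as a minimal rule-based tagger: a lexicon of possible tags per word plus one hand-written disambiguation rule. The lexicon and the single rule are simplifications for illustration:

```python
# A minimal rule-based tagger. LEXICON maps each word to its possible tags;
# the only disambiguation rule is: a word following an Article is a Noun.
LEXICON = {"a": ["Article"], "the": ["Article"],
           "book": ["Noun", "Verb"], "flight": ["Noun"]}

def tag(words):
    tags = []
    for i, w in enumerate(words):
        options = LEXICON.get(w.lower(), ["Unknown"])
        if len(options) == 1:
            tags.append(options[0])
        elif i > 0 and tags[i - 1] == "Article":
            tags.append("Noun")      # rule: Article + ambiguous word -> Noun
        else:
            tags.append(options[0])  # fall back to the first listed tag
    return list(zip(words, tags))

print(tag(["A", "Book"]))  # [('A', 'Article'), ('Book', 'Noun')]
```

A production rule-based tagger would carry hundreds of such rules and a far larger lexicon, but the control flow is the same.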

2. Stochastic Tagger
A Stochastic Tagger is a supervised model that uses the frequencies or probabilities of the tags in a given training corpus to assign a tag to a new word. These taggers rely entirely on statistics of tag occurrence, i.e. the probability of the tags.

Based on the words used for determining a tag, stochastic taggers are divided into two types:

o Word Frequency: In this approach, we find the tag that is most often
assigned to the word. For example: given a training corpus where “book”
occurs 10 times – 6 times as a Noun, 4 times as a Verb – the word book
will always be tagged as “Noun”, since that is its most frequent tag in
the training set. Hence, a Word Frequency approach is not very reliable.
o Tag Sequence Frequency: Here, the best tag for a word is determined
using the probability of the tags of the N previous words, i.e. it
considers the tags of the words preceding book. Although this approach
provides better results than a Word Frequency approach, it may still not
provide accurate results for rare structures. Tag Sequence Frequency is
also referred to as the N-gram approach.
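The Word Frequency approach can be sketched as a unigram tagger that counts tag occurrences in a tagged training corpus. The tiny corpus below is invented for illustration, mirroring the book example above:

```python
from collections import Counter, defaultdict

# A word-frequency (unigram) tagger sketch: each word gets the tag it is
# most often seen with in the training data. The corpus is made up for
# illustration, with "book" seen more often as a Noun than a Verb.
training = [("book", "Noun"), ("book", "Noun"), ("book", "Verb"),
            ("the", "Det"), ("flight", "Noun")]

counts = defaultdict(Counter)
for word, pos in training:
    counts[word][pos] += 1

def tag_word(word):
    """Return the most frequent tag for the word, or 'Unknown' if unseen."""
    return counts[word].most_common(1)[0][0] if word in counts else "Unknown"

print(tag_word("book"))  # 'Noun' -- even in contexts where it is a verb
```

This makes the unreliability concrete: the tagger answers “Noun” for book regardless of context, which is exactly the weakness the Tag Sequence Frequency (N-gram) approach addresses by conditioning on surrounding tags.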
