Module 9
Language Processing: Humans and Computers
Objectives
After reading this module, you will be able to:
1. Discuss how language is processed in the human brain and in computers.
2. Compare Psycholinguistics with Computational Linguistics.
3. Use different applications of Computational Linguistics.
4. Value the importance of knowing Psycholinguistics and
Computational Linguistics as future teachers.
Concepts
Essential Questions
1. What is the difference between how language is processed in the human
brain and how it is processed in computers?
2. What are the different applications of Computational Linguistics?
3. Why do we need to learn Psycholinguistics and Computational Linguistics?
Introduction
Lesson Proper
THE HUMAN BRAIN AT WORK: HUMAN LANGUAGE PROCESSING
PSYCHOLINGUISTICS is the branch of linguistics that is concerned
with linguistic performance, language acquisition, and speech production
and comprehension. The human brain is able not only to acquire and store
the mental lexicon and grammar, but also to access this linguistic storehouse
to speak and understand language in real time.
How we process knowledge depends largely on the nature of that
knowledge. For instance, if language were not open-ended, and were merely a
finite store of fixed phrases and sentences in memory, then speaking might
simply consist of finding a sentence that expresses the thought we wish to
convey. Comprehension could be the reverse: matching the sounds to a stored
string that has been memorized with its meaning. But this is not the case. We
do not learn language by imitating and storing sentences, but by constructing a
grammar. When we speak, we access our lexicon to find the words, and we
use the rules of grammar to construct novel sentences and to produce the
sounds that express the message we wish to convey. When we listen to
speech and understand what is being said, we also access the lexicon and
grammar to assign a structure and meaning to the sounds we hear.
Speaking and comprehending speech can be viewed as a speech chain, a kind
of "brain-to-brain" linking, as shown in this figure:
The speech chain. A spoken utterance starts as a message in the speaker's mind.
It is put into linguistic form and interpreted as articulation commands, emerging
as an acoustic signal. The signal is processed by the listener's ear and sent to
the brain/mind, where it is interpreted.
The grammar relates sounds and meanings, and contains the units and
rules of the language that make speech production and comprehension possible.
However, other psychological processes are also used to produce and understand
utterances. Certain mechanisms enable us to break the stream of speech into words
in order to comprehend, and to compose sounds into words in order to produce meaningful
speech. Other mechanisms determine how we pull words from the mental
lexicon, and still others explain how we construct a phrase structure
representation of the words we retrieve.
COMPREHENSION
One of the aims of psycholinguistics is to describe the processes people
normally use in speaking and understanding language. The various breakdowns
in performance, such as tip of the tongue phenomena, speech errors, and failure
to comprehend tricky sentences, can tell us a great deal about how the language
processor works, just as children’s acquisition errors tell us a lot about the
mechanisms involved in language development.
THE SPEECH SIGNAL
Understanding a sentence involves analysis at many levels. To begin with,
we must comprehend the individual speech sounds we hear. We are not
conscious of the complicated processes we use to understand speech any more
than we are conscious of the complicated processes of digesting food and
utilizing nutrients. We must study these processes deliberately and scientifically.
One of the first questions of linguistic performance concerns segmentation of the
acoustic signal. To understand this process, some knowledge of the signal can
be helpful.
All of the articulatory characteristics are reflected in the physical
characteristics of the sounds produced. Speech sounds can also be described in
physical, or acoustic, terms. Acoustic phonetics is concerned only with speech
sounds, all of which can be heard by the normal human ear. An important tool in
acoustic research is a computer program that decomposes the speech signal into
its frequency components. When speech is fed to a computer (from a
microphone or a recording), an image of the speech is displayed. The patterns
produced are called spectrograms or, more vividly, voiceprints. A spectrogram of
the words ball, bar, bough, and buy is shown in this figure:
By studying spectrograms of all speech sounds and many different utterances,
acoustic phoneticians have learned a great deal about the basic acoustic
components that reflect the articulatory features of speech sounds.
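The idea of decomposing a signal into its frequency components can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the "recording" is a made-up signal built from two pure tones (100 Hz and 300 Hz), the function name dft_magnitudes is hypothetical, and the method is a plain discrete Fourier transform applied once. A real spectrogram program repeats this kind of analysis on many short, successive slices of actual speech.

```python
import cmath
import math

def dft_magnitudes(samples):
    """Return the strength of each frequency bin for a list of samples (naive DFT)."""
    n = len(samples)
    magnitudes = []
    for k in range(n // 2):  # for real-valued input, higher bins mirror the lower ones
        total = sum(samples[t] * cmath.exp(-2j * math.pi * k * t / n) for t in range(n))
        magnitudes.append(abs(total))
    return magnitudes

# Assumed input: 0.05 seconds of sound sampled 1000 times per second,
# made of a strong 100 Hz tone plus a weaker 300 Hz tone.
SAMPLE_RATE = 1000
N = 50
signal = [
    math.sin(2 * math.pi * 100 * t / SAMPLE_RATE)
    + 0.5 * math.sin(2 * math.pi * 300 * t / SAMPLE_RATE)
    for t in range(N)
]

magnitudes = dft_magnitudes(signal)
bin_width = SAMPLE_RATE / N  # each bin covers 20 Hz

# Pick the two strongest bins and convert their positions to frequencies in Hz.
strongest = sorted(range(len(magnitudes)), key=lambda k: magnitudes[k], reverse=True)[:2]
peak_frequencies = sorted(k * bin_width for k in strongest)
print(peak_frequencies)  # [100.0, 300.0]
```

The program recovers the two tones that were mixed together, which is the same kind of information a spectrogram displays for each moment of an utterance.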
COMPUTER PROCESSING OF HUMAN LANGUAGE
COMPUTATIONAL LINGUISTICS is the branch of linguistics in which the
techniques of computer science are applied to the analysis and synthesis of
language and speech. It is an interdisciplinary field concerned with the statistical
or rule-based modeling of natural language from a computational perspective.
Traditionally, computational linguistics was performed by computer scientists
who had specialized in the application of computers to the processing of a
natural language, but little, if any, success was achieved. Computational
linguists now often work as members of interdisciplinary teams, which can include
regular linguists, experts in the target language, and computer scientists. In
general, computational linguistics draws upon the involvement
of linguists, computer scientists, experts in artificial intelligence, mathematicians,
quantum physicists, logicians, philosophers, cognitive scientists, cognitive
psychologists, psycholinguists, anthropologists, and neuroscientists, among
others.
Computational linguistics has theoretical and applied components.
Theoretical computational linguistics focuses on issues in theoretical
linguistics and cognitive science, and applied computational linguistics focuses
on the practical outcome of modeling human language use.
The Association for Computational Linguistics defines computational linguistics
as:
...the scientific study of language from a computational perspective.
Computational linguists are interested in providing computational
models of various kinds of linguistic phenomena.
Computational Phonetics and Phonology are two fields of computational
linguistics concerned with processing speech. Their main goals are converting
speech to text on the comprehension side, and text to speech on the production
side.
Computational phonetics and phonology have two sides:
SPEECH RECOGNITION (SR) is the interdisciplinary subfield
of computational linguistics that incorporates knowledge and research from
linguistics, computer science, and electrical engineering to
develop methodologies and technologies that enable the recognition
and translation of spoken language into text by computers and
computerized devices, such as those categorized as smart technologies
and robotics. It is also known as "automatic speech recognition" (ASR),
"computer speech recognition", or just "speech to text" (STT).
Some SR systems use "training" (also called "enrollment") where an individual
speaker reads text or isolated vocabulary into the system. The system analyzes
the person's specific voice and uses it to fine-tune the recognition of that person's
speech, resulting in increased accuracy. Systems that do not use training are
called "speaker independent" systems. Systems that use training are called
"speaker dependent".
Speech recognition applications include voice user interfaces such as voice
dialing (e.g. "Call home"), call routing (e.g. "I would like to make a collect
call"), domotic appliance control, search (e.g. find a podcast where particular
words were spoken), simple data entry (e.g., entering a credit card number),
preparation of structured documents (e.g. a radiology report), speech-to-text
processing (e.g., word processors or emails), and aircraft (usually termed Direct
Voice Input).
The term voice recognition or speaker identification refers to identifying the
speaker, rather than what they are saying. Recognizing the speaker can simplify
the task of translating speech in systems that have been trained on a specific
person's voice or it can be used to authenticate or verify the identity of a speaker
as part of a security process.
From the technology perspective, speech recognition has a long history with
several waves of major innovations. Most recently, the field has benefited from
advances in deep learning and big data. The advances are evidenced not only by
the surge of academic papers published in the field, but more importantly by the
world-wide industry adoption of a variety of deep learning methods in designing
and deploying speech recognition systems. These speech industry players
include Google, Microsoft, IBM, Baidu (China), Apple, Amazon, Nuance, and iFlytek
(China), many of which have publicized the core technology in their speech
recognition systems as being based on deep learning.
Example: For language learning, speech recognition can be useful for learning
a second language. It can teach proper pronunciation, in addition to helping a
person develop fluency with their speaking skills.
SPEECH SYNTHESIS is the artificial production of human speech. A
computer system used for this purpose is called a speech
computer or speech synthesizer, and can be implemented
in software or hardware products. A text-to-speech (TTS) system converts
normal language text into speech; other systems render symbolic linguistic
representations like phonetic transcriptions into speech.
Synthesized speech can be created by concatenating pieces of recorded speech
that are stored in a database. Systems differ in the size of the stored speech
units; a system that stores phones or diphones provides the largest output range,
but may lack clarity. For specific usage domains, the storage of entire words or
sentences allows for high-quality output. Alternatively, a synthesizer can
incorporate a model of the vocal tract and other human voice characteristics to
create a completely "synthetic" voice output.
A text-to-speech system (or "engine") is composed of two parts: a front-end and
a back-end. The front-end has two major tasks. First, it converts raw text
containing symbols like numbers and abbreviations into the equivalent of written-
out words. This process is often called text normalization, pre-processing,
or tokenization. The front-end then assigns phonetic transcriptions to each word,
and divides and marks the text into prosodic units, like phrases, clauses,
and sentences. The process of assigning phonetic
transcriptions to words is called text-to-phoneme or grapheme-to-phoneme
conversion. The back-end, often referred to as the synthesizer, then converts
this symbolic linguistic representation into sound.
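The first front-end task, text normalization, can be sketched as follows. This is a minimal illustration, not how any particular text-to-speech engine works: the lookup tables ABBREVIATIONS and DIGITS and the function names are assumptions made up for this example, and numbers are simply read out digit by digit.

```python
# Hypothetical, tiny lookup tables; a real front-end would need far larger ones.
ABBREVIATIONS = {"Dr.": "Doctor", "St.": "Street", "No.": "Number"}
DIGITS = ["zero", "one", "two", "three", "four", "five", "six", "seven", "eight", "nine"]

def spell_number(token):
    """Read a string of digits one digit at a time, e.g. '42' -> 'four two'."""
    return " ".join(DIGITS[int(ch)] for ch in token)

def normalize(text):
    """Expand known abbreviations and digit strings into written-out words."""
    words = []
    for token in text.split():
        if token in ABBREVIATIONS:
            words.append(ABBREVIATIONS[token])
        elif token.isdigit():
            words.append(spell_number(token))
        else:
            words.append(token)
    return " ".join(words)

print(normalize("Dr. Cruz lives at No. 42 Rizal St."))
# Doctor Cruz lives at Number four two Rizal Street
```

Even this small example hints at why normalization is hard: "St." can stand for "Street" or "Saint", and "42" may need to be read as "forty-two", so a real system has to look at the surrounding context.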
COMPUTATIONAL MORPHOLOGY is the processing of word structures by
computers. Computational morphology deals with the processing of words and
word forms, in both their graphemic, i.e., written form, and their phonemic, i.e.,
spoken form. It has a wide range of practical applications.
Example: To process words, the computer must be programmed to look for roots
and affixes. In some cases, this process is straightforward.
Books is to be broken down into book + s, where book is the root word and -s is the
affix.
In some cases, however, there are words that are more difficult to break down,
such as:
Profundity = profound + ity
Galactic = galaxy + ic
Democracy = democrat + y
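The search for roots and affixes described above can be sketched in Python. This is a toy illustration under stated assumptions: the word lists ROOTS, SUFFIXES, and IRREGULAR and the function analyze are hypothetical, and they cover only the examples in this section. The straightforward case is handled by stripping a suffix, while the difficult words need a stored table because the root changes its spelling.

```python
# Hypothetical word lists for illustration only; not a full analysis of English.
ROOTS = {"book", "profound", "galaxy", "democrat"}
SUFFIXES = ["s"]
IRREGULAR = {
    "profundity": ("profound", "ity"),
    "galactic": ("galaxy", "ic"),
    "democracy": ("democrat", "y"),
}

def analyze(word):
    """Return (root, affix) for a word; the affix is '' when none is found."""
    word = word.lower()
    if word in ROOTS:
        return (word, "")
    if word in IRREGULAR:  # the root changes shape, so simple stripping fails
        return IRREGULAR[word]
    for suffix in SUFFIXES:
        stem = word[: -len(suffix)]
        if word.endswith(suffix) and stem in ROOTS:
            return (stem, suffix)
    return (word, "")

print(analyze("books"))     # ('book', 's')
print(analyze("galactic"))  # ('galaxy', 'ic')
print(analyze("bus"))       # ('bus', '')
```

Checking the stem against the list of known roots keeps the program from wrongly splitting a word like "bus" into "bu" + "s".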
COMPUTATIONAL SEMANTICS is the representation of the meanings of words and
morphemes in the computer, as well as the meanings derived from their
combinations.
Computational semantics has two chief concerns:
1. to produce a semantic representation in the computer of language input
2. to take a semantic representation and produce natural language output that conveys
the meaning
Example:
There are two well-known language processing systems that used the
predicate-argument approach to semantic representation:
1. SHRDLU, developed by Terry Winograd
SHRDLU could demonstrate several abilities, such as interpreting questions,
drawing inferences, learning new words, and even explaining its own actions. It
operated in the context of a “blocks world,” consisting of a table, blocks of
various shapes, sizes, and colors, and a robot arm for moving the blocks.
Using simple sentences, one could ask questions about the blocks and give
commands to have the blocks moved from one location to another.
2. LUNAR, developed by William Woods
LUNAR was capable of answering questions phrased in simple English about
the lunar rock samples brought back from the moon by the astronauts. LUNAR
translated English questions into a logical representation, which it then
used to query a database of information about the lunar samples.
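The general idea of turning a question into a predicate-argument representation and then checking it against stored facts can be sketched as follows. This is not the actual design of SHRDLU or LUNAR; the facts, the single question pattern, and the function names are all assumptions invented for this illustration.

```python
import re

# Hypothetical blocks-world facts stored as (predicate, argument, argument) tuples.
FACTS = {
    ("on", "red block", "table"),
    ("on", "green pyramid", "red block"),
}

def to_representation(question):
    """Turn 'Is the X on the Y?' into ('on', X, Y); return None for other forms."""
    match = re.fullmatch(r"is the (.+) on the (.+)\?", question.strip().lower())
    if match is None:
        return None
    return ("on", match.group(1), match.group(2))

def answer(question):
    """Answer a question by looking up its representation among the stored facts."""
    representation = to_representation(question)
    if representation is None:
        return "I do not understand."
    return "Yes." if representation in FACTS else "No."

print(answer("Is the green pyramid on the red block?"))  # Yes.
print(answer("Is the red block on the green pyramid?"))  # No.
```

The two questions use exactly the same words in a different order, yet they receive different answers because the representation records which argument plays which role.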
COMPUTATIONAL PRAGMATICS may influence the understanding or
response of the computer by taking into account knowledge that the computer
system has about the real world.
When a sentence is structurally ambiguous, the parser will compute all of the
structures, and semantic processing will eliminate some of the structures if
they are anomalous. Pragmatic knowledge is then needed to determine the
intended meaning.
COMPUTATIONAL SIGN LANGUAGE
Research linguists are working on a computer algorithm that will recognize sign
language in much the same way that speech can be recognized. Signers will sign in
front of a camera, and the computer will match the particular sign against
pre-stored signs via visual processing.
The purpose of this enterprise is twofold:
1. To produce a video dictionary of signs
- someone who can imitate a sign but doesn’t know its meaning can look it up
in the video dictionary, just as one uses an ordinary dictionary for
written text.
2. To enable a computer to search through ASL videos for a particular sign
- just as a search engine such as Google searches for certain keywords.
APPLICATIONS OF COMPUTATIONAL LINGUISTICS
COMPUTER MODELS OF GRAMMAR
Computers may be programmed to model a grammar of a human language
and thus rapidly and thoroughly test that grammar. Because linguistic
competence is so complex, computers are being used as a tool to understand
human language and its use.
Modern computer architecture includes parallel processing machines that
can be programmed to process language more as humans do, insofar as they carry
out many linguistic tasks simultaneously.
FREQUENCY ANALYSIS, CONCORDANCE, AND COLLOCATION
A corpus is a body of language to be analyzed. To analyze it, a computer can perform a
frequency analysis of words; compute a concordance, which locates the words in
the corpus and gives their immediate context; and compute collocations, which measure
how the occurrence of one word affects the probability of the occurrence of other
words.
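These three analyses can be tried out on a very small made-up corpus. The sketch below is an illustration only: the twelve-word corpus and the function names concordance and next_word_probability are assumptions for this example, and the collocation measure shown (how likely one word is to follow another) is just one simple way of measuring the effect described above.

```python
from collections import Counter

# A tiny made-up corpus; real corpora contain millions of words.
corpus = "the cat sat on the mat and the cat saw the dog".split()

# Frequency analysis: how many times each word occurs.
frequencies = Counter(corpus)

def concordance(words, target, window=2):
    """List every occurrence of target with up to `window` words of context per side."""
    lines = []
    for i, word in enumerate(words):
        if word == target:
            left = " ".join(words[max(0, i - window):i])
            right = " ".join(words[i + 1:i + 1 + window])
            lines.append(f"{left} [{word}] {right}".strip())
    return lines

def next_word_probability(words, first, second):
    """Estimate how likely `second` is to come right after `first`."""
    pairs = Counter(zip(words, words[1:]))
    firsts = Counter(words[:-1])
    return pairs[(first, second)] / firsts[first]

print(frequencies["the"])                          # 4
print(concordance(corpus, "cat"))                  # ['the [cat] sat on', 'and the [cat] saw the']
print(next_word_probability(corpus, "the", "cat")) # 0.5
```

In this corpus "cat" makes up only 2 of the 12 words, but it follows "the" half of the time, so the occurrence of "the" clearly raises the probability that "cat" comes next.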
COMPUTATIONAL LEXICOGRAPHY
Computational lexicography is the use of computers both to construct “ordinary dictionaries” and to
construct electronic dictionaries with far more information, suitable for the goals of
language understanding and generation.
Examples:
WordNet is an online dictionary developed by Princeton University, with well over
100,000 word entries, that attempts to satisfy some of the needs of
computational linguists, with an emphasis on semantic relationships.
Merriam-Webster Dictionary
INFORMATION RETRIEVAL AND SUMMARIZATION
Information retrieval is the process at work when people use the search features of
the internet to find information. Typically, one enters a keyword, or perhaps
several, and, seemingly by magic, the computer returns the locations of websites that contain
information relating to that keyword.
Through summarization programs, computers can eliminate redundancy
and identify the most salient features of a body of information. These programs range from
the simplistic “print the first sentence of each paragraph” to more sophisticated ones that use a “concept
vector” to analyze the text semantically and identify important points.
A concept vector is a list of meaningful keywords whose presence in a
paragraph is a measure of that paragraph’s significance.
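A keyword-based summarizer of this kind can be sketched in Python. This is a simplified illustration, not the method of any real summarization program: the function summarize, the sample text, and the four-word concept vector are assumptions for this example, and each sentence is scored simply by counting how many of its words appear in the concept vector.

```python
import re

def summarize(text, concept_vector, max_sentences=1):
    """Keep the sentences that contain the most keywords from the concept vector."""
    sentences = [s.strip() for s in re.split(r"(?<=[.!?])\s+", text) if s.strip()]

    def score(sentence):
        words = re.findall(r"[a-z]+", sentence.lower())
        return sum(1 for word in words if word in concept_vector)

    ranked = sorted(sentences, key=score, reverse=True)
    chosen = set(ranked[:max_sentences])
    # Return the chosen sentences in their original order.
    return [s for s in sentences if s in chosen]

# Assumed sample text and concept vector for illustration.
text = (
    "Many people own a phone. "
    "Speech recognition converts spoken language into text. "
    "It was sunny yesterday."
)
concept_vector = {"speech", "recognition", "language", "text"}

print(summarize(text, concept_vector))
# ['Speech recognition converts spoken language into text.']
```

The second sentence contains four of the keywords and the other two contain none, so it is selected as the summary.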
Example:
[Figure: sample text in which the words highlighted in yellow form the summary]
SPELL CHECKERS
A spell checker is a computer program that checks the spelling of words in
files of text and flags possible errors, typically by comparison with a stored list of words.
Disadvantages
- They may not recognize proper place names, for example: Calaburnay.
- They won't recognize many technical terms or abbreviations, although if
you use these regularly, you can add them to the custom dictionary.
- If you have used a correctly spelled word in the wrong place, they will not
find the mistake, for example, "here" instead of "hear", or "their" instead of
"there".
Teachers are also concerned that students are not learning to spell things
correctly for themselves and rely too much on the spell checker to correct their
mistakes.
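A word-list spell checker, together with the disadvantages listed above, can be demonstrated with a short Python sketch. The stored word list here is a tiny made-up one, and the function name misspelled is an assumption for this example; real spell checkers use much larger lists and usually suggest corrections as well.

```python
import re

# Hypothetical stored word list; a real spell checker has many thousands of entries.
WORD_LIST = {"i", "can", "here", "hear", "you", "from", "there", "their", "the", "music"}

def misspelled(text, word_list=WORD_LIST):
    """Return the words in text that are not found in the stored word list."""
    words = re.findall(r"[A-Za-z']+", text)
    return [word for word in words if word.lower() not in word_list]

print(misspelled("I can here the musik from Calaburnay"))
# ['musik', 'Calaburnay']
```

The checker correctly flags "musik", but it also flags the place name "Calaburnay" because it is not in the list, and it misses "here" (written instead of "hear") because "here" is a correctly spelled word.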
MACHINE TRANSLATION
In the 1940s, attempts were made to develop automatic machine translation,
especially during World War II, in an effort to decipher coded
enemy communications.
The aim in automatic translation is to
input a spoken utterance or written
passage in the source language and
receive a grammatical passage of
equivalent meaning in the target language.
This is a difficult and complex task whose results were often humorous.
In the early days, it was thought that translation could be done by entering into
the memory of a computer a dictionary of morphemes of the source language with
their corresponding morphemes in the target language. The results were so poor
that machine translation came to be called “language in, garbage out.”
Translation is more than word-for-word replacement. Often there is no
equivalent word in the target language, and the order of words may differ
between the source language and the target language.
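The weakness of word-for-word replacement can be seen in a very small Python sketch. The three-word English-to-Spanish dictionary and the function name translate_word_for_word are assumptions for this illustration, and the dictionary simplifies Spanish by giving only one form for each word.

```python
# Hypothetical toy dictionary from English to Spanish (one fixed form per word).
TOY_DICTIONARY = {"the": "la", "red": "roja", "house": "casa"}

def translate_word_for_word(sentence, dictionary=TOY_DICTIONARY):
    """Replace each word with its dictionary entry; unknown words are left unchanged."""
    return " ".join(dictionary.get(word, word) for word in sentence.lower().split())

print(translate_word_for_word("The red house"))   # la roja casa
print(translate_word_for_word("The blue house"))  # la blue casa
```

Every word in the first output is a correct Spanish word, but the phrase is not natural Spanish, which places the adjective after the noun ("la casa roja"). The second output shows what happens when the dictionary has no equivalent for a word.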
Examples of humorous translations
1. Kentucky Fried Chicken opened its first store
in China using its well-known slogan, “finger
lickin’ good.” Unfortunately, it was translated
into “eat your fingers off.”
2. In Mexico, the Coors Brewing Company
mistranslated “Turn it loose” into Spanish as “Suffer
from diarrhea.”
3. President Carter traveled to Poland in 1977. The State Department
found a Russian interpreter who knew Polish, but was not a
professional in that language. Carter’s interpreter translated him saying
things in Polish like "when I abandoned the United States" (for "when I
left the United States") and "your lusts for the future" (for "your desires
for the future"). The media in both countries enjoyed the mistakes.
4. When Pepsi launched a new campaign with the slogan
“Pepsi brings you back to life,” it translated into
Mandarin meaning “Pepsi brings your ancestors back
from the grave.”
Other applications of computational linguistics are found in the forensic field,
where computational forensic linguists take up legal problems such as trademark
infringement, in which computers are used to examine huge corpora
to infer how people interpret trademark elements such as the Mc- in McDonald’s, and speaker
identification, in which computational analysis of speech in a crime such as a bomb
threat can assist in identifying, or exonerating, a suspect.
Activity
1. What is the branch of linguistics that is concerned with the linguistic
performance, language acquisition, and speech production and comprehension?
A. Psycholinguistics C. Computational linguistics
B. Acquisition Linguistics D. Morphology Linguistics
2. Speaking and comprehending speech can be viewed as
A. Psycholinguistics C. Comprehension process
B. Speech Chain D. brain to brain linking
3. The aim in automatic translation is to input a spoken utterance or written
passage in the source language and receive a grammatical passage of equivalent
meaning in the target language.
A. True C. Wrong
B. False D. Maybe
4. Training is also called
A. enrollment C. trial
B. practice D. process
5. The translation is more than word for word replacement
A. True C. Wrong
B. False D. Maybe
6. What is the branch of linguistics in which the techniques of computer science
are applied to the analysis and synthesis of language and speech?
A. Psycholinguistics C. Computational linguistics
B. Acquisition Linguistics D. Morphology Linguistics
7. What is the representation of meanings and morphemes in the computer?
A. Computational Semantics C. Computational Phonology
B. Computational Pragmatics D. Computational Morphology
8. Speech Chain is
A. A spoken utterance starts a message in the speaker's mind.
B. a kind of "brain-to-brain" linking
C. It is put into linguistic form and interpreted as articulation commands,
emerging as an acoustic signal.
D. all of the above
9. A photographic or other visual or electronic representation of a spectrum.
A. Spectographic C. Spectrographics
B. Spectrogram D. Spectrograph
10. What is the action or capability of understanding something?
A. Comprehension C. Reading
B. Speech signals D. Psycholinguistics
11. It is only concerned with speech sounds that can be heard by the normal
human ear.
A. Acoustic Phonetics C. Auditory Phonetics
B. Articulatory Phonetics D. Orthography
12. A spectrogram is more vividly known as a
A. voice prints C. speech print
B. print voice D. print speech
13. SHRDLU can demonstrate several abilities except for
A. interpret questions C. move blocks from one location to another
B. draw conclusions D. learn new words
14. It is the processing of word structures by computers. Computational
morphology deals with the processing of words and word forms, in both their
graphemic, i.e., written form, and their phonemic, i.e., spoken form.
A. Computational Semantics C. Computational Phonology
B. Computational Pragmatics D. Computational Morphology
15. Artificial production of human speech.
A. Speech Synthesis C. Artificial Speech
B. Speech Recognition D. Speech Production
16. The body of language is
A. Concordance C. Collocation
B. Corpus D. Frequency Analysis
17. A computer program that checks the spelling of words in files of text and
flags possible errors, typically by comparison with a stored list of words.
A. Spell Checkers C. Ginger
B. Grammar Checker D. Flag Checkers
18. Influence the understanding or response of the computer by taking into
account knowledge that the computer system has about the real world.
A. Computational Semantics C. Computational Phonology
B. Computational Pragmatics D. Computational Morphology
19. The machine translator is also called
A. language in, garbage out C. garbage in, language out
B. garbage out, language in D. language out, garbage in
20. Machines that can be programmed to process language more as humans do, insofar
as they carry out many linguistic tasks simultaneously.
A. Parallel Processing C. Processing Machine
B. Computer Model of Grammar D. Modern Computer Architecture
21. A text-to-speech system is also known as
A. Engine C. Synthesizer
B. Processor D. Connector
22. A list of meaningful keywords whose presence in a paragraph is a measure of that paragraph’s significance.
A. Concept Vector C. Collocation
B. Summarization D. Interval
23. The process at work when people use the search features of the internet to find
information.
A. Information Retrieval C. Concept Vector
B. Summarization D. Searching
24. Computer programs that can eliminate redundancy and identify the most
salient features of a body of information.
A. Summarization C.
B. Concept Vectors D. Information Retrieval
25. A text-to-speech system is composed of two parts
A. front end and back end C. front end and back opening
B. front opening and back opening D. front opening and back end
References
BOOK SOURCE:
Fromkin, Victoria, et al. Introduction to Linguistics. Pasig: Cengage Learning
Asia Pte. Ltd., 2010.
ELECTRONIC SOURCES:
Language Processing by You, and by Computer. Retrieved October 5, 2016 from
http://www.ccunix.ccu.edu.tw/~lngmyers/Lx_Process.txt
Language Processing: Humans and Computers
https://quizlet.com/79661709/english-linguistics-ch-9-language-processing-humans-and-computers-final-exam-flash-cards/
The Speech Chain
https://www.google.com.ph/search?q=brain+to+brain+linking&espv=2&biw=980&bih=703&source=lnms&tbm=isch&sa=X&ved=0ahUKEwijs_61iMXPAhWSQpQKHUiWCWIQ_AUIBigB#tbm=isch&q=the+speech+chian&imgrc=ThH6PoewEg4gkM%3A