Adama Science and Technology University
Department of Computer Science and Engineering
Course Title: Natural Language Processing
Individual Assignment Two (Book Chapter)
Topic: Morphological Processing of Semitic Languages
Submitted to Dr. Beharu
Submitted By: Abdi Mosisa
ID: pgr/28201/15
Morphological Processing of Semitic Languages
2.1 Introduction
In NLP, morphology is the study of how words are built up from smaller meaningful units called
morphemes [1]. These morphemes are the building blocks of words, and understanding them
helps computers process language more effectively.
Here's a breakdown of key concepts in NLP morphology:
Morphemes: The smallest units of meaning in a language. They cannot be broken down
further into smaller meaningful parts. Two key notions are:
o Stems: The core meaning-carrying unit of a word. For example, "ሰበረ" in "ሰበረች" is
the stem.
o Affixes: Bound morphemes that attach to stems to modify their meaning or
grammatical function. Examples include prefixes (አል- in አልሰበረም), suffixes (-ዉ in
ሰበረዉ), and infixes (-በ- in ሰባበረ).
Morphological Analysis: The process of breaking down a word into its constituent
morphemes. This helps NLP tasks like:
o Lemmatization: Reducing words to their base form (lemma) for better understanding.
For instance, "playing" and "played" would both be reduced to "play".
o Part-of-Speech Tagging: Identifying the grammatical function of a word (noun, verb,
adjective etc.) based on its morphemes. For example, the "-ed" suffix often indicates
past tense for verbs.
Morphological Generation: The process of building words by combining stems and affixes.
This can be useful for tasks like:
o Machine translation: Understanding how morphemes are combined in one language
to create their equivalent in another.
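As a toy illustration of the analysis concepts above, the following sketch recovers an English lemma by stripping the longest known suffix and uses that suffix as a part-of-speech hint. The suffix table and tags are invented for illustration and are far too small for real text.

```python
# A toy morphological analyzer: strip the longest known suffix to
# recover the lemma, and use the suffix as a part-of-speech hint.
# The suffix table below is illustrative, not a real inventory.
SUFFIXES = {
    "ing": ("VERB", "present participle"),
    "ed": ("VERB", "past tense"),
    "s": ("NOUN", "plural"),
}

def analyze(word):
    """Return (lemma, tag, feature) for the longest matching suffix."""
    for suffix in sorted(SUFFIXES, key=len, reverse=True):
        if word.endswith(suffix) and len(word) > len(suffix):
            tag, feature = SUFFIXES[suffix]
            return word[: -len(suffix)], tag, feature
    return word, "UNKNOWN", None

print(analyze("playing"))  # ('play', 'VERB', 'present participle')
print(analyze("played"))   # ('play', 'VERB', 'past tense')
```

A real lemmatizer would of course need a lexicon and rules for irregular forms; the point here is only the mapping from morphemes to analyses.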
The process of morphological analysis involves identifying and categorizing these morphemes
within words to understand how they contribute to meaning and grammatical structure. This
analysis can reveal insights into word formation, inflectional patterns, and derivational processes
within a language.
This chapter addresses morphological processing of Semitic languages. In light of the complex
morphology and problematic orthography of many of the Semitic languages, the chapter begins
with a recapitulation of the challenges these phenomena pose on computational applications. It
then discusses the approaches that were suggested to cope with these challenges in the past. The
bulk of the chapter, then, discusses available solutions for morphological processing, including
analysis, generation, and disambiguation, in a variety of Semitic languages. The concluding
section discusses future research directions.
Semitic languages are characterized by complex, productive morphology, with a basic word-
formation mechanism, root-and-pattern, that is unique to languages of this family. Morphological
processing of Semitic languages therefore necessitates technology that can successfully cope
with these complexities. Linguistic theories, and, consequently, computational linguistic
approaches, are often developed with a narrow set of (mostly European) languages in mind. The
adequacy of such approaches to other families of languages is sometimes sub-optimal. A related
issue is the long tradition of scholarly work on some Semitic languages, notably Arabic [2] and
Amharic [1], which cannot always be easily consolidated with contemporary approaches.
Inconsistencies between modern, English-centric approaches and traditional ones are easily
observed in matters of lexicography. In order to annotate corpora or produce tree-banks, an
agreed-upon set of part-of-speech (POS) categories is required. Since early approaches to POS
tagging were limited to English, resources for other languages tend to use “tag sets”, or
inventories of categories, that are minor modifications of the standard English set. Such an
adaptation is problematic for Semitic languages.
These issues are complicated further when morphology is considered. The rich, non-
concatenative morphology of Semitic languages frequently requires innovative solutions that
standard approaches do not always provide.
2.2 Basic Notions
The word ‘word’ is one of the most loaded and ambiguous notions in linguistic theory [3]. Since
most computational applications deal with written texts (as opposed to spoken language), the
most useful notion is that of an orthographic word. This is a string of characters, from a well-
defined alphabet of letters, delimited by spaces, or other delimiters, such as punctuation. A text
typically consists of sequences of orthographic words, delimited by spaces or punctuation;
orthographic words in a text are often referred to as tokens.
Orthographic words are frequently not atomic: they can be further divided into smaller units, called
morphemes. Morphemes are the smallest meaning-bearing linguistic elements; they are
elementary pairings of form and meaning. Morphemes can be either free, meaning that they can
occur in isolation, as a single orthographic word; or bound, in which case they must combine
with other morphemes in order to yield a word. For example, the word two consists of a single
(free) morpheme, whereas dogs consists of two morphemes: the free morpheme dog, combined
with the bound morpheme -s. The preceding dash indicates that the latter must combine with
other morphemes; its function, of course, is to denote plurality. When a word consists of a
free morpheme, potentially combined with bound morphemes, the free morpheme is called a
stem, or sometimes a root.
Bound morphemes are typically affixes. Affixes come in many varieties: prefixes attach before
the stem (e.g., የ- in የአበበ), suffixes attach after the stem (-ነት in ሰዉነት), and infixes
occur inside a stem. Morphological processes define the shape of words. They are usually
classified into two types. Derivational morphology deals with word formation; such processes
can create new words from existing ones, potentially changing the category of the original
word. For example, the processes that create በታማኝነት from ታማኝነት, and ታማኝነት from
ታማኝ, are derivational. Such processes are typically not highly productive; for example, one
cannot derive አፍቃሪ from ፍቅር.
In contrast, inflectional morphology yields inflected forms, variants of some base, or citation
form, of words; these forms are constructed to adhere to some syntactic constraints, but they do
not change the basic meaning of the base form. Inflectional processes are usually highly
productive, applying to most members of a particular word class. For example, English nouns
inflect for number, so most nouns occur in two forms, the singular (which is considered the
citation form) and the plural, regularly obtained by adding the suffix -s to the base form.
Word formation in Semitic languages is based on a unique mechanism, known as root-and-
pattern. Words in this language family are often created by the combination of two bound
morphemes, a root and a pattern. The root is a sequence of consonants only, typically three; and
the pattern is a sequence of vowels and consonants with open slots in it. The root combines with
the pattern through a process called interdigitation: each letter of the root (radical) fills a slot in
the pattern.
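The interdigitation process can be sketched in a few lines. The function below fills the consonant slots of a pattern, written here as "C", with the radicals of the root, using the common textbook Arabic root k-t-b 'write' as an example; the slot notation is illustrative.

```python
# A minimal sketch of root-and-pattern interdigitation: each radical of
# the (typically triconsonantal) root fills one open slot, written "C",
# in the pattern.
def interdigitate(root, pattern):
    """Fill the 'C' slots of a pattern with the radicals of a root."""
    radicals = iter(root)
    return "".join(next(radicals) if ch == "C" else ch for ch in pattern)

# Textbook Arabic example: root k-t-b 'write'
print(interdigitate("ktb", "CaCaC"))   # katab
print(interdigitate("ktb", "maCCuC"))  # maktub
```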
In addition to the unique root-and-pattern morphology, Semitic languages are characterized by a
productive system of more standard affixation processes. These include prefixes, suffixes, infixes
and circumfixes, which are involved in both inflectional and derivational processes.
For example, Amharic is a highly inflectional Semitic language [1]. Nouns as well as
adjectives are inflected for number, gender, genitive, accusative, diminutive, dative,
instrumental, conjunctive, possessive and determiner.
Examples:
• አዲስ ቤት - new house
• አዲስ ቤትዎቸ - new houses
• ብልጥ - smart (male)
• ብልጥት - smart (female)
2.3 The Challenges of Morphological Processing
Morphological processing is a crucial component of many natural language processing (NLP)
applications. Whether the goal is information retrieval, question answering, text summarization
or machine translation, NLP systems must be aware of word structure. For some languages and
for some applications, simply stipulating a list of surface forms is a viable option; this is not the
case for languages with complex morphology, in particular Semitic languages, both because of
the huge number of potential forms and because of the difficulty such an approach has in
handling out-of-lexicon items (in particular, proper names), which may combine with prefix or
suffix particles.
An alternative solution would be a dedicated morphological analyzer, implementing the
morphological and orthographic rules of the language. Ideally, a morphological analyzer for any
language should reflect the rules underlying derivational and inflectional processes in that
language. Of course, the more complex the rules, the harder it is to construct such an analyzer.
The main challenge of morphological analysis of Semitic languages stems from the need to
faithfully implement a complex set of interacting rules, some of which are non-concatenative.
Once such a grammar is available, it typically produces more than one analysis for any given
surface form; in other words, Semitic languages exhibit a high degree of morphological
ambiguity, which has to be resolved in a typical computational application. The level of
morphological ambiguity is higher in many Semitic languages than it is in English, due to the
rich morphology and deficient orthography. This calls for sophisticated methods for
disambiguation. While in English (and other European languages) morphological disambiguation
may amount to POS tagging, Semitic languages require more effort, since determining the
correct POS of a given token is intertwined with the problem of segmenting the token into
morphemes, the set of morphological features (and their values) is larger, and consequently the
number of classes is too large for standard classification techniques. Several models were
proposed to address these issues.
Contemporary approaches to part-of-speech tagging are all based on machine learning: a large
corpus of text is manually annotated with the correct POS for each word; then, a classifier is
trained on the annotated corpus, resulting in a system that can predict POS tags for unseen texts
with high accuracy. The state of the art in POS tagging for English is extremely good, with
accuracies that are indistinguishable from human level performance. Various classifiers were
built for this task, implementing a variety of classification techniques, such as Hidden Markov
Models (HMM) [4], Average Perceptron [5], Maximum Entropy [6], Support Vector Machines
(SVM) [7], and others.
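As a sketch of how such statistical taggers work, here is a toy HMM tagger [4] that performs Viterbi decoding over hand-set transition and emission probabilities. Real systems estimate these parameters from an annotated corpus; the tiny tagset, vocabulary, and numbers below are invented for illustration.

```python
# A toy HMM part-of-speech tagger: Viterbi decoding over hand-set
# transition and emission probabilities (illustrative numbers only).
TAGS = ["DET", "NOUN", "VERB"]
START = {"DET": 0.6, "NOUN": 0.3, "VERB": 0.1}
TRANS = {
    "DET": {"DET": 0.05, "NOUN": 0.9, "VERB": 0.05},
    "NOUN": {"DET": 0.1, "NOUN": 0.2, "VERB": 0.7},
    "VERB": {"DET": 0.5, "NOUN": 0.4, "VERB": 0.1},
}
EMIT = {
    "DET": {"the": 0.9, "a": 0.1},
    "NOUN": {"dog": 0.5, "barks": 0.1, "cat": 0.4},
    "VERB": {"barks": 0.8, "dog": 0.2},
}

def viterbi(words):
    """Return the most probable tag sequence for the given words."""
    # probs[tag] = (probability, path) of the best path ending in tag
    probs = {t: (START[t] * EMIT[t].get(words[0], 0.0), [t]) for t in TAGS}
    for word in words[1:]:
        probs = {
            t: max(
                (p * TRANS[s][t] * EMIT[t].get(word, 0.0), path + [t])
                for s, (p, path) in probs.items()
            )
            for t in TAGS
        }
    return max(probs.values())[1]

print(viterbi(["the", "dog", "barks"]))  # ['DET', 'NOUN', 'VERB']
```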
For languages with complex morphology, and Semitic languages in particular, however, these
standard techniques do not perform as well, for several reasons:
1. Due to issues of orthography, a single token in several Semitic languages can actually be a sequence
of more than one lexical item, and hence be associated with a sequence of tags.
2. The rich morphology implies a much larger tagset, since tags reflect the wealth of morphological
information which words exhibit. The richer tagset immediately implies problems of data
sparseness, since most of the tags occur only rarely, if at all, in a given corpus.
3. As a result of both orthographic deficiencies and morphological wealth, word forms in Semitic
languages tend to be ambiguous.
4. Word order in Semitic languages is relatively free, and in any case freer than in English.
2.4 Computational Approaches to Morphology
No single method exists that provides an adequate solution for the challenges involved in
morphological processing of Semitic languages. The most common approach to morphological
processing of natural language is finite-state technology [8]. The adequacy of this technology for
Semitic languages has frequently been challenged, but clearly, with some sophisticated
developments, such as flag diacritics [9], multi-tape automata [10] or registered automata [11],
finite-state technology has been effectively used for describing the morphological structure of
several Semitic languages [12].
2.4.1 Two-Level Morphology
Two-level morphology was “the first general model in the history of computational linguistics
for the analysis and generation of morphologically complex languages” [13]. Developed by
Koskenniemi [14], this technology facilitates the specification of rules that relate pairs of surface
strings through systematic rules. Such rules, however, do not specify how one string is to be
derived from another; rather, they specify mutual constraints on those strings. Furthermore, rules
do not apply sequentially. Instead, a set of rules, each of which constrains a particular string pair
correspondence, is applied in parallel, such that all the constraints must hold simultaneously. In
practice, one of the strings in a pair would be a surface realization, while the other would be an
underlying form.
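The parallel-constraint idea can be illustrated with a tiny sketch, assuming the usual two-level notation in which a deleted symbol corresponds to a surface zero '0'. The single English e-deletion rule, the boundary rule, and the strings are invented for illustration; real two-level rules are compiled into finite-state transducers.

```python
# A minimal sketch of two-level morphology: a lexical string and a
# surface string are related symbol pair by symbol pair, and every rule
# is a constraint that the whole pair sequence must satisfy in parallel.
def pairs(lexical, surface):
    return list(zip(lexical, surface))

def e_deletion_ok(pair_seq):
    """'e:0' is licensed only before a boundary '+' followed by a vowel."""
    for i, (lex, sur) in enumerate(pair_seq):
        if lex == "e" and sur == "0":
            ok = (
                i + 2 < len(pair_seq)
                and pair_seq[i + 1][0] == "+"
                and pair_seq[i + 2][0] in "aeiou"
            )
            if not ok:
                return False
    return True

def boundary_ok(pair_seq):
    """The morpheme boundary '+' is always realized as surface zero."""
    return all(sur == "0" for lex, sur in pair_seq if lex == "+")

RULES = [e_deletion_ok, boundary_ok]

def accepts(lexical, surface):
    # All constraints are checked in parallel over the same pair sequence.
    seq = pairs(lexical, surface)
    return len(lexical) == len(surface) and all(rule(seq) for rule in RULES)

print(accepts("move+ing", "mov00ing"))  # True: e deleted before +ing
print(accepts("move+s", "move0s"))      # True: no deletion before a consonant
print(accepts("move+s", "mov00s"))      # False: deletion not licensed here
```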
2.4.2 Registered Automata
Finite-state registered automata [11] were developed with the goal of facilitating the expression
of various non-concatenative morphological phenomena in an efficient way. The main idea is to
augment standard finite-state automata with a (finite) amount of memory, in the form of registers
associated with the automaton transitions. This is done in a restricted way that saves space but
does not add expressivity. The number of registers is finite, usually small, and eliminates the
need to duplicate paths as it enables the automaton to ‘remember’ a finite number of symbols.
Technically, each arc in the automaton is associated (in addition to an alphabet symbol) with an
action on the registers. Cohen-Sygal and Wintner [11] define two kinds of actions, read and
write. The former allows an arc to be traversed only if a designated register contains a specific
symbol. The latter writes a specific symbol into a designated register when an arc is traversed.
Cohen-Sygal and Wintner [11] show that finite-state registered automata can efficiently model
several non-concatenative morphological phenomena, including circumfixation, root and pattern
word formation in Semitic languages, vowel harmony, limited reduplication, etc. The
representation is highly efficient: for example, to account for all the possible combinations of r
roots and p patterns, an ordinary FSA requires O(r × p) arcs whereas a registered automaton
requires only O(r + p) arcs. Unfortunately, no implementation of the model exists as part of an
available finite-state toolkit.
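Since no toolkit implementation exists, here is a small hand-rolled simulation of the idea on a toy circumfixation language, where the suffix must agree with the prefix. The alphabet, states, stem, and single register are invented for illustration; the point is that the register lets the stem arcs be shared rather than duplicated per circumfix.

```python
# A minimal simulation of a finite-state registered automaton [11]:
# each arc carries an optional register action, 'write' storing a symbol
# and 'read' allowing traversal only if the register holds that symbol.
# Toy language: circumfixes a-...-x and b-...-y around the stem "tam".
# Arc: (source, target, input_symbol, action); action is None,
# ('write', value), or ('read', value), all on a single register.
ARCS = [
    (0, 1, "a", ("write", "a")),
    (0, 1, "b", ("write", "b")),
    (1, 2, "t", None), (2, 3, "a", None), (3, 4, "m", None),  # shared stem
    (4, 5, "x", ("read", "a")),
    (4, 5, "y", ("read", "b")),
]
FINAL = {5}

def accepts(word):
    state, register = 0, None
    for symbol in word:
        for src, dst, sym, action in ARCS:
            if src == state and sym == symbol:
                if action and action[0] == "read" and register != action[1]:
                    continue  # register does not license this arc
                if action and action[0] == "write":
                    register = action[1]
                state = dst
                break
        else:
            return False
    return state in FINAL

print(accepts("atamx"))  # True: suffix x matches prefix a
print(accepts("atamy"))  # False: mismatched circumfix
```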
2.4.3 Analysis by Generation
Most of the approaches discussed above allow for a declarative specification of (morphological)
grammar rules, from which both analyzers and generators can be created automatically. A
simpler, less generic, yet highly efficient approach to the morphology of Semitic languages has
been popular with actual applications. In this framework, which we call analysis by generation
here, the morphological rules that describe word formation and/or affixation are specified in a
way that supports generation, but not necessarily analysis. Coupled with a lexicon of morphemes
(typically, base forms and concatenative affixes), such rules can be applied in one direction to
generate all the surface forms of the language. This can be done off-line, and the generated forms
can then be stored in a database; analysis, in this paradigm, amounts more or less to simple table
lookup.
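The analysis-by-generation paradigm can be sketched as follows: the rules are run off-line in the generation direction to enumerate all surface forms into a table, and analysis then reduces to lookup. The lexicon and suffixes are illustrative English stand-ins for a real morpheme lexicon.

```python
# A sketch of analysis by generation: generate every surface form
# off-line, store the results in a table, and analyze by table lookup.
LEXICON = ["walk", "talk"]
SUFFIXES = {"": {}, "s": {"person": 3, "number": "sg"}, "ed": {"tense": "past"}}

def generate_table(stems, suffixes):
    """Off-line generation: map every surface form to its analyses."""
    table = {}
    for stem in stems:
        for suffix, features in suffixes.items():
            table.setdefault(stem + suffix, []).append((stem, features))
    return table

TABLE = generate_table(LEXICON, SUFFIXES)

def analyze(surface):
    """Analysis amounts to a simple table lookup."""
    return TABLE.get(surface, [])

print(analyze("walked"))  # [('walk', {'tense': 'past'})]
print(analyze("walks"))   # [('walk', {'person': 3, 'number': 'sg'})]
```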
2.4.4 Functional Morphology
Functional morphology [16] is a computational framework for defining language resources, in
particular lexicons. It is a language-independent tool, based on a word-and-paradigm model,
which allows the grammar writer to specify the inflectional paradigms of words in a specific
language in a similar way to printed paradigm tables. A lexicon in functional morphology
consists of a list of words, each associated with its paradigm name, and an inflection engine that
can apply the inflectional rules of the language to the words of the lexicon.
This framework was used to define morphological grammars for several languages, including
modeling of non-concatenative processes such as vowel harmony, reduplication, and templatic
morphology. In particular, this paradigm has been used to implement a morphological processor of Arabic.
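The word-and-paradigm idea behind functional morphology can be sketched as follows: a paradigm is a function from a dictionary word to its inflection table, and the lexicon pairs each word with a paradigm name. The single English noun paradigm below is invented for illustration; the real framework [16] is implemented in Haskell.

```python
# A sketch of the word-and-paradigm model: paradigms are functions from
# words to inflection tables, applied by a simple inflection engine.
def noun_paradigm(word):
    """Inflection table for a regular noun, like a printed paradigm table."""
    return {"sg": word, "pl": word + "s"}

PARADIGMS = {"regular_noun": noun_paradigm}
LEXICON = [("dog", "regular_noun"), ("book", "regular_noun")]

def inflection_engine(lexicon):
    """Apply each word's paradigm to produce its full inflection table."""
    return {word: PARADIGMS[name](word) for word, name in lexicon}

print(inflection_engine(LEXICON)["dog"])  # {'sg': 'dog', 'pl': 'dogs'}
```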
2.4.5 Morphological Analysis and Generation of Semitic Languages
2.4.5.1 Amharic
Computational work on Amharic began only recently. Fissaha and Haller [17] describe a
preliminary lexicon of verbs, and discuss the difficulties involved in implementing verbal
morphology with XFST. XFST is also the framework of choice for the development of an
Amharic morphological grammar [12]; but evaluation on a small set of 1,620 words reveals that
while the coverage of the grammar on this corpus is rather high (85–94 %, depending on the part
of speech), its precision is low and many word forms (especially verbs) are associated with
wrong analyses.
Argaw and Asker [1] describe a stemmer for Amharic. Using a large dictionary, the stemmer
first tries to segment surface forms into sequences of prefixes, a stem, and suffixes. The
candidate stems are then looked up in the dictionary, and the longest stem found is chosen (ties are
resolved by the frequency of the stem in a corpus). Evaluation on a small corpus of 1,500 words
shows accuracy of close to 77 %.
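The stemming strategy just described can be sketched as follows: try every prefix/stem/suffix segmentation, keep the stems found in the dictionary, and choose the longest one, breaking ties by corpus frequency. The affix lists, dictionary, and frequencies below are illustrative English stand-ins, not the Amharic resources used in [1].

```python
# A sketch of a longest-match dictionary stemmer with frequency tie-breaking.
PREFIXES = ["", "re"]
SUFFIXES = ["", "s", "ed", "ing"]
DICTIONARY = {"play", "open", "pen"}
FREQUENCY = {"play": 120, "open": 80, "pen": 15}

def stem(word):
    """Return the best dictionary stem for a word, or None."""
    candidates = []
    for pre in PREFIXES:
        for suf in SUFFIXES:
            if word.startswith(pre) and word.endswith(suf):
                core = word[len(pre): len(word) - len(suf)]
                if core in DICTIONARY:
                    candidates.append(core)
    # longest stem wins; ties are resolved by corpus frequency
    return max(candidates, key=lambda s: (len(s), FREQUENCY[s]), default=None)

print(stem("reopened"))  # open
print(stem("playing"))   # play
```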
The state of the art in Amharic, however, is most probably HornMorpho, a system for
morphological processing of Amharic as well as Tigrinya (another Ethiopian Semitic language)
and Oromo (which is not Semitic). The system is based on finite-state technology, but the basic
transducers are augmented by feature structures, implementing ideas introduced by Amtrup.
Manual evaluation on 200 Tigrinya verbs and 400 Amharic nouns and verbs shows very accurate
results: in over 96 % of the words, the system produced all and only the correct analyses.
2.4.6 Related Applications
Also worth mentioning here are a few works that address other morphology-related tasks. These
include a shallow morphological analyzer for Arabic [10] that segments word forms into
sequences of (at most one) prefix, a stem, and (at most one) suffix; a system for identifying the
roots of (possibly inflected) Hebrew and Arabic words; various programs for vocalization, or
restoring diacritics, in Arabic and other Semitic languages; determining case endings of
Arabic words; and correction of optical character recognition (OCR) errors.
When downstream applications are considered, such as chunking, parsing, or machine
translation, the question of tokenization gains much importance. Morphological analysis
determines the lexeme and the inflectional (and, sometimes, also derivational) morphemes of
a surface form; but the way in which a surface form is broken down to its morphemes for the
purpose of further processing can have a significant impact on the accuracy of such applications.
Several works investigate various pre-processing techniques for Arabic, in the context of
developing Arabic-to-English statistical machine translation systems [17,51].
2.5 Morphological Disambiguation of Semitic Languages
Early attempts at POS tagging and morphological disambiguation of Semitic languages relied on
more “traditional” approaches, borrowed directly from the general (i.e., English) POS tagging
literature. The first such work is probably [10], which defined a set of 131 POS tags, manually
annotated a corpus of 50,000 words, and then implemented a tagger that combines statistical and
rule-based techniques and performs both segmentation and tag disambiguation. Similarly, [17]
use SVM to automatically tokenize, POS-tag, and chunk Arabic texts. To this end, they use a
reduced tag set of only 24 tags, with which the reported results are very high. The set of tags is
extended to 75 in [11].
As for Amharic, [1] uses conditional random fields for POS tagging. As the annotated corpus
used for training is extremely small (1,000 words), it is not surprising that the accuracy is rather
low: 84 % for segmentation, and only 74 % for POS tagging. Two other works use a recently-
created 210,000-word annotated corpus to train Amharic POS taggers with a tag set of size 30.
Gambäck et al. experiment with HMM, SVM, and Maximum Entropy; accuracy ranges between
88 and 95 %, depending on the test corpus. Another work investigates various classification
techniques, using the same corpus for the same task. The best accuracy, achieved with SVM, is
over 86 %, but other classification methods, including conditional random fields and memory-
based learning, also perform well.
2.6 Summary
The discussion above establishes the inherent difficulty of morphological processing of Semitic
languages, as one instance of languages with rich and complex morphology. Having said that, it
is clear that with a focused effort, contemporary computational technology is sufficient for
tackling the difficulties. Sect. 2.5 shows that morphological disambiguation of these languages
can be done with high accuracy, nearing the accuracy of disambiguation for European
languages.
However, for the less-studied languages, including Amharic, Maltese and others, much work is
still needed in order to produce tools of similar precision. As in Arabic and Hebrew, this
effort should focus on two fronts: the development of formal, computationally implementable
sets of rules that describe the morphology of the language in question; and the
collection and annotation of corpora from which morphological disambiguation modules can be
trained.
As for future technological improvements, we note that “pipeline” approaches, whereby the input
text is fed, in sequence, to a tokenizer, a morphological analyzer, a morphological
disambiguation module and then a parser, have probably reached a ceiling, and the stage is ripe
for more elaborate, unified approaches.
References
1. Argaw AA, Asker L (2007) An Amharic stemmer: reducing words to their citation forms. In:
Proceedings of the ACL-2007 workshop on computational approaches to Semitic languages,
Prague
2. Owens J (1997) The Arabic grammatical tradition. In: Hetzron R (ed) The Semitic languages.
Routledge, London/New York, chap 3, pp 46–58
3. Harley HB (2006) English words: a linguistic introduction. The language library. Wiley-
Blackwell, Malden
4. Brants T (2000) TnT: a statistical part-of-speech tagger. In: Proceedings of the sixth conference on
applied natural language processing, Seattle. Association for Computational Linguistics, pp 224–
231. doi:10.3115/974147.974178, http://www.aclweb.org/anthology/A00-1031
5. Collins M (2002) Discriminative training methods for hidden Markov models: theory and
experiments with perceptron algorithms. In: Proceedings of the ACL-02 conference on empirical
methods in natural language processing, EMNLP '02, Philadelphia, vol 10. Association for
Computational Linguistics, pp 1–8. doi:10.3115/1118693.1118694
6. Ratnaparkhi A (1996) A maximum entropy model for part-of-speech tagging. In: Brill E, Church K
(eds) Proceedings of the conference on empirical methods in natural language processing,
Copenhagen. Association for Computational Linguistics, pp 133–142
7. Giménez J, Màrquez L (2004) SVMTool: a general POS tagger generator based on support vector
machines. In: Proceedings of 4th international conference on language resources and evaluation
(LREC), Lisbon, pp 43–46
8. Beesley KR, Karttunen L (2003) Finite-state morphology: xerox tools and techniques. CSLI,
Stanford
9. Beesley KR (1998) Arabic morphology using only finite-state operations. In: Rosner M (ed)
Proceedings of the workshop on computational approaches to Semitic languages, COLING-
ACL’98, Montreal, pp 50–57
10. Kiraz GA (2000) Multitiered nonlinear morphology using multitape finite automata: a case study
on Syriac and Arabic. Comput Linguist 26(1):77–105
11. Cohen-Sygal Y, Wintner S (2006) Finite-state registered automata for non-concatenative
morphology. Comput Linguist 32(1):49–82
12. Amsalu S, Gibbon D (2005) A complete finite-state model for Amharic morphographemics. In:
Yli-Jyrä A, Karttunen L, Karhumäki J (eds) FSMNLP. Lecture notes in computer science, vol
4002. Springer, Berlin/New York, pp 283–284
13. Karttunen L, Beesley KR (2001) A short history of two-level morphology. In: Talk given at the
ESSLLI workshop on finite state methods in natural language processing.
http://www.helsinki.fi/esslli/evening/20years/twol-history.html
14. Koskenniemi K (1983) Two-level morphology: a general computational model for word-form
recognition and production. The Department of General Linguistics, University of Helsinki
15. Choueka Y (1966) Computers and grammar: mechanical analysis of Hebrew verbs. In:
Proceedings of the annual conference of the Israeli Association for Information Processing,
Rehovot, pp 49–66. (in Hebrew)
16. Forsberg M, Ranta A (2004) Functional morphology. In: Proceedings of the ninth ACM SIGPLAN
international conference on functional programming (ICFP’04), Snowbird. ACM, New York, pp
213–223
17. Fissaha S, Haller J (2003) Amharic verb lexicon in the context of machine translation. In:
Proceedings of the TALN workshop on natural language processing of minority languages, Batz-sur-
Mer