Amharic-Oromo Translation Study
This is to certify that the thesis prepared by Gelan Tulu, titled: Bidirectional
Amharic-Afaan Oromo Machine Translation Using Hybrid Approach and submitted
in partial fulfillment of the requirements for the Degree of Master of Science in
Computer Science complies with the regulations of the University and meets the
accepted standards with respect to originality and quality.
Machine translation is the area of Natural Language Processing (NLP) that focuses on obtaining a
target language text from a source language text by means of automatic techniques. Machine
translation is a multidisciplinary field, and the challenge has been approached from various points
of view, including linguistics and statistics. Hybrid methods combine the best properties
of two or more machine translation approaches. It has recently become popular to include
rules in statistical machine translation approaches.
In this study, a bidirectional Amharic-Afaan Oromo machine translation system using a hybrid
approach has been developed. The system has four components: sentence reordering, language
model, decoding and translation model. Sentence reordering pre-processes the source sentence,
using its Part of Speech (POS) tags, so that its structure is more similar to the structure of the
target language and better guides the statistical engine. Since there are no publicly available
POS taggers for either Amharic or Afaan Oromo, the tagged corpus was prepared manually. The
linguistic background and nature of the two languages were studied in order to design the
reordering rules for different types of Amharic and Afaan Oromo phrases and sentences. Because
the system is bidirectional, language models (using the IRSTLM tool) and translation models
(using GIZA++) were developed for both Afaan Oromo and Amharic. A decoder is used to find the
best translation in the target language (Amharic/Afaan Oromo) for a given source sentence
(Afaan Oromo/Amharic) based on the translation and language models.
To check the accuracy of the system, two experiments were conducted using two different
approaches. The first experiment used a statistical approach and obtained BLEU scores of 89.39%
for Amharic to Afaan Oromo translation and 80.33% for Afaan Oromo to Amharic translation. The
second experiment used a hybrid approach and obtained BLEU scores of 91.56% and 82.24% for
Amharic to Afaan Oromo and Afaan Oromo to Amharic translation respectively. The results show
that the hybrid approach is slightly better than the statistical approach, by 2.17 and 1.91
percentage points respectively.
My deepest gratitude goes to my family, especially my mother Tsige Demissie, for their
unconditional love and endless motivation during the course of my study. Finally, all my friends
deserve special thanks.
Table of Contents
List of Tables
List of Figures
List of Algorithms
Acronyms and Abbreviations
CHAPTER ONE: INTRODUCTION
1.1 Background
1.2 Motivation
1.3 Statement of the Problem
1.4 Objective of the Study
1.5 Methods of the Study
1.6 Application of Results
1.7 Scope and Limitation of the Study
1.8 Organization of the Thesis
CHAPTER TWO: LITERATURE REVIEW
2.1 Introduction
2.2 A Brief Overview of Amharic Language
2.2.1 Word Categories of Amharic
2.2.2 Amharic Phrasal Categories
2.2.3 Amharic Morphology
2.2.4 Amharic Sentence Structure
2.3 A Brief Overview of Afaan Oromo
2.3.1 Word Categories of Afaan Oromo
2.3.2 Afaan Oromo Phrasal Categories
2.3.3 Afaan Oromo Morphology
2.3.4 Afaan Oromo Sentence Structure
2.4 Machine Translation
2.4.1 Rule Based Machine Translation Approach
2.4.2 Corpus Based Machine Translation Approach
2.4.3 Hybrid Machine Translation Approach
2.4.4 Neural Machine Translation
2.5 Evaluation of Machine Translation
CHAPTER THREE: RELATED WORK
3.1 Overview
3.2 Machine Translation Systems for Non-Ethiopian Language Pairs
3.3 Machine Translation Systems for English and Ethiopian language pairs
3.4 Machine Translation System for Ethiopian Language pair
3.5 Summary
CHAPTER FOUR: BIDIRECTIONAL AMHARIC-AFAAN OROMO MACHINE TRANSLATION SYSTEM
4.1 Introduction
4.2 Architecture of the System
4.2.1 Sentence Reordering
4.2.2 Language Model
4.2.3 Translation Model
4.2.4 Decoding
CHAPTER FIVE: EXPERIMENT AND DISCUSSION
5.1 Introduction
5.2 Corpus Preparation
5.3 Experiment I
5.3.1 Training the system
5.3.2 Result of Test Set on Experiment I
5.4 Experiment II
5.4.1 Training the system
5.4.2 Result of Test Set on Experiment II
5.5 Discussion
CHAPTER SIX: CONCLUSION AND FUTURE WORK
6.1 Introduction
6.2 Conclusion
6.3 Contribution
6.4 Future Work
References
Annex I: Sample Amharic and Afaan Oromo Tagged Sentences for Training
Annex II: Sample Parallel Corpus for Testing
Annex III: Sample language model for Amharic
Annex IV: Sample language model for Afaan Oromo
Annex V: Transliteration from Amharic alphabets to Latin characters
List of Tables
Table 2.1: Amharic Vowels
Table 2.2: Amharic plural noun formation using suffix
Table 2.3: Nouns derived from other nouns
Table 2.4: Amharic personal pronouns
Table 2.5: Amharic possessive personal pronouns
Table 2.6: Afaan Oromo plural noun formation using suffix
Table 2.7: Afaan Oromo personal pronouns
Table 2.8: Afaan Oromo possessive personal pronouns
Table 2.9: Afaan Oromo demonstrative pronouns
Table 2.10: Different forms of root ‘deem’ [20]
Table 2.11: Adjectives inflection for gender
Table 4.1: Amharic and Afaan Oromo POS tag sets
List of Figures
Figure 2.1: Amharic Alphabet
Figure 2.2: Vauquois Triangle [38]
Figure 2.3: Direct machine translation approach [39]
Figure 2.4: Interlingua-based RBMT [40]
Figure 2.5: SMT Architecture [42]
Figure 2.6: The Vauquois Triangle Modified for EBMT [47]
Figure 4.1: Architecture of the System
Figure 4.2: An example of phrase-based translation
List of Algorithms
Algorithm 4.1: Algorithm for reordering compound words
Algorithm 4.2: Algorithm for reordering noun phrases
Algorithm 4.3: Algorithm for reordering adjective words
Algorithm 4.4: Algorithm for reordering Amharic sentences containing a noun phrase and a compound word
Algorithm 4.5: Algorithm for reordering Afaan Oromo sentences containing a noun phrase and a compound word
Algorithm 4.6: Algorithm for reordering Amharic sentences containing an adjective and a compound word
Algorithm 4.7: Algorithm for reordering Afaan Oromo sentences containing an adjective and a compound word
Algorithm 4.8: Algorithm for reordering possessive pronouns
Algorithm 4.9: Algorithm for reordering cardinal numbers
Algorithm 4.10: Algorithm for reordering ordinal numbers
Algorithm 4.11: Reordering rule for noun phrases modified by adjectives
Algorithm 4.12: Reordering rule for sentences containing a possessive pronoun and a noun phrase
Algorithm 4.13: Reordering rule for sentences containing a cardinal number and a noun phrase
Algorithm 4.14: Reordering rule for sentences containing an ordinal number and noun combination
Acronyms and Abbreviations
BCE Before Common Era
CW Compound Word
LM Language Model
TM Translation Model
CHAPTER ONE: INTRODUCTION
1.1 Background
Machine translation is a branch of computational linguistics. It is defined as an automatic process,
carried out by a computerized system, that converts a piece of text (written or spoken) from one
natural language, referred to as the source language, into another natural language, called the
target language, with or without human intervention, and with the objective of preserving the
meaning of the original text in the translated text [1].
Machine translation systems can be designed either specifically for two particular languages,
called a bilingual system, or for more than a single pair of languages, called a multilingual system.
A bilingual system may be either unidirectional, from one source language into one target
language, or may be bidirectional. Multilingual systems are usually designed to be bidirectional,
but most bilingual systems are unidirectional.
Different approaches to machine translation have been defined and have matured enough for
practical use. The main approaches to building machine translation tools are: the knowledge
driven approach, also known as Rule Based Machine Translation (RBMT); the data driven
approach, also known as Corpus Based Machine Translation (CBMT); the hybrid
machine translation approach, which combines the advantages of the RBMT and CBMT
approaches; and Neural Machine Translation (NMT), which emerged as a successor of corpus based
machine translation.
RBMT generates output based on linguistic rules and on word order, morphological, syntactic
and semantic analysis of both the source and the target language. RBMT systems follow
one of three approaches to translation, namely the direct approach, the transfer approach and the
interlingua approach [2]. However, RBMT techniques are less accurate because of the difficulty of
handling rule interaction in big systems, ambiguity and idiomatic expressions. The complexity of
creating RBMT systems paved the way for other machine translation approaches such as corpus
based machine translation and hybrid machine translation.
CBMT requires a huge parallel corpus to translate source language sentences into target
language sentences. The two major categories of CBMT are Statistical
Machine Translation (SMT) and Example Based Machine Translation (EBMT). SMT uses a parallel
corpus to estimate, by means of statistical probability, how words in the source language
correspond to words in the target language and in what order they occur. EBMT systems use
sample sentences stored in a database to translate new sentences.
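As a hedged illustration of the "statistical probability" mentioned above, the classical noisy-channel formulation of SMT can be written as follows. This formula is a standard textbook formulation and is not taken from this section; the symbols f (source sentence), e (candidate target sentence) and \hat{e} (chosen translation) are introduced here for illustration only.

```latex
\hat{e} \;=\; \arg\max_{e} P(e \mid f)
        \;=\; \arg\max_{e} \underbrace{P(f \mid e)}_{\text{translation model (TM)}}
              \;\underbrace{P(e)}_{\text{language model (LM)}}
```

Under this reading, the translation model is estimated from the parallel corpus, the language model from target language text, and the decoder performs the search for the best-scoring e.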
The Hybrid approach of machine translation utilizes properties of RBMT and SMT. Some Hybrid
systems use a rule based approach followed by correction of output using statistical information.
On the other hand, in some Hybrid systems statistical preprocessing is done followed by correction
using transfer rules.
CBMT systems fail to provide accurate translations between language pairs with significant
grammatical differences. Thus, emerging research in machine translation has turned towards
Neural Machine Translation (NMT). Neural machine translation is a new architecture that aims at
building a single neural network that can be jointly tuned to maximize translation
performance. This neural network is trained using deep learning techniques. NMT requires a very
large parallel corpus to train the network. This requirement hinders the applicability of
NMT to language pairs that lack a huge parallel corpus.
1.2 Motivation
More than 80 languages are spoken in Ethiopia. Amharic and Afaan Oromo are
the two principal languages spoken in the country [3]. Because of the large number of speakers of
Amharic and Afaan Oromo, the need for translation from Amharic to Afaan Oromo and vice versa
keeps increasing. This motivated us to study and investigate the development
of a bidirectional Amharic-Afaan Oromo machine translation system.
To the best of the researcher’s knowledge, no machine translation study has been conducted on
the Amharic-Afaan Oromo language pair. Because Amharic and Afaan Oromo are widely
used in media, industry and offices, a large amount of electronic data is available in both languages.
These data would be more valuable if they could be used by speakers of both languages. This calls
for the development of a bidirectional Amharic-Afaan Oromo translation system.
This study attempts to answer the following research question: What is a feasible machine
translation approach to overcome linguistic barriers and to share knowledge among Amharic and
Afaan Oromo speakers and users?
To study syntactic structure and relationship of the language pair: Amharic and Afaan
Oromo.
Literature Review
Systems and applications related to bidirectional machine translation for different language
pairs were reviewed. The reviewed material consists of theses, conference and journal articles,
white papers and bidirectional systems developed for other languages. In addition, discussions
were held with Amharic and Afaan Oromo language experts regarding the linguistic nature of the
languages, such as their grammatical structure and morphology.
Data Collection
The Amharic-Afaan Oromo parallel corpus used to perform the experiments was collected from
Fana Broadcasting Corporate news (see footnote 1), some chapters of the Holy Bible and other
simple sentences. A total of 1402 parallel sentences were collected, of which 1301 are used for
training and the remaining 101 are used for testing.
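The split described above can be sketched in a few lines of Python. This is only an illustrative sketch: the text does not say how the 101 test sentences were selected, so the seeded random shuffle, the function name `split_parallel_corpus` and the placeholder sentences are all assumptions.

```python
import random


def split_parallel_corpus(source_sentences, target_sentences, n_test=101, seed=0):
    """Split aligned sentence lists into (train, test) lists of (source, target) pairs."""
    if len(source_sentences) != len(target_sentences):
        raise ValueError("source and target must contain the same number of sentences")
    pairs = list(zip(source_sentences, target_sentences))
    # Shuffle the pairs together so each source sentence stays aligned
    # with its translation (random selection is an assumption).
    random.Random(seed).shuffle(pairs)
    return pairs[n_test:], pairs[:n_test]


# Placeholder stand-ins for the 1402 collected sentence pairs.
amharic = [f"am-{i}" for i in range(1402)]
oromo = [f"om-{i}" for i in range(1402)]
train, test = split_parallel_corpus(amharic, oromo)
print(len(train), len(test))  # 1301 101
```

Keeping the two sides zipped during the shuffle is the important design point: shuffling the Amharic and Afaan Oromo files independently would destroy the sentence alignment that training depends on.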
Software Tools
For the development of the bidirectional Amharic-Afaan Oromo machine translation prototype, the
following tools were used:
- Ubuntu 16.04: a complete desktop Linux operating system which is freely available and
  suitable for the Moses environment.
- Moses: a statistical machine translation system that allows translation models to be trained
  automatically for any language pair.
- Giza++: a toolkit to train word alignment models.
- MKCLS: a tool to train word classes by using a maximum-likelihood criterion.
- IRSTLM: a language modeling toolkit.
- BLEU score: a metric to evaluate the performance of the system.
- Notepad: to put the corpus into a format the system can read.
- Microsoft Office 2013: software for the documentation of the study.
Evaluation
Machine translation can be evaluated using manual or automatic evaluation methods.
Manual evaluation gives a better measure of the quality of machine translation and makes it
possible to analyze the errors in the system output. The main challenges in conducting human
evaluation of machine translation output are its high cost and time consumption. Therefore, automatic
methods such as Bilingual Evaluation Understudy (BLEU) were proposed to measure the performance
of machine translation. We used the BLEU metric to evaluate the performance of the prototype.
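To make the metric concrete, the following is a minimal sketch of corpus-level BLEU (clipped n-gram precision up to 4-grams combined with a brevity penalty), assuming one reference per sentence and whitespace tokenization. It is not the scoring script used in this study; the function names and the English toy sentences are hypothetical.

```python
import math
from collections import Counter


def ngram_counts(tokens, n):
    """Count the n-grams of a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def bleu(candidates, references, max_n=4):
    """Corpus-level BLEU in [0, 1] with one reference per candidate."""
    matches = [0] * max_n
    totals = [0] * max_n
    cand_len = ref_len = 0
    for cand, ref in zip(candidates, references):
        c, r = cand.split(), ref.split()
        cand_len += len(c)
        ref_len += len(r)
        for n in range(1, max_n + 1):
            c_counts, r_counts = ngram_counts(c, n), ngram_counts(r, n)
            # Clip each candidate n-gram count by its count in the reference.
            matches[n - 1] += sum(min(k, r_counts[g]) for g, k in c_counts.items())
            totals[n - 1] += sum(c_counts.values())
    if min(matches) == 0:
        return 0.0  # no smoothing: any zero precision gives a zero score
    log_precision = sum(math.log(m / t) for m, t in zip(matches, totals)) / max_n
    # Brevity penalty punishes candidates shorter than the reference.
    bp = 1.0 if cand_len > ref_len else math.exp(1 - ref_len / cand_len)
    return bp * math.exp(log_precision)


score = bleu(["the cat sat on the mat"], ["the cat sat on a mat"])
print(round(score, 4))  # 0.5373
```

Multiplying the returned value by 100 gives a percentage of the kind reported for the experiments. The sketch applies no smoothing, so short test sets with no matching 4-gram score zero.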
1 http://www.fanabc.com
1.6 Application of Results
The following are the main applications of this research work:
- The parallel corpus used for training and testing in this work can be used
  in other NLP applications, such as named entity recognition and cross language information
  retrieval (CLIR), for the Amharic-Afaan Oromo language pair.
- The translation of different reading materials can easily be accomplished for the
  Amharic-Afaan Oromo language pair.
- The translation system can be used as a tool in the teaching and learning of the
  languages.
CHAPTER TWO: LITERATURE REVIEW
2.1 Introduction
In this chapter, a brief overview of Amharic and Afaan Oromo languages and different machine
translation approaches are discussed. The major Amharic and Afaan Oromo word classes, which
are nouns, verbs, adjectives and adverbs are also discussed in this chapter.
Table 2.1: Amharic Vowels
Vowel: ኧ /ä/, ኡ /u/, ኢ /i/, ኣ /a/, ኤ /e/, እ /ï/, ኦ /o/
The Amharic script contains thirty-four basic symbols. Each basic symbol has
seven forms, one for each of the seven vowels of Amharic [11]. The Amharic syllabary is
presented in Figure 2.1.
Figure 2.1: Amharic Alphabet
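The seven forms of a basic symbol can be illustrated with a short Python sketch. It relies on an assumption that is not stated in this section: in the Unicode Ethiopic block, the seven vowel forms of a regular consonant row occupy consecutive code points starting at the first-order character. The function name `seven_orders` is hypothetical, and rows with gaps (such as labialized series) are not handled.

```python
# Vowel of each of the seven forms, in the order used in this chapter.
VOWELS = ["ä", "u", "i", "a", "e", "ï", "o"]


def seven_orders(first_order_char):
    """Return (symbol, vowel) pairs for the seven forms of a basic symbol.

    Assumes the Unicode Ethiopic layout, in which regular rows start at a
    code point that is a multiple of 8 counted from U+1200.
    """
    code = ord(first_order_char)
    if not 0x1200 <= code <= 0x135A or (code - 0x1200) % 8 != 0:
        raise ValueError("expected a first-order Ethiopic character")
    return [(chr(code + i), vowel) for i, vowel in enumerate(VOWELS)]


for symbol, vowel in seven_orders("ለ"):
    print(symbol, vowel)
```

For the row starting with ለ this lists ለ, ሉ, ሊ, ላ, ሌ, ል and ሎ, paired with the vowels ä, u, i, a, e, ï and o.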
- ‘:’ is used to separate words. Nowadays it is uncommon to see this punctuation mark in
  Amharic electronic or paper-based writing; white space is used to demarcate words
  instead.
- ‘፣’ is used to separate comparative and sequential lists of names, phrases or numbers, as well
  as to separate parts of a sentence that are not complete by themselves.
- ‘፤’ is used to separate equivalent main phrases in one idea. Even though it is not placed at
  the end of a paragraph, it can be used to separate sentences with similar ideas in a
  paragraph.
- ‘?’ indicates an interrogative clause or phrase.
- ‘!’ is used to emphasize strong feelings and is placed after a word or at the end of a sentence.
2.2.1.1 Nouns
Nouns are words that are used to identify names, things and places. A word is classified as a noun
if it inflects for the Amharic plural marker ‘-ኦች’ /-och/ or ‘-ዎች’ /-woch/, can be used as a
subject or an object in a sentence, can be modified by adjectives, and can come after demonstrative
pronouns [13].
Amharic plural nouns are mainly formed by adding the suffixes ‘-ኦች’ /-och/ or ‘-ዎች’ /-woch/.
Table 2.2 shows suffixes used to form plural nouns in Amharic.
Table 2.2: Amharic plural noun formation using suffix
Amharic nouns can be either primary or derived. They are derived if they are related in their root
consonants and/or meaning to verbs, adjectives or other nouns; otherwise, they are primary [13,
14]. For example, the noun መንገድ /mängäd/ [street] is primary, but መንገድ-ኧኛ → መንገደኛ
/mängädäNa/ [traveler] is derived from the nominal base መንገድ by adding the morpheme ‘-ኧኛ’.
Nouns can be derived from other nouns, adjectives, roots, stems and the infinitive form of a verb
by affixation and intercalation. The suffixes ‘-ነት’, ‘-ኧኛ’, ‘-ኧት’, ‘-ኣዊ’, ‘-ተኛ’, ‘-ኛ’ and the
prefix ‘ባለ-’ are used to derive nouns from other nouns. Table 2.3 shows examples of nouns derived
from other base nouns.
Table 2.3: Nouns derived from other nouns
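The way a vowel-initial suffix such as ‘-ኦች’ or ‘-ኧኛ’ fuses with the final consonant of a noun can be sketched as follows. This is a simplified illustration, not the procedure used in this study: it assumes the Unicode Ethiopic layout (seven consecutive vowel forms per regular row, with the vowelless sixth form at offset 5), and the names `VOWEL_ORDER` and `attach_suffix` are hypothetical.

```python
# Offset of each vowel carrier within a consonant row (assumed Unicode layout).
VOWEL_ORDER = {"ኧ": 0, "ኡ": 1, "ኢ": 2, "ኣ": 3, "ኤ": 4, "እ": 5, "ኦ": 6}


def is_sixth_order(ch):
    """True if ch is the sixth (vowelless) form in a regular Ethiopic row."""
    return 0x1200 <= ord(ch) <= 0x135A and (ord(ch) - 0x1200) % 8 == 5


def attach_suffix(stem, suffix):
    """Attach a suffix, fusing an initial vowel carrier with a final consonant."""
    if stem and suffix and suffix[0] in VOWEL_ORDER and is_sixth_order(stem[-1]):
        row_start = ord(stem[-1]) - 5  # first form of the same consonant
        fused = chr(row_start + VOWEL_ORDER[suffix[0]])
        return stem[:-1] + fused + suffix[1:]
    # Otherwise (e.g. consonant-initial suffixes such as -ዎች) just concatenate.
    return stem + suffix


print(attach_suffix("መንገድ", "ኧኛ"))  # መንገደኛ
print(attach_suffix("መንገድ", "ኦች"))  # መንገዶች
```

The sketch covers only suffixation after a vowelless final consonant; intercalation and irregular rows of the syllabary would need additional rules.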
A word that can be used in place of a noun is called a pronoun. Pronouns can be categorized based
on their functions and meanings in the sentence. Amharic pronouns are categorized into personal
pronouns, reflexive pronouns, demonstrative pronouns and possessive pronouns [12].
Personal Pronouns
A personal pronoun is a word that is used as a simple substitute for the proper name of a person.
Amharic personal pronouns with equivalent English personal pronouns are shown in Table 2.4.
Table 2.4: Amharic personal pronouns
Within second-person and third-person singular, there are two additional polite independent
pronouns, for reference to people to whom the speaker wishes to show respect. The polite personal
pronouns in Amharic are እርስዎ [you, singular, polite] and እሳቸው [he/she, singular, polite].
Reflexive Pronouns
Reflexive pronouns are words that are used when the subject and the object of a sentence are the
same. For example: እኔ በራሴ እተማመናለሁ /xne bärase etämamänalähu/ [I believe in myself].
The subject እኔ (I) and the object ራሴ (myself) indicate the same person.
A reflexive pronoun can also play the indirect object role in a sentence [12, 15]. For example:
አልማዝ ሁልጊዜ ጠዋት ጠዋት ለራሷ ሻይ ትቀዳለች. /xälmaz hulgize Täwat Täwat läraswa śay
tqädaläc/ [Almaz pours a cup of tea for herself every morning].
Amharic reflexive pronouns with equivalent English reflexive pronouns are as follows: እኔ ራሴ
/xne rase/ [myself], እሱ ራሱ /xsu rasu/ [himself], እሷ ራሷ /xswa raswa/ [herself], አንተ ራስህ /xäntä
rash/ [yourself, masculine, singular], አንቺ ራስሽ /xänchi räsś/ [yourself, feminine, singular], አንድ
ራሱ /xänd rasu/ [oneself], እሱ ራሱ /xsu rasu/ [itself], እኛ ራሳችን /xNa rasachn/ [ourselves], እናንተ
Demonstrative Pronouns
A demonstrative pronoun is a pronoun that is used to point to something specific within a sentence.
Amharic makes a two-way distinction between near (ይህ/ይቺ /yh/, /ycï/ [this], እነዚህ /xnäzih/ [these])
and far (ያ /ya/ [that], ያቺ /yacï/ [that], እነዚያ /xnäziya/ [those]) demonstrative expressions (pronouns,
adjectives, adverbs), and they can be either singular or plural. Amharic also distinguishes masculine
gender (ይህ /yh/ [this], ያ /ya/ [that]) and feminine gender (ይቺ /ycï/ [this], ያቺ /yacï/ [that]) in the
singular.
Possessive Pronouns
Possessive pronouns show possession or ownership in a sentence [12, 15]. In Amharic, possession
can be expressed in two ways. The first is through possessive suffixes: Amharic
has a set of morphemes that are suffixed to nouns to signal possession. For example: ቤት [house],
ቤት-ኡ → ቤቱ /betu/ [his house].
The morphemes -ኤ, -ኣችን, -ህ, -ሽ, -ኡ, -ኋ, -ኣቹ and -ኣቸው are suffixed to the noun ቤት to indicate
the possessors my, our, your (masculine, singular), your (feminine, singular), his, her, your (plural)
and their respectively. The second way to express possession is to attach the prefix ‘የ-’ to the
Amharic personal pronouns. For example:
ያ መኪና የአንተ (ያንተ) ነው /ya mäkina yantä näw/ [That car is yours].
የአንተ (ያንተ) መኪና እየመጣች ነው /yantä mäkina eyämäTac näw/ [Your car is coming].
Amharic possessive pronouns with their equivalent English possessive pronouns are shown in
Table 2.5.
Table 2.5: Amharic possessive personal pronouns.
Interrogative pronouns are used to form questions. According to Getahun Amare [15],
the main interrogative pronouns used in Amharic are: ማን /man/ [who], ምን /mn/ [what], የት /yät/
[where], ስንት /snt/ [how much/how many], መቼ /mäce/ [when], እንዴት /xndet/ [how], የትኛው
/yätNaw/ [which]. When the interrogative pronouns are combined with prepositions, we get
interrogative prepositional phrases such as ከማን /kämän/ [from whom], ለምን /lamn/ [why], በምን /bamn/
[by what], ከየት /käyät/ [from where], የማነው /yämanäw/ [whose], etc.
2.2.1.2 Adjectives
Amharic adjectives modify nouns or pronouns by describing, identifying or quantifying them [12,
15]. Amharic adjectives always come before the nouns or pronouns they modify, but not every
word that comes before a noun is an adjective [13]. As is true for nouns, adjectives
can be primary (such as ደግ /däg/ [kind], ፈጣን /fäTan/ [fast]) or derived. Adjectives are
derived from nouns, stems or verbal roots by adding a suffix or a prefix and by intercalation. For
example, it is possible to derive ድንጋይ-ኣማ → ድንጋያማ /dngayama/ [stony] from the noun ድንጋይ
/dngay/ [stone]; ሀይል-ኧኛ → ሀይለኛ /hayläNa/ [powerful] from the noun ሀይል /hayl/ [power];
ስኧንኧፍ → ሰነፍ /sänäf/ [lazy] from the root ስንፍ /snf/; ክብኡር → ክቡር /kbur/ [respectful] from
2.2.1.3 Verbs
A verb is a word that expresses an action, a state of being or a relationship between two things [16].
Amharic verbs take subject markers as suffixes to agree with the subject of the sentence, such as
‘-ሁ’ for the subject ‘I’ as in መጣሁ /mäTahu/ [I came], ‘-ህ’ for the subject ‘you’ as in መጣህ
/mäTah/ [you came], ‘-ች’ for the subject ‘she’ as in መጣች /mäTac/ [she came], and so on.
Amharic verbs often have additional morphology that indicates the person, number and (for second
person and third person singular) gender of the object of the verb. For example, in አንቺን አየሁሽ
/xäncin xäyähuś/ [I saw you], ‘-ሁሽ’ indicates second person, singular, feminine, and in the
sentence አልማዝን አየኋት /xälmazn xäyäwat/ [I saw Almaz], ‘-ኋት’ indicates third person, singular,
feminine.
2.2.1.4 Adverbs
In Amharic, adverbs modify the verbs that follow them; an adverb always comes before the
verb it modifies. Adverbs occur either in a primitive form or in a compound form that groups
a preposition with other word categories [13].
xälämämTatwan gäna xälwäsänäcm/ [She hasn’t yet decided if she wants to come or not], ገና
/gäna/ [yet] is the only adverb that formed the adverbial phrase.
2.2.1.5 Prepositions
Prepositions and postpositions together are called adpositions. A preposition or a postposition
typically combines with a noun or a pronoun, or more generally a noun phrase, which is called its
complement. A preposition comes before its complement; a postposition comes after its
complement. In Amharic, adpositions link one word with another word [12]. Amharic adpositions
are very few in number: ስለ /slä/, እንደ /xnde/, ከወደ /käwädä/, አጠገብ /xäTägäb/, ማዶ
/mado/, ባሻገር /bashagär/ and ወዲህ /wädih/. In Amharic, adpositions give meaning only when they come
with other words. Consider the following phrases:
The prepositions ‘ስለ’, ‘እንደ’, ‘ከ’ and ‘እስከ’ come before the nouns ‘ገንዘብ’, ‘ሰው’, ‘ወንዝ’ and
‘ጎጃም’, and the postpositions ‘ማዶ’ and ‘ድረስ’ come after the nouns ‘ወንዝ’ and ‘ጎጃም’.
children], ሁሉም (all) is a specifier, የምወዳቸው (my dear) is an adverbial modifier and ልጆቼ
(children) is a noun.
Verb Phrase: Amharic verb phrase is constructed with a verb as a head and other constituents
such as complements, modifiers and specifiers. For example: in the following Amharic verb
phrase, ከትምህርት ቤት መጣሁ /kätmhrt bet mäTahu/ [I came from school], ከትምህርት ቤት (from
Adjectival phrase: In an Amharic adjective phrase, one or more words work together to give more
information about an adjective. For example: in the sentence, ወንድሜ በስራው በጣም ደስተኛ ነው
/wändme bäsraw bätam dästäNa näw/ [My brother is very happy with his work], the prepositional
phrase በስራው /bäsraw/ [with his work] modifies the adjective ደስተኛ /dästäNa/ [happy].
Prepositional Phrase: An Amharic prepositional phrase is made up of a preposition head and other
constituents such as nouns, noun phrases, etc. Unlike the heads of other phrase constructions, a preposition
cannot stand as a phrase on its own; it must be combined with other constituents. Prepositions
link nouns, pronouns and phrases to other words in a sentence. Prepositions give meaning only if
they combine with other words such as nouns, adjectives and verbs. For example: in the prepositional
phrase በወንበር ላይ /bäwänbär lay/ (on the chair) በ /bä/ and ላይ /lay/ are prepositions which are
Adverbial Phrases: Amharic adverbial phrases are made up of an adverb as the head word and one
or more other lexical categories, including adverbs themselves, as modifiers. The head of the
adverbial phrase is placed at the end [13]. Unlike other phrases, adverbial phrases do not take
complements. Most of the time, the modifiers of adverbial phrases are prepositional phrases
that always come before the adverb. Examples: ክፉኛ /kfuNa/ [severely], በጣም ክፉኛ /bäTam kfuNa/
[very severely], እንደ ወንድሙ በጣም ክፉኛ /xndä wändmu bäTam kfuNa/ [very severely like his
brother].
Conjunction Phrases: A conjunction is a part of speech that connects words with words, phrases
with phrases and sentences with sentences [12]. The primary types of conjunctions in Amharic are
coordinating conjunctions and subordinating conjunctions.
Coordinating conjunctions connect words, phrases, and clauses, and give
equal emphasis or importance to the elements they connect. For example, consider the following
Amharic examples.
አራት በጎች እና ሶስት ፍየሎች /xärat bägoc xna sost fyäloc/ [Four sheep and three goats].
ሁለት እንጀራ ወይም ሶስት ጠርሙስ ቢራ /hulät xnjära wäym sost Tärmus bira/ [Two Injera
or three bottles of beer]
‘እና’ /xna/ [and] and ‘ወይም’ /wäym/ [or] are coordinating conjunctions used to connect related
phrases.
Subordinating conjunctions connect two clauses together, but in doing so, they make one clause
dependent (or subordinate) upon the other clause (or main clause). For example, consider the
following simple sentences.
ጠራሁት ሆኖም አልመጣም /Tärahut honom xälmätam/ [I called him though he did not come].
በልቻለሁ ግን አልጠገብኩም /bälcaläw gn xältägäbkum/ [I ate but I was not satisfied].
ጠራሁት /Tärahut/ [I called him] and በልቻለሁ /bälcaläw/ [I ate] are the main clauses, አልመጣም /xälmätam/
[he did not come] and አልጠገብኩም /xältägäbkum/ [I was not satisfied] are the dependent clauses,
and ሆኖም /honom/ [though] and ግን /gn/ [but] are subordinating conjunctions.
ውስድ is the root form for ወሰደ /wäsädä/ [take] and ተወሰደ /täwäsädä/ [taken].
Subject-Verb agreement
Amharic verbs agree with their subjects; that is, the person, number and gender of the subject of
the verb (in the second and third person singular) are marked by suffixes or prefixes on the verb.
The affixes on the verb that signal subject agreement vary greatly with the particular verb tense,
aspect or mood. In an Amharic sentence, the verb comes at the end of the sentence and the order is
Subject – Object – Verb (SOV) [18].
For example:
እሱ /xsu/ [He] is a subject, ‘ተማሪ’ /tämari/ [student] is an object and ‘ነው’ /näw/ [is] is a verb.
Amharic Articles
Indefinite articles are generally unmarked in Amharic, but definite articles are always marked by
a suffix called the definite marker [19]. For singular, a distinction is made between a noun treated
as masculine form, for example: ቤቱ /bet-u/ [his house] or as feminine form, ቤቷ /bet-wa/ [her
house], definite-female.
አበበ ትምህርት ቤት ሄደ /xäbäbä tmhrt bet hedä/ [Abebe went to school], ‘አበበ’ /xäbäbä/
[Abebe] is the subject, ‘ትምህርት ቤት’ /tmhrt bet/ [school] is the object and ‘ሄደ’ /hedä/ [went] is
the verb.
Simple Amharic sentences can also be constructed using a subject and a predicate.
For example: ‘ውሻው ሮጠ’ /wśaw roTä/ [the dog ran], ‘ውሻው’ /wśaw/ [the dog] is the subject of
the sentence, because the sentence is telling something about the dog. And what is it telling? It
says ውሻው ሮጠ /wśaw roTä/ [the dog ran], so the predicate is ሮጠ /roTä/ [ran].
Amharic sentences can also be constructed from simple or complex noun phrases and simple or
complex verb phrases. Simple sentences are constructed from simple noun phrase followed by
simple verb phrase which contains only a single verb. The following examples show the various
structures of simple sentences.
ማን መኪና ገዛልህ? /man mäkina gäzalh?/ [Who bought a car for you?]
ሁለት ትልልቅ ልጆች በመኪና ወደ ጎጃም ሄዱ /hulät tllq ljocï bämäkina wädä gojam hedu/
[Two big children went to Gojjam by car.]
2.3 A Brief Overview of Afaan Oromo
Afaan Oromo is one of the major indigenous African languages and is widely spoken and used in
most parts of Ethiopia and some parts of the neighboring countries [20]. In addition, Afaan Oromo has
a long history and a well-developed oral tradition. Despite this, and despite the number of its speakers
and its value as a widely spoken language in the Horn of Africa, it remained an unwritten
language for a long period of time. The writing system of Afaan Oromo is called Qubee, a Latin-based
alphabet [21, 22].
Afaan Oromo has five vowels, five double consonants and twenty consonant phonemes, i.e.,
sounds that make a difference in word meaning. Afaan Oromo vowels are represented by the
letters a, e, o, u and i when short, and by aa, ee, oo, uu and ii when long. The length of the vowel makes a
difference in word meaning [22]. For example:
Laga [river] and Laagaa [roof of the mouth].
Lafa [ground] and Laafaa [soft]
Afaan Oromo double consonants are represented by the letters Ch, Dh, Ny, Ph and Sh, and the remaining
consonants are represented by the letters B, C, D, F, G, H, J, K, L, M, N, P, Q, R, S, T, V, W, X,
Y and Z [23].
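The letter inventory described above can be illustrated with a small sketch that splits a word into Qubee letters, keeping the five double consonants together, and checks for a long vowel. This is only an illustration and not part of the system developed in this study; the function names `qubee_letters` and `has_long_vowel` are hypothetical, and the sketch assumes that every occurrence of ch, dh, ny, ph or sh is a single letter.

```python
# Double consonants listed in the text; each is treated as one letter.
# Assumption: any adjacent occurrence of these pairs is a digraph.
DIGRAPHS = {"ch", "dh", "ny", "ph", "sh"}
VOWELS = set("aeiou")


def qubee_letters(word):
    """Split a word into Qubee letters, keeping double consonants together."""
    lower = word.lower()
    letters = []
    i = 0
    while i < len(lower):
        pair = lower[i:i + 2]
        if pair in DIGRAPHS:
            letters.append(pair)
            i += 2
        else:
            letters.append(lower[i])
            i += 1
    return letters


def has_long_vowel(word):
    """Return True if the word contains a doubled vowel (aa, ee, ii, oo, uu)."""
    lower = word.lower()
    return any(a == b and a in VOWELS for a, b in zip(lower, lower[1:]))


print(qubee_letters("dhufe"))    # the digraph 'dh' stays together
print(has_long_vowel("Laga"), has_long_vowel("Laagaa"))
```

Because vowel length changes meaning (Laga versus Laagaa), any tokenization or normalization step applied to Afaan Oromo text should preserve doubled vowels.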
Native Afaan Oromo words do not contain the consonants ‘p’, ‘v’ and ‘z’ [7, 23]. However, these
letters are used in writing Afaan Oromo to spell foreign words such as police (poolisii) and virus
(vaayirasii).
- : Colon is used to separate and introduce lists, clauses, and quotations, along with several
conventional uses.
- ; Semi colon is used to connect independent clauses. It shows a closer relationship between
the clauses than a period would show.
An apostrophe (’) in Afaan Oromo is used to represent the glottal stop, called hudhaa. It is
mostly used to write words in which two vowels would otherwise appear together, like ba’e
“get out”, with the exception of some words like ja’a ‘six’ and hin danda’amu ‘impossible’, which are
identified from the sound created. Sometimes the apostrophe in Afaan Oromo is
interchangeable with the letter “h”. For instance, “ba’e” and “ja’a” can be written with
“h” as “bahe” and “jaha” respectively, and the senses of the words are not changed.
2.3.1.1 Noun
A noun is a part of speech that names a person, place, thing, idea, action or quality. In Afaan
Oromo, a noun (maqaa) mainly occurs at the beginning of a sentence. For example: Tolaan hucuu
adii bitate. (Tola bought white cloth). Tolaan (Tola), the name of a person, is a noun and comes at the
beginning of the sentence.
A word that is categorized as a noun in a sentence can be a subject or an object [25]. In Afaan
Oromo, a subject mostly comes at the beginning whereas an object mostly comes after subject and
before verbs in a sentence. For example: in the sentence, Tolaan mana ijare. (Tola built a house).
The noun ‘Tolaan’ (name of person) is the subject and the noun ‘mana’ (house) is the object.
Most Afaan Oromo nouns are marked for gender: masculine or feminine. Afaan Oromo nouns
derived from verbs add the suffixes ‘–aa’ and ‘–tuu’ to the verb root for the masculine and feminine
gender respectively [26].
For example:
barsiisuu [to teach]: verb,
barsiisaa [teacher] masculine: noun;
barsiistuu [teacher] feminine: noun.
barachuu (to learn): verb,
barataa [student], masculine: noun;
barattuu [student], feminine: noun.
Afaan Oromo plural nouns are mainly formed by adding the suffixes ‘–oota’, ‘–ota’, ‘–wwan’,
‘–een’, ‘–lee’ and ‘–yyi’ [24, 27]. Table 2.6 shows suffixes used to form plural nouns in Afaan
Oromo.
Table 2.6: Afaan Oromo plural noun formation using suffix
Pronoun
A pronoun is a word that can be used in place of a noun. Afaan Oromo pronouns can be categorized
based on their functions and meanings in the sentence [25]. These are personal pronouns,
possessive pronouns, reflexive pronouns and demonstrative pronouns. Each type of
Afaan Oromo pronoun is described in this section.
Personal Pronouns
Afaan Oromo personal pronouns refer to the person speaking, the person spoken to or the person
spoken about. For example, in the following sentences,
Isheen kitaaba dubbifte. (She read a book).
Inni ishee jaalata. (He likes her).
Nuti isa binna. (We buy it).
Isheen (she), inni (he) and nuti (we) are personal pronouns. Table 2.7 illustrates Afaan Oromo
personal pronouns that can be used in the subject positions.
Table 2.7: Afaan Oromo personal pronouns
Possessive Pronouns: Possessive pronouns are pronouns that indicate ownership of something.
For example:
Re’een suni tiyya. (That goat is mine).
Konkolaataan sun keessani. (That car is yours).
‘tiyya’ (mine) and ‘keessani’ (yours) are possessive pronouns. Table 2.8 below shows Afaan Oromo
possessive pronouns that can be used in the subject positions.
Table 2.8: Afaan Oromo possessive personal pronouns
Afaan Oromo possessive case can also be formed by prefixing ‘kan’. For example: kan koo (mine),
kan keenya (ours), kan isaa (his), kan ishee (hers), kan kee (yours), kan isaanii (theirs).
Reflexive Pronouns
According to Getachew Mamo and Million Meshesha [25], Afaan Oromo has two ways of
expressing reflexive pronouns (myself, ourselves, yourself, yourselves, himself, herself and
themselves). One is to use the noun meaning ‘self’: of(i) or if(i). This noun is inflected for case
but, unless it is being emphasized, not for person, number, or gender.
For example:
Isheen of laalti. (base form of of). (She looks at herself).
Isheen ofiif konkolaataa bitte. (dative of of). (She bought a car for herself).
The other possibility is to use ‘mataa’ with possessive suffixes. For example: mataa koo (myself),
mataa kee (yourself, singular).
Afaan Oromo has a reciprocal pronoun wal (each other) that is used like of/if. It is inflected for
case but not for person, number, or gender.
For example:
Wal jaalatu. (They like each other).
Kennaa walii bitan. (They bought gifts for each other).
Demonstrative Pronouns
Afaan Oromo makes a two-way distinction between proximal (‘this, these’) and distal (‘that,
those’) demonstrative pronouns and adjectives [25, 22]. Proximal pronouns distinguish masculine and
feminine gender whereas distal pronouns do not. However, singular and plural demonstrative
pronouns are not distinguished. Table 2.9 shows Afaan Oromo demonstrative pronouns.
Table 2.9: Afaan Oromo demonstrative pronouns
In Afaan Oromo interrogative sentences are used to form a question. According to Jabesa Daba
and Yaregal Assabie [7], the main Afaan Oromo interrogative pronouns are: maal(i) (ምን, what),
maaliif(i) (why), akkam(i) (how), yoom (መቼ, when), eessa (የት, where), eessaa (from where),
eenyu (ማን, who, what), kan eenyu (whose), meeqa (ስንት, ምን ያህል, how much, how many),
kam(i) (which).
2.3.1.2 Verb
A verb (xumura) is a word that expresses an action, a state of being, or a relationship between two things
[16]. In Afaan Oromo, verbs mostly appear at the end of a sentence [22]. For example: Turaan
wayaa adii bitate. (Tura bought white cloth). Bitate (bought) is the verb of the sentence.
Like Amharic, Afaan Oromo verbs can be modified to indicate person, gender, tense and number
[20, 25, 22]. The prefixes and suffixes for person, gender, tense and number are essentially
identical in all forms. For example, the root ‘deem-’ has the basic meaning of ‘walking’. The root may
be conjugated in simple past, present, continuous and perfect tense, in singular and plural forms,
as shown in Table 2.10.
Table 2.10: Different forms of root ‘deem’[20]
Most Afaan Oromo verbs are in their infinitive form, for example, beekuu (to know). The verb
stem ‘beek-’ is the infinitive form ‘beekuu’ with the final ‘–uu’ dropped. Afaan Oromo verbs can
be categorized into main (transitive or intransitive) and auxiliary verbs [22].
Transitive verbs are main verbs which transfer their action to complements or objects. Consider the
following examples:
Tolaan bishaan waraabe. [Tola fetched water].
Tolaan ulee cabse. [Tola broke a stick].
Each of the verbs, waraabe [fetched] and cabse [broke], in these sentences has an object that completes
the verb’s action.
Intransitive verbs are main verbs which do not take object or complement in a sentence. For
example: in the sentence, Ijoolleen rafan (Children slept), it is impossible for an object to follow
the verb rafan (slept).
Auxiliary verbs support the main verbs used in a sentence, add functional or grammatical meaning
to the clauses in which they appear. For example:
Tolaan kaleessa ganama fiigaa ture. [Tola was running yesterday morning.]
Yeroo obboleessi koo naaf bilbilu, ani rafeen ture. [I was sleeping when my brother called
me.]
Taphni ijoolleef faayidaa baay'ee qaba. [Playing has many advantages for children.]
In the above sentences the words ‘ture’ and ‘qaba’ are auxiliary verbs. Afaan
Oromo auxiliary verbs include ‘dhaa’, ‘ta`e’, ‘qaba’, ‘ture’, ‘jira’, etc.
Like Amharic, Afaan Oromo verbs take subject markers such as ‘-e’, ‘-ine’, ‘-ite’ and ‘-ani’ for
subjects I, we, she and they respectively to agree with the subject of the sentences, as shown in the
following examples:
Ani isa gorse. [I advised him.]
Nu`i isa gorsine. [We advised him.]
Isheen isa gorsite. [She advised him.]
Isaan isa gorsani. [They advised him.]
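The subject markers above, together with the stem rule given earlier (the infinitive with the final ‘-uu’ dropped), can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the morphological analysis used in this study: the function names are hypothetical, only the four markers listed in the text are covered, and stem changes that occur in real verbs (for example bitate versus bitte) are ignored.

```python
# Subject markers listed in the text, keyed by the English subject.
SUBJECT_MARKERS = {"I": "e", "we": "ine", "she": "ite", "they": "ani"}


def stem_from_infinitive(infinitive):
    """Return the verb stem by dropping the final '-uu' of the infinitive."""
    if not infinitive.endswith("uu"):
        raise ValueError("expected an infinitive ending in '-uu'")
    return infinitive[:-2]


def add_subject_marker(stem, subject):
    """Attach the subject marker for 'I', 'we', 'she' or 'they' to a stem."""
    return stem + SUBJECT_MARKERS[subject]


print(stem_from_infinitive("beekuu"))
for subject in ("I", "we", "she", "they"):
    print(subject, add_subject_marker("gors", subject))
```

Applied to the stem ‘gors-’, the sketch reproduces the four verb forms in the examples above.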
2.3.1.3 Adverb
Adverbs are words which modify verbs and adjectives. Adverbs can be categorized as adverbs of
time, place and condition [25]. In Afaan Oromo, adverbs precede the verbs they
modify. For example:
Isheen baayee furdaada. [She is very fat.], baayee [very] indicates the degree of how fat she
is.
Isheen amma dufte. [She came now.], ‘amma’ [now] is a time adverb.
Toolaan baayee deeraada. [Tola is very tall.], the adverb baayee modifies the verb deeraa.
Some common Afaan Oromo adverbs are: amma (now), kaleessa (yesterday), harr’a (today), edana
(tonight), bor (tomorrow), dhiyootti (soon), dafee (quickly), suuta (slowly), walii wajjin (together),
baayee (very), yeroo hunda (always), yeroo baayyee (usually), gaaffii gaaf (sometimes), darbee
(rarely), matuma (never).
2.3.1.4 Adjective
In Afaan Oromo, adjectives (addeessa) come after the nouns they qualify. For example: in the
following adjectival phrases, uffata adii (white cloth) and muka gabaabaa (short stick), adii (white)
and gabaabaa (short) are adjectives that qualify the nouns uffata and muka respectively.
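Since Amharic adjectives precede the noun while Afaan Oromo adjectives follow it, this is one place where a POS-based reordering rule can bring the two word orders closer together. The sketch below shows the idea for a single rule that swaps each noun-adjective pair; it is a simplified illustration rather than the rule set designed in this study, and the tag names `NOUN`, `ADJ` and `VERB` and the function name are assumptions.

```python
def noun_adj_to_adj_noun(tagged_tokens):
    """Swap every NOUN ADJ pair into ADJ NOUN order.

    tagged_tokens is a list of (word, tag) pairs. The tag names are
    hypothetical; a real tagset would need its own mapping.
    """
    reordered = []
    i = 0
    while i < len(tagged_tokens):
        is_pair = (
            i + 1 < len(tagged_tokens)
            and tagged_tokens[i][1] == "NOUN"
            and tagged_tokens[i + 1][1] == "ADJ"
        )
        if is_pair:
            # Emit the adjective first, then the noun it qualifies.
            reordered.extend([tagged_tokens[i + 1], tagged_tokens[i]])
            i += 2
        else:
            reordered.append(tagged_tokens[i])
            i += 1
    return reordered


sentence = [("Tolaan", "NOUN"), ("hucuu", "NOUN"), ("adii", "ADJ"), ("bitate", "VERB")]
print([word for word, _ in noun_adj_to_adj_noun(sentence)])
```

The rule only looks at adjacent tags, so it would not handle a noun followed by several adjectives or by an intervening modifier.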
Afaan Oromo adjectives can be marked for gender by the presence of gender markers such as ‘-cca’,
‘-aa’, etc. for masculine and ‘-ttii’, ‘-tuu’, ‘-oo’, etc. for feminine [22]. Table 2.11 presents the inflection
of adjectives for gender.
Table 2.11: Adjectives inflection for gender
Adjective            Masculine                       Feminine
gurraacca (black)    gurraacca (by affixing -cca)    gurraattii (by affixing -ttii)
deeraa (tall)        deeraa (by affixing -aa)        deertuu (by affixing -tuu)
furdaa (fat)         furdaa (by affixing -aa)        furdoo (by affixing -oo)
2.3.1.5 Adposition
Prepositions and postpositions together are called adpositions. Adpositions are a class of words used
to express spatial or temporal relations [24]. A preposition comes before its complement; a
postposition comes after its complement. Consider the following examples.
Toolaan waaye ofisaa dubbaccuu jaalata (Tola likes to talk about himself). ‘waaye’ (about)
occurs preceding the nominal ‘ofisaa’.
Toolaan abbaasaa wajjin dhufe. (Tola came with his father). ‘wajjin’ (with) occurs after
the nominal ‘abbaasaa’.
Some common prepositions are: gara (towards), eega, erga (since, from, after), haga, hanga (until),
hamma (upto, as much as), akka (like as), waa’ee (about, in regard to).
Some common postpositions are: ala (out, outside), bira (beside, with, around), booda (after), cinaa
(beside, near, next to), dur, dura (before), duuba (behind, back of), irra (on), irraa (from), itti (to,
at, in), jala (under, beneath), jidduu (middle, between), keessa (in, inside), malee (without, except),
wajjin (with, together), gubbaa (on, above), fuuldura (in front of), gad(i) (down, below), ol(i) (up,
above).
Afaan Oromo Conjunction
A conjunction is a word that can be used to connect two phrases, clauses and sentences.
Conjunctions can be divided into coordinating and subordinating conjunctions. Coordinating
conjunctions are used to connect two independent clauses [28], whereas, subordinating
conjunctions are used to connect main clauses with subordinate clauses [25]. Consider the
following examples:
Ittoo shiroon jaaladha garuu ittoo misira caalaa jaaladha (I like shiro watt, but I like lentil
watt more). ‘Garuu’ is used to connect the two independent sentences “Ittoo shiroon
jaaladha” and “ittoo misira caalaa jaaladha”.
Nyaatan barbaada sababiinsa nan beela’e. (I want food because I am hungry). ‘Sababiinsa’
is used as a subordinating conjunction. It connects the independent clause “Nyaatan
barbaada (I want food)” and the subordinate clause “nan beela’e (I am hungry)”.
Some common Afaan Oromo conjunctions are: fi (and), garuu/immoo (but), yookin (or, in
declaratives), moo (or, in questions), haa ta’u malee (however), etc.
Afaan Oromo Subordinating conjunctions are yoo (if), akka waan (as if), sababiin isaa, sababiinsa
(because), kanaafuu (so, therefore), akka (so that, in order to), ta’us (though), tu’ullee (even
though), wanta/yeenna (when), hamma (until), erga (after), dursa (before), etc.
Noun Phrases
A noun phrase is a phrase that has a noun or indefinite pronoun as its head. For example: in the
sentence, Manni Toolaan sun jige. [That house of Tola’s has been damaged], “Manni Toolaan” is a noun
phrase, and the head (noun) of the noun phrase is “Manni”.
Verb Phrases
In a verb phrase, the head word is the verb. For example: in the sentence, Caaltun
biddeena xaafii tolchite. [Chaltu made teff injera], ‘tolchite’ is the head of the verb phrase
“biddeena xaafii tolchite”. The verb phrase tells what Chaltu did.
Prepositional Phrases
A preposition links a noun to an action or to another noun. A prepositional phrase is a phrase that
has a preposition as its head. For example: in the sentence, Erga bokkaan caamee, gara magaalaa
deemne. [After the rain stopped, we went to the city], “gara magaalaa” is a prepositional
phrase and the head of the prepositional phrase is ‘gara’ [to].
Adjective Phrases
In an adjective phrase, one or more words work together to give more information about the
adjective. For example: in the sentence, Caaltun barnoota ishiitiin daran cimtuudha. [Chaltu is
very clever in her education.], the phrase “barnoota ishiitiin daran cimtuudha” is an adjectival
phrase.
Adverbial Phrases
Adverbs may modify the manner of an action, indicate the time of an action, give location or
indicate degree. Consider the following Afaan Oromo adverbial phrases:
- Mucaan suutaan deema [The boy went slowly]; suutaan indicates the manner of an action.
- Abbaan isaa darbannii darbanii mana dhufu. [His father seldom comes home.]; darbannii
darbanii indicates the time of an action.
- Dabtara keessan bakka kana ka’aadha deemaa. [Put your exercise book here and go.];
bakka kana indicates location.
- Obbo Caalaan lafa ballinaa qotan [Mr. Chala is farming a large land]; ballinaa indicates
degree.
- Inni hojii suutaan hojjechu filata. [He prefers to do his work slowly.]; suutaan indicates
the manner of an action.
Subject-Verb agreement
Like Amharic, Afaan Oromo verbs agree with their subjects. The person, number and gender of
the subject of the verb are marked by suffixes or prefixes on the verb [32].
For example:
Definiteness
Afaan Oromo has no indefinite articles but it indicates definiteness with suffixes ‘-(t)icha’ for
masculine nouns and ‘-(t)ittii’ for feminine nouns and the last vowel of the noun is dropped before
suffixes (-icha, -ittii, -attii, -utti) are added [23, 26].
For example:
karaa ‘road’, karaa + icha (karicha) (the road),
nama ‘man’, nama + (t)icha (namicha /namticha/) (the man).
For animate nouns that can take either male or female gender, the definite suffix may indicate the
intended gender. For example: qaalluu (priest), qaalicha (the priest, masculine), qallittii (the priest,
feminine).
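The basic definiteness rule described above, dropping the final vowel and attaching ‘-icha’ or ‘-ittii’, can be sketched as follows. This is only an illustration with a hypothetical function name: it assumes that a final long vowel is dropped as a whole, and it does not handle the optional ‘t’, the ‘-attii’ and ‘-utti’ variants, or forms such as qaalicha where the stem itself changes.

```python
VOWELS = "aeiou"


def make_definite(noun, gender="masculine"):
    """Drop the final (short or long) vowel and attach a definite suffix."""
    stem = noun.rstrip(VOWELS)  # 'karaa' -> 'kar', 'nama' -> 'nam'
    suffix = "icha" if gender == "masculine" else "ittii"
    return stem + suffix


print(make_definite("karaa"))  # karaa 'road'
print(make_definite("nama"))   # nama 'man'
```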
Inni bishaan fide (He brought water), Inni [he] is the subject, bishaan (water) is the object
and fide (brought) is the verb.
Isheen hoolaa bitte (She bought sheep), Isheen (she) is the subject, hoolaa (sheep) is the
object and bitte (bought) is the verb.
Figure 2.2: Vauquois Triangle [38]
The most intuitive form of translation is simply translating every word, one by one, by looking up the
word in a bilingual lexicon. This is also the basis of the so-called direct translation approach, found
at the bottom of the Vauquois triangle [38]. One level above the direct approach is the transfer
approach. In syntactic transfer, the syntactic structure of the source sentence is analyzed, and the
resulting structure is mapped, by rules, to a new syntactic structure in the target language.
Semantic transfer is similar to syntactic transfer, but attempts to analyze the semantic structure of
the source sentence, and uses rules to map it to a semantic structure in the target language. The
interlingua model is found at the top of the Vauquois triangle. Direct and transfer approaches rely
extensively on various sets of rules that map words, syntax, or semantic roles from the source
language to the target language. This is a limitation when there are multiple languages to relate to
each other, because the rule sets must be rebuilt for each language pair. The interlingua
approach is a solution to this limitation of the direct and transfer approaches. The basic idea behind
the interlingua approach is that, instead of translating from all languages to all others, translation goes
from the source languages to one interlingua representation and from that representation to the target
languages. Each of the RBMT approaches is discussed below.
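The scaling argument behind the interlingua approach can be made concrete with a small calculation. Assuming one rule set per ordered language pair for direct or transfer systems, and one analysis module plus one generation module per language for an interlingua system, the counts grow as follows (the function names are illustrative only):

```python
def pairwise_rule_sets(n_languages):
    """Rule sets needed when every ordered language pair has its own rules."""
    return n_languages * (n_languages - 1)


def interlingua_modules(n_languages):
    """One analysis and one generation module per language."""
    return 2 * n_languages


for n in (2, 5, 10):
    print(n, pairwise_rule_sets(n), interlingua_modules(n))
```

For two languages the pairwise design needs fewer components, which is why the advantage of an interlingua only appears in multilingual settings.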
2.4.1.1 Direct Machine Translation Approach
The direct machine translation (DMT) approach is the oldest and least popular approach. Machine
translation systems that use this approach translate a source language directly into
a target language. Words of the source language are translated into the target language in the same
word-for-word arrangement with the help of a bilingual dictionary, without passing through an
intermediary representation [37]. Source language analysis is oriented specifically to only one
target language. Direct machine translation systems are basically unidirectional and bilingual.
As depicted in Figure 2.3, DMT approach requires the following stages for the generation of a
sentence in the target language.
The morphological inflections are removed from the words of the source text according to
the different grammar rules of the word.
adjustments on word order and morphology. Among the limitations of DMT are that it involves only
lexical analysis, i.e., it does not consider structure and relationships between words, and that it is
developed for a specific language pair [35].
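A word-for-word lookup of the kind DMT relies on can be sketched with a toy bilingual dictionary. The five entries below are the Amharic and Afaan Oromo interrogative pronouns paired in Section 2.3; the dictionary and the function name `translate_direct` are illustrative assumptions, and the sketch deliberately omits the morphological analysis and word-order adjustment stages.

```python
# Toy Amharic -> Afaan Oromo lexicon built from the interrogative
# pronouns paired in the text; a real system needs a full dictionary.
LEXICON = {
    "ማን": "eenyu",   # who
    "ምን": "maal",    # what
    "የት": "eessa",   # where
    "መቼ": "yoom",    # when
    "ስንት": "meeqa",  # how much / how many
}


def translate_direct(sentence, lexicon=LEXICON):
    """Translate word by word, keeping the source word order.

    Words missing from the lexicon are copied through unchanged.
    """
    return " ".join(lexicon.get(token, token) for token in sentence.split())


print(translate_direct("ማን መቼ"))
```

Because the source order is preserved and unknown words pass through untouched, the output quality depends entirely on dictionary coverage, which reflects the limitations noted above.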
Because it is independent of the language pair being translated, this approach is useful for
multilingual machine translation systems.
The language model calculates the probability of the target language 𝑝(𝑡) and it models the
fluency of the proposed target sentence.
Basically, an N-gram model predicts the occurrence of a word based on the occurrence of its N–1
previous words. For example, a bigram model (when N = 2) predicts the occurrence of a word,
given only its previous word. Similarly, a trigram model (when N = 3) predicts the occurrence of
a word based on its previous two words.
The Maximum Likelihood Estimate (MLE) of the unigram probability of a word $w_i$ in a corpus is
its count $C(w_i)$ normalized by the total number of word tokens $N$, as given by equation (1):
$$p(w_i) = \frac{C(w_i)}{N} \tag{1}$$
To compute a particular bigram probability of a word $w_n$, given a previous word $w_{n-1}$, we
compute the count of the bigram $C(w_{n-1} w_n)$ normalized by the sum of all the bigrams that share
the same first word $w_{n-1}$, as given by equation (2):
$$p(w_n \mid w_{n-1}) = \frac{C(w_{n-1} w_n)}{\sum_{w} C(w_{n-1} w)} \tag{2}$$
We can simplify equation (2) into equation (3), since the sum of all bigram counts that start with
a given word $w_{n-1}$ must be the unigram count for that word $w_{n-1}$:
$$p(w_n \mid w_{n-1}) = \frac{C(w_{n-1} w_n)}{C(w_{n-1})} \tag{3}$$
To compute some of the n-gram probabilities, consider the following mini-corpus of five Amharic
sentences:
ሃና መፅሐፍ ገዛች
ሃና መፅሐፍ አነበበች
ሃና ሻይ አፈላች
አልማዝ ቡና ገዛች
አልማዝ ቡና ጠጣች
$$p(\text{ሃና}) = \frac{C(\text{ሃና})}{N} = \frac{3}{15} = 0.20$$
$$p(\text{አልማዝ}) = \frac{C(\text{አልማዝ})}{N} = \frac{2}{15} = 0.13$$
where N is the total number of words seen in the corpus.
Some of the bigram probabilities from the corpus are:
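$$p(\text{መፅሐፍ} \mid \text{ሃና}) = \frac{C(\text{ሃና መፅሐፍ})}{C(\text{ሃና})} = \frac{2}{3} = 0.67$$
$$p(\text{ሻይ} \mid \text{ሃና}) = \frac{C(\text{ሃና ሻይ})}{C(\text{ሃና})} = \frac{1}{3} = 0.33$$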
Some of the trigram probabilities from the corpus are:
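$$p(\text{አፈላች} \mid \text{ሃና ሻይ}) = \frac{C(\text{ሃና ሻይ አፈላች})}{C(\text{ሃና ሻይ})} = \frac{1}{1} = 1.0$$
$$p(\text{ጠጣች} \mid \text{አልማዝ ቡና}) = \frac{C(\text{አልማዝ ቡና ጠጣች})}{C(\text{አልማዝ ቡና})} = \frac{1}{2} = 0.50$$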
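The worked probabilities above can be reproduced with a short script over the same five-sentence mini-corpus. This is a minimal sketch of maximum likelihood estimation and not the IRSTLM implementation used in this study; the function names are hypothetical and, as in the worked examples, no sentence-boundary symbols are added.

```python
from collections import Counter

CORPUS = [
    "ሃና መፅሐፍ ገዛች",
    "ሃና መፅሐፍ አነበበች",
    "ሃና ሻይ አፈላች",
    "አልማዝ ቡና ገዛች",
    "አልማዝ ቡና ጠጣች",
]


def ngram_counts(sentences, n):
    """Count all n-grams of length n, sentence by sentence."""
    counts = Counter()
    for sentence in sentences:
        tokens = sentence.split()
        for i in range(len(tokens) - n + 1):
            counts[tuple(tokens[i:i + n])] += 1
    return counts


def mle_probability(sentences, ngram):
    """MLE of the last word of `ngram` given the words before it."""
    ngram = tuple(ngram)
    n = len(ngram)
    numerator = ngram_counts(sentences, n)[ngram]
    if n == 1:
        # Unigram: normalize by the total number of tokens N.
        denominator = sum(ngram_counts(sentences, 1).values())
    else:
        # Higher orders: normalize by the count of the history.
        denominator = ngram_counts(sentences, n - 1)[ngram[:-1]]
    return numerator / denominator if denominator else 0.0


print(round(mle_probability(CORPUS, ["ሃና"]), 2))
print(round(mle_probability(CORPUS, ["ሃና", "መፅሐፍ"]), 2))
print(round(mle_probability(CORPUS, ["አልማዝ", "ቡና", "ጠጣች"]), 2))
```

An n-gram that never occurs in the corpus gets probability zero under this estimate, which is the problem that smoothing addresses.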
The N-gram model performs well for unigram, bigram and trigram models on a corpus of simple
sentences. Long word sequences are rarely observed in corpora, and if any N-gram is missing, the
language model will assign it a probability of zero [43]. To keep a language model from assigning
zero probability, smoothing techniques are used. Laplace smoothing adds one to all the counts
before they are normalized into probabilities. Since there are V words in the vocabulary and each count
was incremented, we also need to adjust the denominator to take into account the extra V
observations. Laplace smoothing of unigram probabilities is given by equation (4):
$$P_{Laplace}(w_i) = \frac{C(w_i) + 1}{N + V} \tag{4}$$
$$P_{Laplace}(\text{ሃና}) = \frac{C(\text{ሃና}) + 1}{N + V} = \frac{3 + 1}{15 + 9} = \frac{4}{24} = \frac{1}{6}$$
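The Laplace estimate can be checked against the same mini-corpus, which has N = 15 tokens and V = 9 distinct words. The sketch below uses exact fractions so that the result can be compared directly with the worked example; the function name is hypothetical, and V is taken to be the number of distinct words observed in the corpus.

```python
from collections import Counter
from fractions import Fraction

CORPUS = [
    "ሃና መፅሐፍ ገዛች",
    "ሃና መፅሐፍ አነበበች",
    "ሃና ሻይ አፈላች",
    "አልማዝ ቡና ገዛች",
    "አልማዝ ቡና ጠጣች",
]


def laplace_unigram(tokens, word):
    """Add-one smoothed unigram probability (C(w) + 1) / (N + V)."""
    counts = Counter(tokens)
    n_tokens = len(tokens)   # N: total number of word tokens
    vocab_size = len(counts)  # V: number of distinct observed words
    return Fraction(counts[word] + 1, n_tokens + vocab_size)


tokens = " ".join(CORPUS).split()
print(laplace_unigram(tokens, "ሃና"))
```

Summed over the nine vocabulary words, the smoothed probabilities still add up to one, because the extra V in the denominator balances the V counts that were added.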
Laplace for smoothing of bigram
An SMT system is not tailored to any specific pair of languages and it requires less virtual space than
other models of machine translation, which makes it easier to operate and train on smaller systems.
However, SMT does not work well for language pairs that have significantly different word orders, and
corpus development can be costly. The SMT approach is subdivided into the following approaches:
word based SMT, phrase based SMT, syntax based translation and hierarchical phrase based SMT.
In word based SMT, sentences are broken down into their fundamental unit (the word) and translation from
the source language to the target language is done word by word. Once the target words are generated,
they are arranged in a specific order by a reordering algorithm to generate the target
sentence. However, multi-word expressions such as compound words and idioms bring complexities [44].
Phrase based SMT was proposed by Koehn [45] and uses phrases as the fundamental unit of
translation. The source and target language sentences contained in the parallel corpora are divided
into phrases. Phrase based translation models are acquired from a word-aligned parallel corpus by
extracting all phrase pairs that are consistent with the word alignment, following the principle of
Koehn [45]. The input and output phrases are aligned according to a specific order as suggested by
Antony [44]. Though phrase based SMT may result in better performance, long phrases may
degrade the performance.
Syntax based translation is based on the idea of translating syntactic units, rather than single words
or strings of words (as in Phrase based SMT), i.e., parse trees of sentences/utterances.
Hierarchical phrase based SMT was proposed by Chiang [40] and combines the strengths of
phrase-based and syntax-based translation. The phrase-based side supplies blocks or segments as the
unit of translation, while the syntax-based side supplies the rules of translation.
Figure 2.6: The Vauquois Triangle Modified for EBMT [47]
The EBMT approach avoids the need for manually derived rules. However, it requires analysis and
generation modules to produce the dependency trees needed for the examples database and for
analyzing the sentence. Computational efficiency is also a concern for EBMT, especially for large
databases, although parallel computation techniques can be applied.
and that has shown encouraging results [48]. Unlike traditional statistical machine
translation, neural machine translation aims at building a single neural network that can be
jointly tuned to maximize the translation performance [49]. The models proposed recently for
neural machine translation often belong to a family of encoder-decoders. The Recurrent Neural
Network (RNN) encoder-decoder was proposed by K. Cho et al. [50]. The main idea is that the
encoder encodes the source sentence with an RNN and its last hidden state is used as input
for the RNN decoder, which produces the output in the target language. The encoder and decoder
are implemented using an RNN, especially Long Short-Term Memory (LSTM), convolutional
neural networks, self-attention units or a combination of them [51]. In all these architectures,
source and target sentences are handled separately as one-dimensional sequences over time. One
of the weaknesses of such models is that the encoder states are computed only once at the
beginning and are left untouched with respect to the target histories.
Compared with SMT, there is no separate language model, translation model or reordering
model, but just a single sequence model which predicts one word at a time. The prediction is
conditioned on the source sentence and the already produced sequence in the target language. The
prediction power of NMT is more promising than that of SMT, as neural networks share statistical
evidence between similar words.
Although effective, NMT systems still suffer from some issues, such as scaling to larger
vocabularies and the slow speed of training the models. In addition, a large corpus is needed
to train neural machine translation systems with performance comparable to statistical machine
translation.
costs and time consumption. Therefore, automatic metrics have been used in the evaluation of
machine translated text. Some of the automatic evaluation metrics are:
into account. The PER is always lower than or equal to the WER. The shortcoming of the PER is
that the word order can be important in some cases. Therefore the best solution is to calculate both
word error rates.
BLEU Score
BLEU is an algorithm for evaluating the quality of text which has been machine-translated from
one natural language to another [55]. BLEU measures how many word sequences in the sentence
under evaluation match the word sequences of some reference sentence. BLEU could be gamed
by producing very short system outputs consisting only of highly confident n-grams, if it were not
for the brevity penalty, which penalizes the BLEU score if the system output is shorter than
the references.
$$p_n = \frac{\sum_{c \in \{Can\}} \sum_{n\text{-}gram \in c} Cnt_{clip}(n\text{-}gram)}{\sum_{c' \in \{Can\}} \sum_{n\text{-}gram' \in c'} Cnt(n\text{-}gram')} \qquad (9)$$
Equation (9) shows the computation of the BLEU precision score for n-grams of length n, where
Can are the sentences in the test-corpus, 𝐶𝑛𝑡(𝑛 − 𝑔𝑟𝑎𝑚) is the number of times an n-gram occurs
in a candidate, and 𝐶𝑛𝑡𝑐𝑙𝑖𝑝 (𝑛 − 𝑔𝑟𝑎𝑚) is the minimum of the unclipped count and the maximum
number of times it occurs in a reference translation.
$$BP = \begin{cases} 1, & c > r \\ e^{(1 - r/c)}, & c \le r \end{cases} \qquad (10)$$
Equation (10) shows the calculation of the BLEU brevity penalty, where c is the length of the
candidate translation and r is the length of the reference translation. These terms are combined, as
shown in Equation (11), to calculate the total BLEU score, where N is typically 4 and $w_n$ is
usually set to $1/N$.
$$BLEU = BP \cdot \exp\left(\sum_{n=1}^{N} w_n \log p_n\right) \qquad (11)$$
The BLEU score is always a number between 0 and 1. This value indicates how similar the candidate
text is to the reference text, with values closer to 1 representing more similar texts. Through the
brevity penalty, the BLEU score also penalizes translations that are shorter than the
reference translation.
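The computation in Equations (9) to (11) can be illustrated with a short Python sketch. It is a
minimal illustration and not the evaluation script used in this study: it assumes a single
reference per candidate, whitespace-tokenized sentences and uniform weights $w_n = 1/N$, and the
function names are chosen here for illustration only.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the list of n-grams (as tuples) in a token list."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def modified_precision(candidates, references, n):
    """Equation (9): clipped n-gram matches over all candidate n-grams."""
    clipped, total = 0, 0
    for cand, ref in zip(candidates, references):
        cand_counts = Counter(ngrams(cand, n))
        ref_counts = Counter(ngrams(ref, n))
        # Clip each candidate count by its count in the reference.
        clipped += sum(min(c, ref_counts[g]) for g, c in cand_counts.items())
        total += sum(cand_counts.values())
    return clipped / total if total else 0.0


def brevity_penalty(c, r):
    """Equation (10): 1 if c > r, otherwise exp(1 - r / c)."""
    if c == 0:
        return 0.0
    return 1.0 if c > r else math.exp(1 - r / c)


def bleu(candidates, references, max_n=4):
    """Equation (11) with uniform weights w_n = 1 / max_n."""
    precisions = [modified_precision(candidates, references, n)
                  for n in range(1, max_n + 1)]
    if min(precisions) == 0:
        return 0.0  # log(0) is undefined, so the score is taken as zero
    c = sum(len(s) for s in candidates)
    r = sum(len(s) for s in references)
    log_avg = sum(math.log(p) for p in precisions) / max_n
    return brevity_penalty(c, r) * math.exp(log_avg)


reference = "Gammachuun mana dhagaa ijaare".split()
swapped = "Gammachuun dhagaa mana ijaare".split()
print(bleu([reference], [reference]))         # identical sentences
print(bleu([swapped], [reference], max_n=2))  # same words, different order
```

The second call shows why higher-order n-grams matter: the swapped sentence contains every
reference word, yet it shares no bigram with the reference.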
NIST
The NIST metric is a method for evaluating the quality of text which has been translated using
machine translation. Its name comes from the US National Institute of Standards and Technology.
NIST is based on the BLEU metric but introduces some modifications. BLEU calculates n-gram
precision giving equal weight to each n-gram, whereas NIST weights each n-gram by its information
content, i.e., it gives higher scores to rarer n-grams, which are considered more informative.
NIST also differs from BLEU in the brevity penalty calculation, where small differences in
translation length do not impact the overall score.
METEOR
Metric for Evaluation of Translation with Explicit Ordering (METEOR) is an automatic
evaluation metric for machine translation output [56]. METEOR modifies BLEU in the way that
it gives more emphasis to recall than to precision. METEOR was designed to fix some of the
problems found in the more popular BLEU metric, and also to produce good correlation with human
judgement at the sentence or segment level. This differs from the BLEU metric in that BLEU seeks
correlation at the corpus level.
Unlike BLEU, which only calculates precision, METEOR calculates both precision and recall, and
combines the two as shown in Equation (12).
$$F_{mean} = \frac{P \cdot R}{\alpha P + (1 - \alpha) R} \qquad (12)$$
METEOR uses several stages of word matching between the system output and the reference
translations in order to align the two strings. The matching stages are as follows:
a) Exact matching: strings which are identical in the reference and the hypothesis are aligned.
b) Stem matching: stemming is performed, so that words with the same morphological root
are aligned.
c) Synonymy matching: words which are synonyms according to WordNet are aligned.
In each of these stages only words that were not matched in previous stages are allowed to be
matched. Only unigrams (single words) are compared for matches. Precision in METEOR is
defined as the number of matches divided by the number of words in the system output, and recall
is defined as the number of matches divided by the number of words in the reference.
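The precision, recall and Equation (12) can be sketched in Python as follows. This is a
simplified illustration that performs only the exact matching stage (no stemming or synonymy
matching) and omits every other part of the full METEOR metric; the default value alpha = 0.9
and the shortened hypothesis sentence are assumptions made here for illustration.

```python
from collections import Counter


def exact_match_count(hypothesis, reference):
    """Count unigram matches; each reference word can be matched only once."""
    hyp_counts = Counter(hypothesis)
    ref_counts = Counter(reference)
    return sum(min(c, ref_counts[w]) for w, c in hyp_counts.items())


def f_mean(hypothesis, reference, alpha=0.9):
    """Equation (12): P * R / (alpha * P + (1 - alpha) * R)."""
    matches = exact_match_count(hypothesis, reference)
    if matches == 0:
        return 0.0
    precision = matches / len(hypothesis)  # matches / words in system output
    recall = matches / len(reference)      # matches / words in reference
    return precision * recall / (alpha * precision + (1 - alpha) * recall)


reference = "Gammachuun mana dhagaa ijaare".split()
hypothesis = "Gammachuun mana ijaare".split()  # hypothetical system output
print(f_mean(hypothesis, reference))
```

With alpha close to 1 the denominator is dominated by precision, so the score follows recall
more closely, which reflects the emphasis on recall described above.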
CHAPTER THREE: RELATED WORK
3.1 Overview
This chapter reviews the literature on machine translation for different language pairs. The
review covers machine translation systems for non-Ethiopian language pairs, for English and
Ethiopian language pairs, and for Ethiopian language pairs. Finally, a brief summary of the
chapter is given.
3.3 Machine Translation Systems for English and Ethiopian language pairs
The research which was conducted by Jabesa Daba [7] mainly deals with English-Afaan Oromo
machine translation system using a hybrid of rule-based and statistical approaches. Since English
and Afaan Oromo have different sentence structures, the author implemented syntactic reordering
with the purpose of making the structure of source sentences similar to the structure of target
sentences. Accordingly, reordering rules are developed for simple, interrogative and complex
English and Afaan Oromo sentences. Two groups of experiments are conducted by using purely
statistical approach and hybrid approach. The Afaan Oromo-English SMT yields a BLEU score of
41.50% whereas English-Afaan Oromo SMT has a BLEU score of 32.39%. After applying local
reordering rules, the system is improved to provide a BLEU score of 52.02% and 37.41% for Afaan
Oromo-English and English-Afaan Oromo translations, respectively. The limitation of the study
is that the rules developed are used only for syntax reordering; morphological rules are not
included.
The study which was conducted by Sisay Adugna [8] mainly deals with the translation of English
documents to Afaan Oromo using statistical methods. The study was carried out with two main
goals: the first one is to apply existing SMT system on English – Afaan Oromo language pair by
using available parallel corpus and the second one is to identify the challenges that need a solution
regarding the language pair. The author used parallel documents from different domains including
spiritual, medical and legal documents. 20,000 bilingual sentences and 62,300 monolingual
sentences were used for training and testing purposes. The BLEU scores for the test data from the
legal, medical and religious domains are 13.69%, 1.97% and 21.72% respectively. Because some
Afaan Oromo words are spelled inconsistently in the corpus, the system treats the variant
spellings as different words. The limitation of the study is that it does not incorporate an
Afaan Oromo spell checker.
The study which was conducted by Eleni Teshome [6] mainly deals with translation of English
documents to Amharic and Amharic documents to English. The research work implemented the
statistical machine translation approach. Two language models were developed, one for Amharic
and the other for English so as to ensure a bidirectional translation. Translation models were built
which assign a probability that a given source language text generates a target language text. Two
different corpora were prepared. Corpus I was made of about 1020 simple sentences that had been
prepared manually. All sentences were used for the training set. For the test set, the sample text
that contains 102 simple sentences was prepared manually. Corpus II contains 1951 complex
sentences out of which 40 sentences were used for the test set. Two methodologies were used to
test the system. The first methodology is BLEU score and the second methodology used is
preparing a questionnaire manually. The result on Corpus I recorded from the first methodology
(BLEU Score) was 82.22% for the English to Amharic translation and 90.59% for the Amharic to
English translation. The result recorded on Corpus I using the second methodology was 91% for
the English-Amharic translation and 97% for the Amharic to English. The result on Corpus II
recorded from the first methodology was 73.38% for English to Amharic translation and 84.12%
for Amharic to English translation. The accuracy from the second methodology on Corpus II was
87% for English to Amharic translation and 89% for Amharic to English translation. The limitation
of the study is that it does not handle larger set of complex sentences.
3.5 Summary
In this chapter, we have discussed works related to machine translation for different language
pairs. To the best of the researcher’s knowledge, no study has been conducted on Amharic-Afaan
Oromo machine translation. Amharic and Afaan Oromo are morphologically rich and less-resourced
languages, and research conducted on machine translation for other language pairs using
different approaches cannot be directly applied to Amharic-Afaan Oromo translation or vice
versa. Therefore, this study experiments with bidirectional Amharic-Afaan Oromo machine
translation using a hybrid approach.
CHAPTER FOUR: BIDIRECTIONAL AMHARIC-AFAAN OROMO
MACHINE TRANSLATION SYSTEM
4.1 Introduction
This chapter discusses the bidirectional Amharic-Afaan Oromo machine translation system. The
overall system architecture and its components are discussed in detail.
Amharic/Afaan Oromo sentences consist of lexical items that belong to categories called Parts of
Speech (POS). POS tagging is the process by which a specific tag is assigned to each word of an
input sentence, to indicate the function of that word in the specific context. POS categories
include nouns, verbs, adjectives, adverbs, pronouns, conjunctions and their sub-categories.
Table 4.1 shows the POS tag sets used for Amharic and Afaan Oromo, most of which are
adopted from the English Penn Treebank tag sets [7] and the tag sets developed for
Amharic-Tigrigna translation [9].
Table 4.1: Amharic and Afaan Oromo POS tag sets
No.   Tag    Description
3     CD     Cardinal number
5     ON     Ordinal number
6     IN     Preposition
7     JJ     Adjective
12    NP     Noun phrase
15    PUN    Punctuation
16    RB     Adverb
17    SYM    Symbol
25    WP     Interrogative pronoun
Since there are no publicly available POS tagger tools for Amharic and Afaan Oromo, POS tagging
is done manually for this research. Input sentences in either language (Amharic or Afaan
Oromo) are POS tagged for sentence reordering. The words and their tags are stored in a separate
file, following the sentence structure of Amharic or Afaan Oromo.
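As an illustration of how such tagged sentences can be processed, the following Python sketch
reads the word_TAG notation used for the tagged examples in this chapter. It is only a sketch:
the system described here was implemented with ASP.Net C#, and the function names below are
hypothetical.

```python
def parse_tagged(sentence):
    """Split a 'word_TAG word_TAG ...' string into (word, tag) pairs."""
    pairs = []
    for token in sentence.split():
        word, sep, tag = token.rpartition("_")
        if not sep:  # untagged token: keep the word, leave the tag empty
            word, tag = token, ""
        pairs.append((word, tag))
    return pairs


def to_tagged(pairs):
    """Inverse of parse_tagged: join (word, tag) pairs back into a string."""
    return " ".join(f"{word}_{tag}" for word, tag in pairs)


pairs = parse_tagged("ሃና_NNP መኪና_NN ገዛች_VBD ።_PUN")
for word, tag in pairs:
    print(word, tag)
```

Splitting on the last underscore keeps the tag intact even when a word itself contains an
underscore.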
Amharic POS tag: ሃና_NNP መኪና_NN ገዛች_VBD ።_PUN
‘konkoolaataa’, ‘ወተት’, ‘aannan’) and the verbs (‘ገዛች’, ‘bitte’, ‘ጠጣ’, ‘dhuge’) come after the
objects. Therefore, for such kinds of simple sentences reordering rules are not required.
Consider the following Amharic sentences containing prepositions and their translations in Afaan
Oromo.
Amharic: እሱ ወንበር ላይ ተቀመጠ ። /xsu wänbär lay täqämäTä/ [He sat on the chair.]
Amharic: እኛ ወደ ከተማ ሄድን ። /xNa wädä kätäma hedn/ [We went to the city.]
When the prepositions ‘ላይ’ and ‘ውስጥ’ appear after the nouns ‘ወንበር’ and ‘ቤት’ in Amharic
sentences then the equivalent prepositions ‘irraa’ and ‘keesaa’ in Afaan Oromo also appear after
the nouns ‘barcumaa’ and ‘mana’, and also when the preposition ‘ወደ’ appears before the noun
‘ከተማ’ in Amharic sentence then the equivalent preposition ‘gara’ in Afaan Oromo sentence
appears before the noun ‘magaalaa’. Therefore, for such kinds of Amharic and Afaan Oromo
sentences containing prepositions, reordering rules are not required.
Consider the following Amharic interrogative sentences and their translations in Afaan Oromo.
Amharic: ካሳ መቼ መጣህ? /kasa mäce mäTah?/ [Kassa when did you come?]
As shown in the Amharic interrogative sentences, when the interrogative pronouns (መቼ, ምን)
come before the verbs (መጣህ, ፈለግሽ) then the interrogative pronouns (yoom, maal) in Afaan
Oromo also come before the verbs (dhufte, barbaadee).
When the interrogative pronoun ‘ማን’ comes after the verb ‘የገዛው’ in the Amharic interrogative
sentence, then the interrogative pronoun ‘eenyu’ also comes after the verb ‘kan bitte’ in the Afaan
Oromo interrogative sentence. Therefore, for such kinds of Amharic and Afaan Oromo
interrogative sentences reordering rule is not required.
In Amharic to Afaan Oromo translation, Amharic reordering rules are used to give the Amharic
sentences in the corpus a sentence structure similar to that of Afaan Oromo, and vice
versa. This section discusses the Amharic/Afaan Oromo reordering rules, which perform
syntactic reordering on the words of Amharic/Afaan Oromo sentences.
Reordering Rule for compound word
A compound word (CW) is a combination of two words that can be treated as a single word in a
sentence. Consider the following Amharic compound word and its Afaan Oromo translation:
The head words ዳቦ /dabo/, ጠጅ /Täj/ and ጠላ /Täla/ are nouns. The noun phrases የስንዴ /yäsnde/
and የማር /yämar/ and የገብስ /yägäbs/ indicate from what the head words ዳቦ /dabo/, ጠጅ /Täj/
ፀሀይ /tsehay/, ኮከብ /kokeb/ and መሬት /märet/ are nouns which are used as the head words. The
noun phrase የበጋ /yäbäga/ indicates the weather condition of the sun, የሰሜን /yäsämen/ indicates
the direction of the ኮከብ /kokeb/ and የእርሻ /yäxrśa/ indicates the usage of the land.
Consider the following Amharic noun phrases and their translation in Afaan Oromo.
The noun phrases የካሳ /yäkasa/ and የአስቴር /yäxäster/ indicate the owner of the book /መፅሃፍ/
From the above discussion, the structure of the Amharic noun phrases is NP => NP NN, whereas the
structure of the Afaan Oromo noun phrases is NP => NN NP; that is, Amharic noun phrases have a
different structure from Afaan Oromo noun phrases. In order to have a similar structure in both
Amharic and Afaan Oromo sentences, we apply the reordering rule defined by Algorithm 4.2 to the
Amharic/Afaan Oromo sentences.
Now consider the following example of Amharic sentence and its translation in Afaan Oromo
where the noun phrase is used as a direct object.
Amharic: ገመቹ የድንጋይ ቤት ሰራ ። /gämäcu yädngay bet sera/ [Gemechu made a house
from stone.]
Afaan Oromo: Gammachuun mana dhagaa ijaare.
The noun phrases ‘የድንጋይ ቤት’ and ‘mana dhagaa’ are used as a direct object in the Amharic and
Afaan Oromo sentences respectively. They have different order. In order to have a similar structure
in both Amharic and Afaan Oromo sentences, we apply the reordering rule defined by Algorithm
4.2 to the Amharic/Afaan Oromo sentences that have a noun phrase used as a direct object.
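The effect of such a rule on the example above can be sketched in Python. The sketch is not a
reproduction of Algorithm 4.2: it simply swaps adjacent words whose tags match a given pair,
and the POS tags assigned to the example sentence are assumed here for illustration.

```python
def swap_adjacent(pairs, first_tag, second_tag):
    """Swap every adjacent (first_tag, second_tag) pair of tagged words."""
    out = list(pairs)
    i = 0
    while i < len(out) - 1:
        if out[i][1] == first_tag and out[i + 1][1] == second_tag:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # skip the swapped pair so the moved word is not swapped again
        else:
            i += 1
    return out


# Assumed tagging of the example sentence (NP = noun phrase, NN = noun).
amharic = [("ገመቹ", "NNP"), ("የድንጋይ", "NP"), ("ቤት", "NN"), ("ሰራ", "VBD"), ("።", "PUN")]
reordered = swap_adjacent(amharic, "NP", "NN")
print(" ".join(word for word, _ in reordered))
```

After the swap the Amharic direct object follows the NN NP order of ‘mana dhagaa’; calling the
same function with the tags in the opposite order restores the original sentence.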
Algorithm 4.2: Algorithm for reordering noun phrases
Amharic: ቢጫው መኪና የሚሸጥ ነው ። /biCaw mäkina yämiśät näw/ [The yellow car is for sale.]
Afaan Oromo POS tagged: Konkoolaataa_NN booran_JJ kan gurguramuu dha ._PUN
In the Amharic sentence, the noun phrase ‘ቢጫው መኪና’ is used as a subject of the sentence and
the noun adjective ‘ቢጫው’ appears before the noun ‘መኪና’ whereas in Afaan Oromo sentence,
the noun adjective ‘booran’ appears after the noun ‘konkoolaataa’. In order to have a similar
structure in both Amharic and Afaan Oromo sentences, we apply the reordering rule defined by
the Algorithm 4.3 to the Amharic/Afaan Oromo sentences that contain adjectives.
Reordering Rule for sentences containing a noun phrase and a compound word
Consider the following Amharic sentence containing a noun phrase and a compound word and
Afaan Oromo translation, where the noun phrases ‘የቂሊንጦ’ and ‘Qilinxoon’ are used with the
The Amharic sentence that contains the noun phrase ‘የቂሊንጦ’ and the compound word ‘ማረሚያ
ቤት’ has different word order compared to its equivalent Afaan Oromo translated sentence.
In order to have a similar sentence structure in both Amharic and Afaan Oromo sentences that
contain a noun phrase and a compound word, we apply the reordering rule defined by the
Algorithm 4.4 to the Amharic sentence and Algorithm 4.5 to the Afaan Oromo sentence
respectively.
Algorithm 4.4: Algorithm for reordering Amharic sentences containing a noun phrase and a
compound word
Algorithm 4.5: Algorithm for reordering Afaan Oromo sentences containing a noun phrase and a
compound word
Reordering Rule for sentences containing an adjective and a compound word
An Amharic noun phrase can be constructed from an adjective followed by a compound word, whereas
in Afaan Oromo a noun phrase can be a compound word followed by an adjective.
Consider the following Amharic sentence containing a noun phrase modified by the adjective and
its equivalent Afaan Oromo translation.
The Amharic compound word ‘ፍርድ ቤት’ that is modified by the adjective ‘ከፍተኛ’ has a different
word order compared to the Afaan Oromo compound word ‘Mana murtii’ modified by the
adjective ‘olaanaa’. In order to have a similar structure in both Amharic and Afaan Oromo noun
phrases, we apply the reordering rule defined by Algorithm 4.6 to Amharic noun phrases and
Algorithm 4.7 to Afaan Oromo noun phrases that contain a compound word modified by an
adjective.
Algorithm 4.6: Algorithm for reordering Amharic sentences containing an adjective and a
compound word
Algorithm 4.7: Algorithm for reordering Afaan Oromo sentences containing an adjective and a
compound word
Amharic: የእሱ ላሞች ሳር እየጋጡ ነው ። /yäxsu lamoc sar xyägaTu näw/ [His cows are grazing
grass].
Afaan Oromo: Saawwan isa margaa nyacha jiran.
Afaan Oromo POS tagged: Saawwan_NNS isa_PRP$ margaa_NN nyacha_VBG jiran_AUX
._PUN
In the Amharic sentence, the plural noun ‘ላሞች’ comes after the possessive pronoun ‘የእሱ’ but in
the Afaan Oromo sentence the possessive pronoun ‘isa’ comes after the plural noun
‘saawwan’. In order to have a similar structure in both Amharic and Afaan Oromo sentences, we
apply the reordering rule defined by Algorithm 4.8 to the Amharic/Afaan Oromo sentences
containing possessive pronouns.
Amharic: እኔ ሶስት ቋንቋዎችን እናገራለው ። /xne sost qWanqWawocn xnagäralähu/ [I speak three languages].
Amharic: የእኔ ልጅ ሁለት ድመቶች አሉት ።/yäxne lj hulät dmätoc alut/[My son has two cats].
In the Amharic sentences, the cardinal numbers ‘ሶስት’ and ‘ሁለት’ are placed before the nouns
‘ቋንቋዎችን’ and ‘ድመቶች’, whereas in Afaan Oromo the cardinal numbers ‘sadi’ and ‘lama’ are placed
after the nouns ‘Afaawwan’ and ‘adurreewwan’ respectively. The reordering of Amharic and Afaan
Oromo sentences that contain cardinal numbers is done by Algorithm 4.9.
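Since the word orders contrasted in this section are mirror images of each other, the rules for
both translation directions can be kept in a single table. The Python sketch below illustrates
this idea under stated assumptions: the rule table is hypothetical and far smaller than the rule
set of this study, the tags of the Amharic example are assumed, and the code is not a
reproduction of Algorithm 4.9.

```python
# Hypothetical rule table: tag pairs that are swapped when they are adjacent
# in an Amharic sentence (cardinal number, adjective or possessive before noun).
AMHARIC_RULES = [("CD", "NNS"), ("CD", "NN"), ("JJ", "NN"), ("PRP$", "NNS")]
# The Afaan Oromo rules are the mirror images of the Amharic rules.
OROMO_RULES = [(right, left) for left, right in AMHARIC_RULES]


def reorder(pairs, rules):
    """Apply all swap rules in one left-to-right pass over (word, tag) pairs."""
    rule_set = set(rules)
    out = list(pairs)
    i = 0
    while i < len(out) - 1:
        if (out[i][1], out[i + 1][1]) in rule_set:
            out[i], out[i + 1] = out[i + 1], out[i]
            i += 2  # do not move the same word twice
        else:
            i += 1
    return out


# Assumed tagging of the Amharic example with the cardinal number.
amharic = [("እኔ", "PRP"), ("ሶስት", "CD"), ("ቋንቋዎችን", "NNS"), ("እናገራለው", "VBP"), ("።", "PUN")]
# Tagged Afaan Oromo example with a possessive pronoun from this chapter.
oromo = [("Saawwan", "NNS"), ("isa", "PRP$"), ("margaa", "NN"),
         ("nyacha", "VBG"), ("jiran", "AUX"), (".", "PUN")]

print(" ".join(w for w, _ in reorder(amharic, AMHARIC_RULES)))
print(" ".join(w for w, _ in reorder(oromo, OROMO_RULES)))
```

Keeping the two rule lists as mirror images means that applying the rules of the other language
to a reordered sentence restores the original order.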
Algorithm 4.9: Algorithm for reordering cardinal numbers
In the Amharic sentence, the ordinal number ‘ሶስተኛውን’ is placed before the noun ‘መፅሐፍ’
whereas in the Afaan Oromo sentence, the ordinal number ‘saddaffaa’ is placed after the noun
‘kitaabicha’. Algorithm 4.10 shows the reordering of ordinal numbers in both languages.
Amharic: አዲስ የቤት መኪና /xädis yäbet mäkina/ [New house car]
In the Amharic noun phrase, the adjective ‘አዲስ’ comes before the noun phrase ‘የቤት መኪና’ but
in the Afaan Oromo sentence the adjective ‘haaraa’ comes after the noun phrase ‘konkoolaataa
mana’. In order to have a similar structure in both Amharic and Afaan Oromo noun phrases, we
apply the reordering rule defined by the Algorithm 4.11 to the Amharic/Afaan Oromo noun
phrases.
Reordering Rule for sentences containing possessive pronoun and a noun phrase
Consider the following Amharic noun phrase and its translation in Afaan Oromo.
Amharic: የአልማዝ የወርቅ ቀለበት /yäxälmaz yäwärq qäläbät/ [Almaz’s gold ring.]
In the Amharic noun phrase, ‘የአልማዝ’ is used as a possessive pronoun, i.e., it is described as the
owner of the property described by the noun phrase ‘የወርቅ ቀለበት’. ‘የአልማዝ’ comes before the
noun phrase ‘የወርቅ ቀለበት’ but in the Afaan Oromo sentence the owner comes after the property
‘Amartii waqee’. In order to have a similar structure in both Amharic and Afaan Oromo noun
phrases, we apply the reordering rule defined by Algorithm 4.12 to the Amharic/Afaan Oromo
sentences containing a possessive pronoun and a noun phrase.
Algorithm 4.12: Reordering rule for sentences containing a possessive pronoun and a noun
phrase
Reordering Rule for sentences containing a cardinal number and a noun phrase
Consider the following Amharic sentence containing a cardinal number and a noun phrase and its
translation in Afaan Oromo.
In the Amharic phrase, the cardinal number ‘5’ comes before the noun phrase ‘የጃፓን መኪናዎች’
and in the Afaan Oromo, the noun phrase ‘Konkoolaataawan Jaapaan’ comes before the cardinal
number ‘5’.
In order to have a similar structure in both Amharic and Afaan Oromo sentences, we apply the
reordering rule defined by the Algorithm 4.13 to the Amharic/Afaan Oromo sentences containing
a cardinal number and a noun phrase.
Algorithm 4.13: Reordering rule for sentences containing a cardinal number and a noun phrase
Reordering Rule for sentences containing an ordinal number and a noun combination
Consider the following Amharic sentence containing an ordinal number and a noun phrase and
its translation in Afaan Oromo.
Amharic: 2ተኛው ዙር ውድድር /2täNaw zur wddr/ [The 2nd round tournament].
In Amharic, the noun phrase ‘2ተኛው ዙር’ containing the ordinal number ‘2ተኛው’ comes before
the noun ‘ውድድር’, whereas in Afaan Oromo the noun phrase ‘marsaan 2ffaan’ containing the
ordinal number ‘2ffaan’ comes after the noun ‘Waldorgommin’.
Algorithm 4.14: Reordering rule for sentences containing an ordinal number and noun
combination
In order to have a similar structure in both Amharic and Afaan Oromo phrases, we apply the
reordering rule defined by Algorithm 4.14 to the Amharic/Afaan Oromo sentences containing
an ordinal number and a noun combination.
Similarly, when the translation is from Afaan Oromo to Amharic, the translation model probability
$p(o|a)$ is used to measure the quality of the translation of the source Afaan Oromo sentence o
given the target Amharic sentence a. The translation model finds the correspondence between the
source sentence and the target sentence in the source/target parallel corpus, which is called
word alignment. The basic unit of the correspondence is the word. The alignment between the
source word and the target word could be one-to-zero, one-to-one or one-to-many. The translation
system can produce multiple words from a single word, but not vice versa, and this is a
limitation of the word-based model. One way to overcome this limitation is to use phrase-based
translation. The basis of phrase-based translation is to fragment the input sentence into phrases
(sequences of consecutive words), translate these phrases and reorder them in the target
language. The phrase-based translation process is broken up into the following three mapping
steps as shown in Figure 4.2.
4.2.4 Decoding
The decoder’s task is to find the best translation in the target language for a given input
sentence by statistical methods that rely on the translation model and the language model.
When translation is from Amharic to Afaan Oromo, the best translation is the one that maximizes
the product of the probabilities $p(a|o)$ and $p(o)$, i.e., $\arg\max_{o} \; p(a|o) \cdot p(o)$.
Similarly, when translation is from Afaan Oromo to Amharic, the best translation is the one that
maximizes the product of the probabilities $p(o|a)$ and $p(a)$, i.e., $\arg\max_{a} \; p(o|a) \cdot p(a)$.
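The maximization can be illustrated with a toy Python sketch for the Amharic to Afaan Oromo
direction. It is not the Moses decoder, which searches a very large hypothesis space rather than
scoring a fixed list; the candidate list and all probability values below are made up for
illustration.

```python
def decode(source, candidates, tm, lm):
    """Return the candidate target sentence maximizing p(source|target) * p(target).

    tm maps (source, target) pairs to p(source | target);
    lm maps a target sentence to p(target). Missing entries count as 0.
    """
    def score(target):
        return tm.get((source, target), 0.0) * lm.get(target, 0.0)
    return max(candidates, key=score)


# Made-up probabilities for illustration only; they are not taken from the
# models trained in this study.
source = "የአንተ ስም ማነው?"
candidates = ["Maqaan kee eenyu?", "kee Maqaan eenyu?"]
tm = {(source, candidates[0]): 0.4, (source, candidates[1]): 0.5}  # p(a | o)
lm = {candidates[0]: 0.3, candidates[1]: 0.01}                     # p(o)
best = decode(source, candidates, tm, lm)
print(best)
```

In this toy setting the translation model alone slightly prefers the second candidate, but the
language model assigns it a low probability, so the product selects the fluent first candidate.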
CHAPTER FIVE: EXPERIMENT AND DISCUSSION
5.1 Introduction
Based on the design presented in Chapter Four, experiments on Amharic-Afaan Oromo bidirectional
machine translation were carried out using a hybrid approach. This chapter evaluates the
system's performance through two sets of experiments, one using a statistical approach and one
using a hybrid approach.
5.3 Experiment I
The first two experiments, i.e., Amharic to Afaan Oromo translation and vice versa, were
conducted by using a statistical approach.
2
http://www.fanabc.com
Language Model Training
The language model is used to ensure fluent output. Since the translation is bidirectional, the
language model was built with Amharic as a target language for Afaan Oromo to Amharic
translation and Afaan Oromo as a target language for Amharic to Afaan Oromo translation.
IRSTLM toolkit was used to perform language modeling task. An appropriate 3-gram language
model was built. First, the training was performed for Amharic to Afaan Oromo and then for Afaan
Oromo to Amharic.
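To illustrate what a 3-gram language model estimates, the following Python sketch computes
unsmoothed maximum-likelihood trigram probabilities from a tiny corpus. It is only a conceptual
sketch and does not reproduce IRSTLM, which additionally applies smoothing; the second training
sentence and the function names are invented here for illustration.

```python
from collections import Counter


def train_trigram_counts(sentences):
    """Count trigrams and their two-word histories over padded sentences."""
    tri, bi = Counter(), Counter()
    for tokens in sentences:
        padded = ["<s>", "<s>"] + tokens + ["</s>"]
        for i in range(2, len(padded)):
            history = (padded[i - 2], padded[i - 1])
            tri[history + (padded[i],)] += 1
            bi[history] += 1
    return tri, bi


def trigram_prob(tri, bi, w1, w2, w3):
    """Unsmoothed maximum-likelihood estimate of p(w3 | w1, w2)."""
    history = (w1, w2)
    return tri[(w1, w2, w3)] / bi[history] if bi[history] else 0.0


corpus = [
    "Gammachuun mana dhagaa ijaare".split(),
    "Gammachuun mana ijaare".split(),  # invented sentence for illustration
]
tri, bi = train_trigram_counts(corpus)
print(trigram_prob(tri, bi, "Gammachuun", "mana", "dhagaa"))
```

Because unseen trigrams receive probability zero in this sketch, a practical language model
needs smoothing, which is one reason to rely on an existing toolkit.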
Training the Translation System
To train the translation model, we ran word alignment using GIZA++, performed phrase extraction
and scoring, created lexicalized reordering tables and created the Moses configuration file. The
model specified by the moses.ini file is used to decode/translate sentences from Amharic to Afaan
Oromo and vice versa. The phrase table and reordering table were binarised, i.e., compiled into a
format that can be loaded quickly.
Tuning
The weights Moses uses to balance the different models against each other are not optimized by
default. To find better weights, we need to tune the translation system. Tuning requires a small
amount of parallel data separate from the training data. This tuning data was passed through
tokenization and truecasing processes. The end result of tuning is an “.ini” file with trained
weights.
5.4 Experiment II
Two experiments were conducted on Amharic-Afaan Oromo language pair by using a hybrid
approach.
First, the sentence reordering rules mentioned in Chapter Four are applied on the training and test
sets, then a statistical approach is applied on the reordered corpus.
5.5 Discussion
When translating from Amharic sentences to Afaan Oromo, for example, “የአንተ ስም ማነው?” is
translated as “Maqaan kee eenyu?” but when translating Afaan Oromo sentence “Maqaan kee
eenyu?” to Amharic, it can be translated as “የአንተ ስም ማነው?” or “የአንቺ ስም ማነው?”. Similarly,
“እሱ ሻይ መጠጣት አይወድም” can be translated as “Inni shaayii dhugu hin jaalatu” but “Inni shaayii
dhugu hin jaalatu” can be translated as “እሱ ሻይ መጠጣት አይወድም” or “እሱ ሻይ መጠጣት
አትወድም”. These examples indicate that Afaan Oromo words like “kee” and “hin jaalatu” can be
translated into Amharic as “የአንተ” or “የአንቺ” and “አይወድም” or “አትወድም” respectively, whereas both
Amharic words “አይወድም” and “አትወድም” are translated as “hin jaalatu” in Afaan Oromo. This means
that an Afaan Oromo word can have more than one equivalent in Amharic, while the corresponding
Amharic words map to a single Afaan Oromo equivalent. This might be the reason behind the
difference between the performances of Amharic to Afaan Oromo and Afaan Oromo to Amharic
translation in both experiments.
The experiments were conducted by using two different approaches. The BLEU scores recorded in
the experiments show that the hybrid approach performs better than the statistical approach for
Amharic-Afaan Oromo bidirectional machine translation.
CHAPTER SIX: CONCLUSION AND FUTURE WORK
6.1 Introduction
This chapter concludes the thesis and highlights the main contributions that were achieved based
on the stated objective. Finally, some suggestions and recommendations are made for future work
that could be done in similar area of research.
6.2 Conclusion
In this study, we have developed a bidirectional Amharic-Afaan Oromo machine translation
prototype using a hybrid approach. The system has four components: sentence reordering, language
model, decoding and translation model.
The sentence reordering is used to pre-process the structure of the source language to be more
similar to the structure of the target language by using their POS tagging and to better guide the
statistical engine. We have prepared manually tagged corpus for both Amharic and Afaan Oromo
languages since there are no publicly available POS tagger tools for both languages. The linguistic
background and nature of the two languages have been studied in order to design the reordering
rules for different types of Amharic/Afaan Oromo phrases and sentences. Language modeling,
translation modeling and decoding are all components of the statistical approach which are freely
available on the web and incorporated in the translation system. The language model estimates
how likely a string is in a given target language, Afaan Oromo or Amharic. A language model has
been developed for both Afaan Oromo and Amharic because the system is bidirectional. The
translation model is used to measure the quality of the translation of the source language sentence
given the target language sentence. Just like language models, two translation models were
developed one for Amharic and the other for Afaan Oromo. The decoder is used to find the best
translation in the target language (Amharic/Afaan Oromo) for a given source language (Afaan
Oromo/Amharic) based on the translation and language models.
Amharic-Afaan Oromo hybrid bidirectional machine translation design involves collection of
Amharic and Afaan Oromo parallel corpus, corpus preparation, POS tagging, implementing the
reordering rules for Amharic and Afaan Oromo sentences using ASP.Net C# programming and
SQL server 2014 as back end, language modeling by using IRSTLM tool, translation modeling by
using GIZA++ (for creating word alignment from the parallel corpus) and training the system by
using Moses.
Finally, two experiments were conducted by using the collected data set to check the accuracy of
the system using two different approaches. The first experiment was conducted by using a
statistical approach to translate Amharic to Afaan Oromo and vice versa and has a BLEU score of
89.39% and 80.33% respectively. The second experiment was carried out by using a hybrid approach
and has a BLEU score of 91.56% and 82.24% for Amharic to Afaan Oromo and Afaan Oromo to
Amharic translation respectively, a gain of 2.17 and 1.91 percentage points over the statistical
approach. From the test results of the conducted experiments in this
research, it can be concluded that the hybrid approach is better than the statistical approach.
6.3 Contribution
The contribution of this study is to confirm that the hybrid machine translation approach is a
better option for translating Amharic to Afaan Oromo and vice versa. This approach was capable of
translating different Amharic and Afaan Oromo phrases and simple sentences containing
compound words, adjectives, noun phrases, possessive pronouns, cardinal and ordinal numbers.
Additionally, the parallel corpus used for this study can be used as input for other similar
research areas.
Better results may be obtained by increasing the size of the parallel corpus used for training
the system.
References
[1] Kituku, Benson, Lawrence Muchemi, and Wanjiku Nganga. “A Review on Machine
Translation Approaches,” TELKOMNIKA Indonesian Journal of Electrical Engineering
and Computer Science, Vol. 1, No. 1, 2016, pp 182-190.
[2] Amal Ganesh and Aasha V.C., “Rule Based Machine Translation: English to Malayalam:
A Survey” in Proceedings of 3rd International Conference on Advanced Computing,
Networking and Informatics, India, Orissa, October 2015, Vol 43, pp 447-454.
[3] Benjamin Elisha Sawe, “What Languages Are Spoken In Ethiopia,” retrieved from
https://www.workdatlas.com/articles/what-languages-are-spoken-in-ethiopia.html, Last
accessed on March 05, 2020.
[4] Joel Ilao, Jasmine Ang, Marc Randell Chan and Joyce Uy, “Filipino-to-English
Bidirectional Statistical Machine Translation Using Feedback”, retrieved from
https://www.researchgate.net/publication/280561598_FEBSMT_Filipino-to-
English_Bidirectional_Statistical_Machine_Translation_Using_Feedback, Last accessed
on September 25, 2018.
[5] Yin Yin Win, Aye Thida, “Myanmar-English Bidirectional Machine Translation System
with Numerical Particles Identification”, retrieved from http://www.mecs-
press.org/ijitcs/ijitcs-v8-n6/IJITCS-V8-N6-5.pdf, Last accessed on October 01, 2018.
[7] Jabesa Daba and Yaregal Assabie, “A Hybrid Approach to the Development of
Bidirectional English-Oromiffa Machine Translation” in Proceedings of the 9th
International Conference on NLP, Warsaw, Poland, September 2014.
[9] Akubazgi Gebremariam, “Amharic to Tigrigna Machine Translation using Hybrid
Approach: An Experiment Using a Statistical Approach”, Unpublished Masters Thesis,
Department of Computer Science, Addis Ababa University, Ethiopia, 2017.
[10] The Language Gulper: “An insatiable appetite for ancient and modern tongues”, retrieved
from, http://www.languagesgulper.com/eng/Amharic.html, Last accessed on February 04,
2020.
[12] ባየ ይማም, አጭርና ቀላል የአማርኛ ሰዋስው, አልፋ አታሚዎች, Addis Ababa, 2010.
[13] Abeba Ibrahim, “A Hybrid Approach to Amharic Base Phrase Chunking and Parsing”,
Unpublished Masters Thesis, Department of Computer Science, Addis Ababa University,
Ethiopia, 2013.
[14] Addis Ashagre, “Automatic Summarization for Amharic Text using Open Text
Summarizer”, Unpublished Masters Thesis, School of Information Science, Addis Ababa
University, Ethiopia, 2013.
[15] ጌታሁን አማረ, የአማርኛ ሰዋስው በቀላል አቀራረብ, የተሻሻለ ሁለተኛ ዕትም, አዲስ አበባ ዩኒቨርሲቲ
[16] Mohammed Hussen, “Part-of-speech tagging for Afaan Oromo language using
Transformational Error driven Learning (TEL) approach”, Unpublished Masters Thesis,
Department of Computer Science, Addis Ababa University, Ethiopia, 2010.
[17] John H.Spencer, Ethiopia at Bay: A Personal Account of the Haile Sellassie Years, Online
book uploaded 2017.
[18] Susan Russell, Amharic ግዕዝ TESL 539: Language Group Report Spring 2009, retrieved
from, http://www.ritell.org/Resources/Documents/language%20project/Amharic%20.pdf,
Last access on March 09, 2019.
76 | P a g e
[19] Ruth Kramer, “The Amharic Definite Marker and the Syntax-Morphology Interface”,
Annual Meeting of the Linguistic Society of Americal, University of California, Santa Cruz,
2008.
[20] Ibrahim Bedane, “The Origin of Afaan Oromo: Mother Language”, Global Journal of
human-social science, Vol. 15, Issue 12, 2015.
[21] Abdi Sani, “Afaan Oromo Named Entity Recognition Using Hybrid Approach”,
Unpublished Masters Thesis, Department of Computer Science, Addis Ababa University,
Ethiopia, 2015.
[22] Abebe Mideksa, “Statistical Afaan Oromo grammar checker”, Unpublished Masters
Thesis, School of Information Science, Addis Ababa University, Ethiopia, 2015.
[23] Gezehagn Gutema, “Afaan Oromo Text Retrieval System”, Unpublished Masters Thesis,
School of Information Science, Addis Ababa University, Ethiopia, 2012.
[25] Getachew Mamo and Million Meshesha, “Parts of Speech Tagging for Afaan Oromo”,
International Journal of Advanced Computer Science and Applications, Special Issue on
Artificial Intelligence, Vol. 1, No. 3, 2011.
[26] Kula Kekeba Tune, “Development of Cross-Lingual Information Retrieval for Resource-
Scarce African Languages”, Thesis submitted for the Degree of PhD in Computer Science
and Engineering, International Institute of Information Technology, Hyderabad, Deemed
University, India, 2015.
[27] Debela Tesfaye, “A rule-based Afaan Oromo Grammar Checker”, International Journal
of Advanced Computer Science and Applications, Vol. 2, No. 8, 2011, pp. 126 – 130.
[28] Abraham Gizaw, “Improving Brill’s tagger lexical and transformation rule for Afaan
Oromo Language”, Unpublished Masters Thesis, Department of Computer Science, Addis
Ababa University, Ethiopia, 2013.
77 | P a g e
[29] Assefa W/Mariam, “Development of Morphological Analyzer for Afaan Oromo”,
Unpublished Masters Thesis, Department of Information Science, Addis Ababa
University, Ethiopia, 2005.
[30] Debela Tesfaye, “Designing a Stemmer for Afaan Oromo Text: A hybrid approach”,
Unpublished Masters Thesis, Department of Information Science, Addis Ababa
University, Ethiopia, 2010.
[31] Fiseha Berhanu, “Afaan Oromo Automatic News Text Summarizer Based on Sentence
Selection Function”, Unpublished Masters Thesis, Department of Information Science,
Addis Ababa University, Ethiopia, 2013.
[32] Birhanu Demie, “The Impact of Afaan Oromo dialectal variations on teaching-learning
process of the language”, Unpublished Masters Thesis, Department of Linguistics and
Philology, Addis Ababa University, Ethiopia, 2010.
[34] Ekta Gupta, Shailendra Kumar Shrivastava. “A Result Analysis of Translation Techniques
of English to Hindi Online Translation Systems,” International Journal of Computer
Applications (0975 – 8887), Vol. 156, No. 12, 2016.
[36] V.C. Aasha and Amal Ganesh, “Rule Based Machine Translation: English to Malayalam:
A Survey” in Proceedings of the 3rd International Conference on Advanced Computing,
Networking and Informatics, India, January 2016.
78 | P a g e
[38] Sadik Bessou and Mohamed Touahria, “Morphological Analysis and Generation for
Machine Translation from and to Arabic”, International Journal of Computer
Applications, Vol. 18, No. 2, 2011.
[39] Anand Balladb and Umesh Chandra Jaiswal, “A Study of Machine Translation Methods
and Their Challenges”, International Journal of Advance Research in Science and
Engineering, Vol. 4, No. 2, 2015.
[41] Zhou Dajun and Wang Yun, “Corpus-based Machine Translation: Its current development
and perspectives”, International forum of teaching and studies, Vol. 11 No. 1-2, 2015.
[42] Thai Phuong Nguyen and Akira Shimazu, “Improving Phrase-Based SMT with Morpho-
Syntactic Analysis and Transformation” in Proceedings of the 7th Conference of the
Association for Machine Translation in the Americas, Cambridge, August 2006.
[43] Ahmed Fasis, Hisham Salam, “Smoothing Techniques evaluation of n-gram language
model for Arabic OCR post-processing”, Journal of Theoretical and Applied Information
Technology, Vol. 82, No. 3, 2015.
[44] Antony P J., “Machine Translation Approaches and Survey for Indian Languages,”
International Journal of Computational Linguistics and Chinese Language Processing,
Vol. 18, No. 1, 2013, pp. 47-78.
[46] Nagao M., “A framework for mechanical translation between English and Japanese by
Analogy principle.” Artificial and Human Intelligence, North Holland, 1984, pp 173-180.
[47] Harold Somers, “Review article: Example based machine translation”, Machine
Translation 14, Vol. 4, pp 113-145, 1999.
[48] I. Sutskever, O. Vinyals, and Q. V. Le, “Sequence to sequence learning with neural
networks,” in Advances in neural information processing systems, pp. 3104-3112, 2014.
79 | P a g e
[49] Dzmitry Bahdanau, KyugnHyun Cho, Yoshua Bengio, Neural Machine Translation By
Jointly Learning to align and translate. Published as a conference paper at ICLR 2015.
[51] Parnia Bahar, Christopher Brix and Hermann Ney, “Towards Two Dimensional Sequence
Model in Neural Machine Translation”, in Proceedings of 2018 Conference on Emprical
Methods in Natural Language Processing, Brussels, Belgium, November 2018, pp 3009-
3015.
[52] Niessen, S., F.J Och, G. Leusch, and H. Ney, “An Evaluation Tool for Machine
Translation: Fast Evaluation for MT Research”, in Proceesing of the 2nd International
Conference on Language Resources and Evaluation, Athens, Greece. 2000.
[53] Tillmann C., S. Vogel, H. Ney, H. Sawaf and A. Zubiaga, “Accelerated DP based Search
for Statistical Translation”, in Proceedings of the 5th European Conference on Speech
Communication and Technology, Rhodes, Greece, 1997.
[54] Maja Popovic, Hermann Ney, “Word Error Rates: Decomposition over POS Classes and
Applications for Error Analysis”, in Proceedings of the 2nd Workshop on Statistical
Machine Translation, Prague, 2007.
[55] Kishore Papineni, Salim Roukos, Todd Ward and Wei-Jing Zhu, “BLEU: a Method for
Automatic Evaluation of Machine Translation”, in Proceedings of the 40th Annual Meeting
of the Association for Computational Linguistics (ACL), Philadelphia, July 2002, pp 311-
318.
[56] Laith S. Hadla, Taghreed M. Hailat and Mohammed N. Ak-Kabi, “Comparative Study
Between METEOR and BLEU Methods of MT: Arabic into English Translation as a Case
Study”, International Journal of Advanced Computer Science and Applications, Vol. 6,
No. 11, 2015.
80 | P a g e
Annex I: Sample Amharic and Afaan Oromo Tagged Sentences for Training
እነዚህ_PRP$ ላሞች_NNS የእሷ_PRP$ ናቸው_AUX ።_PUN Saawwan_NNS kana_PRP$ kan_UNK ishee_PRP$ dha_AUX ._PUN
የእሱ_PRP$ ትልቁ_JJ ቤት_CW አዲስ_JJ ነው_AUX ።_PUN Mani_NN guddaan_JJ isaa_PRP$ haaraa_JJ dha_AUX ._PUN
ፋጡማ_NNP ቆንጆ_JJ መኪና_NN አላት_AUX ።_PUN Faaxumaan_NNP konkoolaataa_NN bareedaa_JJ qabdi_AUX ._PUN
የእኔ_PRP$ ልጅ_NN ሁለት_CD ድመቶች_NNS አሉት_AUX ።_PUN Mucaan_NN koo_PRP$ adurreewwan_NNS lama_CD qaba_AUX ._PUN
እኛ_PRP የገብስ_NP ጠላ_NN አንጠጣም_VBG ።_PUN Nu'i_PRP farsoo_NN garbuu_NP hindhugnuu_VBG ._PUN
እነዚህ_PRP$ በጎች_NNS የእሷ_PRP$ ናቸው_AUX ።_PUN Hoolotta_NNS kana_PRP$ kan_UNK ishee_PRP$ dha_AUX ._PUN
እሱ_PRP መቀሌ_NNP ላይ_IN ትልቅ_JJ ሆቴል_NN ገዛ_VBD ።_PUN Inni_PRP Maqalee_NNP irraa_IN hooteela_NN guddaa_JJ bitte_VBD ._PUN
በአፍሪካ_NP ዋንጫ_NN ናይጄሪያ_NNP 3ኛ_ON ደረጃ_NN በመያዝ_NP አጠናቀቀች_VBD ።_PUN Waancaa_NN Afrikaan_NP Naayjeeriyaan_NNP sadarkaa_NN 3ffaa_ON qabachuun_NP xumurteetti_VBD ._PUN
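The tagged sentences above use a word_TAG convention, with an underscore joining each token to its POS tag. The following is a minimal sketch of how such a line could be read into (word, tag) pairs. The function name parse_tagged is hypothetical and is not part of the thesis tooling.

```python
def parse_tagged(sentence):
    """Split a 'word_TAG word_TAG ...' string into (word, tag) pairs.

    Hypothetical helper for illustration only. It splits on the LAST
    underscore so that a word containing '_' would keep its full form.
    """
    pairs = []
    for token in sentence.split():
        word, _, tag = token.rpartition("_")
        pairs.append((word, tag))
    return pairs


# One Afaan Oromo sample sentence copied from the annex.
sample = "Mani_NN guddaan_JJ isaa_PRP$ haaraa_JJ dha_AUX ._PUN"
pairs = parse_tagged(sample)
tags = [tag for _, tag in pairs]
print(pairs)
print(tags)
```

The tag sequence extracted this way is the kind of input a POS-based reordering step would inspect. The reordering rules themselves are not reproduced here.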
Annex II: Sample Parallel Corpus for Testing
ቃልኪዳን እና ጫላ ተገናኙ ። Kaalkidaan fi Caalaan walargan.
ዮሃንስ ሻይ እየጠጣ ነው ። Yohaannis shaayee dhugaa jira.
እኔ የገዛውት ዶሮ ሞተ ። Ani kan bitte handaaqqoo du'e.
እነሱ ማራቶን እየሮጡ ነው ። Isaan maaraatoni fiigaa jiran.
እነሱ ትላንት መቀሌ ሄዱ ። Isaan kaleessa Maqalee deeman.
እሱ ትልቅ የድንጋይ ቤት ሰራ ። Inni mana dagaa guddaa ijaare.
እሱ የገብስ ጠላ አይጠጣም ። Inni farsoo garbuu hindhuguu.
እኔ ነገ ወደ መቀሌ እሄዳለው ። Ani bor gara Maqalee nideema.
እሷ ትላንትና ስትሮጥ ነበር ። Isheen kaleessa fiigaa turte.
እሷ አዲስ አበባ ልትሄድ ነው ። Isheen Addis Ababaa deemufi.
የእኔ ትንሹ ቤት አሮጌ ነው ። Mani xiqqaan koo moofaa dha.
እሷ የገብስ ጠላ ጠጣች ። Isheen farsoo garbuu dhugte.
አብዲ ትልቅ ቤት ሰራ ። Abdiin mana guddaa ijaare.
እሱ ትልቅ ቤት ሊሰራ ነው ። Inni mana guddaa ijaarufi.
እኔ ቡና መጠጣት አልወድም ። Ani buna dhugu hinjaaladhu.
እናንተ የገዛቹት በግ ሞተ ። Isin kanbitan hoolaa du'e.
እሷ ሶስት ላሞች አሏት ። Isheen saawwan sadi qabdi.
እሱ ትላንት ሲሮጥ ነበር ። Inni kaleessa fiigaa ture.
እኔ ሶስት ኪሎ ቡና ገዛው ። Ani kiiloo sadi buna bite.
እኛ መኪና ልንገዛ ነው ። Nu'i konkoolaataa bituufi.
እሷ አራት ላሞች ገዛች ። Isheen saawwan afur bitte.
ይህ የጫልቱ ቤት ነው ። Kuni mana Caaltuu dha.
እሷ አልጋ ላይ ተኛች ። Isheen siree irraa rafte.
እነሱ አንድ ከብት አላቸው ። Isaan sangaa tokko qaban.
እሱ መኪና ሊገዛ ነው ። Inni konkoolaataa bitufi.
አስቴር ቡና እየጠጣች ነው ። Asteer buna dhugaa jirti.
ቦንቱ ዶሮ ገዛች ። Boontun handaaqqoo bitte.
እኔ መኪና የለኝም ። Ani konkoolaataa hinqabu.
እነሱ አንድ በግ አላቸው ። Isaan hoolaa tokko qaban.
እሷ ሐሙስ ትመረቃለች ። Isheen kamisa eebbifamti.
እነሱ ቤት ውስጥ ናቸው ። Isaan mana keessa jiran.
እሷ መፅሐፍ አነበበች ። Isheen kitaaba dubbifte.
ሃይማኖት መኪና ትወዳለች ። Haaymaanoot konkoolaataa jaalati.
እኛ ሁለት በጎች ገዛን ። Nu'i hoolotta lama bine.
ስሟ ሜላት ነው ። Maqaan ishee Melaat dha.
ጫልቱ አስተማሪ ነች ። Caaltuun barsiistuu dha.
ሃና ኳስ ትጫወታለች ። Haanaan kubbaa taphatti.
እሱ በፍጥነት እየነዳ ነው ። Inni ariitin fiiga jira.
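Each line of the sample above holds an Amharic sentence followed by its Afaan Oromo counterpart. As a rough sketch, and assuming the Amharic full stop (።) always closes the Amharic side as it does in this sample, a line could be separated into its two halves as follows. The helper name split_pair is hypothetical.

```python
def split_pair(line):
    """Split one 'Amharic ። Oromo.' line into (amharic, oromo).

    Assumes the first '።' marks the end of the Amharic sentence,
    which holds for the sample lines in this annex.
    """
    amharic, sep, oromo = line.partition("።")
    return (amharic + sep).strip(), oromo.strip()


# One line copied from the annex.
pair = split_pair("ቦንቱ ዶሮ ገዛች ። Boontun handaaqqoo bitte.")
print(pair)
```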
Annex III: Sample language model for Amharic
\data\
ngram 1= 1545
ngram 2= 3511
ngram 3= 650
\1-grams:
-3.42284 ሙያ -0.270106
-2.38144 ላይ -0.258859
-3.54778 አሉ -0.193245
-0.952646 ። -2.69646
-3.1798 የውጭ -0.166498
-2.53353 እና -0.136435
-3.72387 የእግዚአብሔርን -0.11749
-3.54778 ድረስ -0.11749
-3.72387 ሮም -0.11749
-3.54778 ሊግ -0.11749
-2.74614 ኳስ -0.335255
-2.72387 ጋር -0.213767
Annex IV: Sample language model for Afaan Oromo
\data\
ngram 1= 1539
ngram 2= 3569
ngram 3= 801
\1-grams:
-3.75259 <s> -0.693656
-3.20852 Barsiistooni -0.462829
-2.38152 fi -0.166393
-3.35465 hojetootnii -0.156635
-3.57649 koleejjii -0.188457
-3.45156 Teknikaa -0.356403
-3.45156 Ogummaa -0.356403
-3.45156 Allaagee -0.166986
-2.6734 , -0.132
-3.75259 Filannoo -0.126987
-3.45156 Biyyoolessaa -0.126987
-3.75259 Itoophiyaatti -0.126987
-3.75259 gaggeffamaa -0.126987
-2.93967 ture -0.763859
-2.35465 irraa -0.279177
-3.75259 hirmaannee -0.126987
-3.75259 hinbeeknu -0.126987
-3.57649 jedhu -0.188457
-0.981734 . -2.95079
-0.981366 </s> -2.95116
-3.57649 imaammata -0.126987
-3.57649 hariiroo -0.126987
-3.45156 dhimma -0.166986
-3.57649 alaa -0.126987
-2.97444 Ityoophiyaa -0.148347
-3.75259 haala -0.126987
-3.57649 addunyaa -0.126987
-3.75259 hubachuudhan -0.126987
-3.75259 yaadaa -0.126987
-3.75259 hojimaata -0.126987
-3.75259 qabatamaa -0.126987
-3.45156 irratti -0.126987
-3.75259 hundaa -0.126987
-1.65222 ' -0.775036
-2.62225 e -0.965745
-3.75259 uumuun -0.126987
-3.75259 barbaachisaa -0.126987
-2.60646 ta -1.31064
-3.57649 uun -0.126987
-3.27547 ka -0.578251
-3.75259 eera -0.126987
-3.75259 Dhuguma -0.126987
-3.75259 dhuguman -0.126987
-3.75259 isinitti -0.126987
-3.75259 hima -0.126987
-3.20852 namoota -0.146526
-3.57649 as -0.126987
-3.75259 dhaabatanii -0.126987
-2.87753 jiran -0.830806
-2.77486 keessa -0.30272
-3.75259 kaan -0.126987
-3.57649 utuu -0.126987
-2.22754 hin -0.325472
-2.79834 du -1.10973
-3.20852 in -0.126987
-3.27547 mootummaan -0.150541
-3.57649 Waaqayyoo -0.126987
-3.75259 humnaan -0.126987
-3.75259 dhufee -0.126987
-3.75259 arguug -0.126987
-3.05362 jiru -0.638921
-3.35465 dorgommii -0.156635
-3.57649 Diyaamand -0.188457
-3.57649 Liigii -0.126987
-2.26122 kaleessa -0.428132
-3.27547 galgala -0.249741
-3.45156 Xaaliyaan -0.126987
-2.90749 magaalaa -0.144991
-3.75259 Roomitti -0.126987
-3.75259 gaggeeffameen -0.126987
-3.45156 atleetonni -0.356403
-3.45156 olaantummaadhan -0.356403
-3.45156 xumuraniiru -0.356403
-3.57649 Miishal -0.188457
-3.57649 Plaatiiniin -0.126987
-3.35465 waancaa -0.126987
-2.97444 kubbaa -0.293032
-3.27547 miillaa -0.126987
-3.75259 2022n -0.126987
-3.75259 walqabatee -0.126987
-3.75259 malaammaltummaadhan -0.126987
-3.75259 shakkamaniiti -0.126987
-3.27547 to -0.578251
-3.75259 annaa -0.126987
-3.27547 jala -0.383648
-2.62225 kan -0.252327
-3.75259 oolfaman -0.126987
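The language model samples in Annex III and Annex IV appear to follow the standard ARPA layout, in which each 1-gram line holds a base-10 log probability, the token, and a base-10 log back-off weight. Under that assumption, the sketch below reads one such line and converts the log probability back to a plain probability. The function name parse_unigram is hypothetical.

```python
def parse_unigram(line):
    """Parse one ARPA-style 1-gram line: '<log10 prob> <token> [<log10 backoff>]'.

    The back-off weight is optional in ARPA files; 0.0 is used when absent.
    """
    parts = line.split()
    logprob = float(parts[0])
    word = parts[1]
    backoff = float(parts[2]) if len(parts) > 2 else 0.0
    return word, logprob, backoff


# Entry for the conjunction 'fi' copied from the Afaan Oromo sample.
word, logprob, backoff = parse_unigram("-2.38152 fi -0.166393")
prob = 10 ** logprob  # convert log10 probability to a probability
print(word, prob, backoff)
```

Read this way, frequent tokens such as the sentence-final full stop (log probability near -0.98) carry much higher unigram probability than rare words (log probability near -3.75).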
Annex V: Transliteration from Amharic alphabets to Latin characters
First Order   Second Order   Third Order   Fourth Order   Fifth Order   Sixth Order   Seventh Order
ሀ hä ሁ hu ሂ hi ሃ ha ሄ he ህ h ሆ ho
ለ lä ሉ lu ሊ li ላ la ሌ le ል l ሎ lo ሏ lWa
ሐ Hä ሑ Hu ሒ Hi ሓ Ha ሔ He ሕ H ሖ Ho ሗ HWa
መ mä ሙ mu ሚ mi ማ ma ሜ me ም m ሞ mo ሟ mWa
ሠ Sä ሡ Su ሢ Si ሣ Sa ሤ Se ሥ S ሦ So ሧ SWa
ረ rä ሩ ru ሪ ri ራ ra ሬ re ር r ሮ ro ሯ rWa
ሰ sä ሱ su ሲ si ሳ sa ሴ se ስ s ሶ so ሷ sWa
ሸ śä ሹ śu ሺ śi ሻ śa ሼ śe ሽ ś ሾ śo ሿ śWa
ቀ qä ቁ qu ቂ qi ቃ qa ቄ qe ቅ q ቆ qo ቋ qWa
በ bä ቡ bu ቢ bi ባ ba ቤ be ብ b ቦ bo ቧ bWa
ቨ vä ቩ vu ቪ vi ቫ va ቬ ve ቭ v ቮ vo ቯ vWa
ተ tä ቱ tu ቲ ti ታ ta ቴ te ት t ቶ to ቷ tWa
ቸ cä ቹ cu ቺ ci ቻ ca ቼ ce ች c ቾ co ቿ cWa
ኀ Ĥä ኁ Ĥu ኂ Ĥi ኃ Ĥa ኄ Ĥe ኅ Ĥ ኆ Ĥo ኇ ĤWa
ነ nä ኑ nu ኒ ni ና na ኔ ne ን n ኖ no ኗ nWa
ኘ Nä ኙ Nu ኚ Ni ኛ Na ኜ Ne ኝ N ኞ No ኟ NWa
አ xä ኡ xu ኢ xi ኣ xa ኤ xe እ x ኦ xo ኧ xWa
ከ kä ኩ ku ኪ ki ካ ka ኬ ke ክ k ኮ ko ኳ kWa
ኸ Kä ኹ Ku ኺ Ki ኻ Ka ኼ Ke ኽ K ኾ Ko ዃ KWa
ወ wä ዉ wu ዊ wi ዋ wa ዌ we ው w ዎ wo
ዐ Xä ዑ Xu ዒ Xi ዓ Xa ዔ Xe ዕ X ዖ Xo
ዘ zä ዙ zu ዚ zi ዛ za ዜ ze ዝ z ዞ zo ዟ zWa
ዠ Zä ዡ Zu ዢ Zi ዣ Za ዤ Ze ዥ Z ዦ Zo ዧ ZWa
የ yä ዩ yu ዪ yi ያ ya ዬ ye ይ y ዮ yo
ደ dä ዱ du ዲ di ዳ da ዴ de ድ d ዶ do ዷ dWa
ጀ jä ጁ ju ጂ ji ጃ ja ጄ je ጅ j ጆ jo ጇ jWa
ገ gä ጉ gu ጊ gi ጋ ga ጌ ge ግ g ጎ go ጓ gWa
ጠ Tä ጡ Tu ጢ Ti ጣ Ta ጤ Te ጥ T ጦ To ጧ TWa
ጨ Cä ጩ Cu ጪ Ci ጫ Ca ጬ Ce ጭ C ጮ Co ጯ CWa
ጰ Pä ጱ Pu ጲ Pi ጳ Pa ጴ Pe ጵ P ጶ Po ጷ PWa
ጸ ťä ጹ ťu ጺ ťi ጻ ťa ጼ ťe ጽ ť ጾ ťo ጿ ťWa
ፀ Ťä ፁ Ťu ፂ Ťi ፃ Ťa ፄ Ťe ፅ Ť ፆ Ťo
ፈ fä ፉ fu ፊ fi ፋ fa ፌ fe ፍ f ፎ fo ፏ fWa
ፐ pä ፑ pu ፒ pi ፓ pa ፔ pe ፕ p ፖ po ፗ pWa
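The table above can be applied character by character. The following sketch uses only a handful of entries copied from the table, so the dictionary TABLE is deliberately incomplete, and the names TABLE and transliterate are hypothetical. Characters missing from the dictionary are passed through unchanged.

```python
# A small subset of the transliteration table (illustrative, not complete).
TABLE = {
    "ሙ": "mu",
    "በ": "bä", "ቡ": "bu", "ቤ": "be",
    "ት": "t",
    "ና": "na",
    "ያ": "ya",
}


def transliterate(text):
    """Replace each Amharic character found in TABLE with its Latin form."""
    return "".join(TABLE.get(ch, ch) for ch in text)


print(transliterate("ቡና"))   # word from Annex II (coffee)
print(transliterate("ቤት"))   # word from Annex I (house)
```

A complete implementation would load every row of the table, including the labialized forms in the final column.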
Declaration
I, the undersigned, declare that this thesis is my original work and has not been presented for a
degree in any other university, and that all sources of material used for the thesis have been duly
acknowledged.
Declared by:
Name: _______________________________.
Signature: ____________________________.
Date: ________________________________.
Confirmed by advisor:
Name: _______________________________.
Signature: ____________________________.
Date: ________________________________.