IPA Transcription of Bengali Texts
Kanij Fatema1 , Fazle Dawood Haider2 , Nirzona Ferdousi Turpa2 , Tanveer Azmal2,1 , Sourav Ahmed1,3
Navid Hasan1,3 , Mohammad Akhlaqur Rahman2,3 , Biplab Kumar Sarkar3,5 , Afrar Jahin3,5
Md. Rezuwan Hassan3,4 , Md Foriduzzaman Zihad4,1 , Rubayet Sabbir Faruque4,1 , Asif Sushmit1,∗
Mashrur Imtiaz2 , Farig Sadeque4 , Syed Shahrier Rahman2
1 Bengali.AI, 2 University of Dhaka, 3 Shahjalal University of Science and Technology,
4 BRAC University, 5 Sylhet Engineering College
Abstract
The International Phonetic Alphabet (IPA) serves to systematize phonemes in language, enabling precise textual
representation of pronunciation. In Bengali phonology and phonetics, ongoing scholarly deliberations persist
concerning the IPA standard and core Bengali phonemes. This work examines prior research, identifies current
and potential issues, and suggests a framework for a Bengali IPA standard, facilitating linguistic analysis, NLP
resource creation, and downstream technology development. We present a comprehensive study of
Bengali IPA transcription and introduce a novel IPA transcription framework, together with a novel dataset and
DL-based benchmarks.
Keywords: IPA, Bengali, Linguistics
* Corresponding author: [email protected], farig.sad[email protected], [email protected], ss.rah[email protected]
1. Introduction
Bangla, popularly known as Bengali, is the official language of Bangladesh and is spoken by a vast
population of 272.7 million people in Bangladesh and some regions of India, along with a large
Bengali diaspora across the globe. Across communities, Bangla-speaking people use a wide variety of
dialects of the language. The morphological variations among these dialects are relatively subtle,
but distinctions are found in sounds and phonology. This calls for a consistent IPA transcription
protocol for canonical and dialectal variations of Bangla, so that the language can be documented well.
1.1. International Phonetic Alphabet (IPA)
A Bangla-to-IPA transcription model requires a phonetic transcription scheme to represent the
transcription and pronunciation patterns of the language. The International Phonetic Alphabet (IPA)
stands as the sole standard for phonetic writing systems. Regardless of the language in question, the
International Phonetic script relies on Roman characters and also incorporates modified elements
from other scripts, such as Greek, to convey phonetic notation. The IPA-provided symbols, such as
(t, ɛ, ʃ, k, ̪ ), are to be used even for languages that do not employ the Roman alphabet, such as
Bangla, Hindi, Japanese, or Korean.
Since its establishment in 1886, the International Phonetic Association has concerned itself with
developing a system of symbols that maintains a balance between usability and inclusivity and that
encompasses the wide variety of sounds present in languages all over the world (Association, 1999).
The main purpose of IPA is to represent specific speech sounds rather than the abstract linguistic
units known as phonemes, although it is also used for phonemic transcription (Association, 1999). IPA
follows a common policy of using one letter for each segment. As a result, two letters are not put
together to represent one single sound. For example, in the English word ‘shine’, the spelling ‘sh’
uses two letters to convey one single sound, which the IPA writes with the single symbol /ʃ/. IPA
doesn’t usually provide separate characters for sounds that aren’t differentiated in known
languages. Both broad and narrow transcriptions can be used in IPA. Details on the IPA representation
of vowels, consonants, suprasegmentals and diacritics are shown in the Appendix.
1.2. Literature Review: Bangla IPA
Bangla (the internationally popular term Bengali is used interchangeably in this paper) possesses a
distinctive phonetic inventory, which can be represented using the IPA. An IPA transcription model
requires a phonetic transcription scheme to represent the pronunciation patterns of the language.
Numerous studies have delved into the standard IPA representation for Bangla, developing a range of
perspectives and viewpoints. Recently, a government-sanctioned IPA website has been introduced in
Bangladesh. This platform adheres to the standard International Phonetic Alphabet (IPA)
corresponding to the Bangla language, as outlined in the revised 2015 version. It encompasses seven
vowels, ই/i/, এ/e/, অ্যা/æ/, আ/a/, অ/ɔ/, ও/o/, and উ/u/, and provides suggestions for semi-vowels,
specifying their corresponding IPA symbols as follows: ই শ্রুতি/j/, অ/য় শ্রুতি/y/ and ব শ্রুতি/w/.
However, alternative representations are proposed
for specific cases, such as ও/o̯/, ই/i̯/, এ/e̯/ and উ/u̯/.
In terms of consonants, the website employs a set of 31 phonemes. For voiced aspirated consonants, it
provides both voiceless /ʰ/ and voiced /ʱ/ aspiration, such as /dʰ/ and /dʱ/ for ঢ, /d̪ʰ/ and /d̪ʱ/
for ধ, /bʰ/ and /bʱ/ for ভ, and /ɽh/ and /r̪ɦ/ for ঢ়. For য়, it provides /y/, even though this
sound does not exist in the Bangla language.
This section further delves into these discussions before exploring the suggested IPA protocol for
this dataset and outlining the validation challenges.
1.2.1. Bangla Vowels
Chatterji (1921) used Jones (1922)’s cardinal vowel system to explain the Bangla vowel system. He
claimed that the Bangla language has seven primary vowels, ই/i/, এ/e/, অ্যা/æ/, আ/a/, ও/o/, অ/ɔ/, and
উ/u/, along with their corresponding nasal counterparts /ĩ ẽ æ̃ ã õ ɔ̃ ũ/. Chatterji also noted that
Bangla vowels are generally articulated in a lax manner, imparting the characteristic ‘timbre’ to the
vowel system. Morshed (1997) categorized the vowels as /i, u, e, o, æ, ɔ, a/, comprising two high,
two high-mid, two low-mid, and one low vowel. Ali (2001) investigated vowel contrasts, defining
phonological properties, and reported the same number of vowels, with a subtle distinction: he
employed the symbol /ɛ/ to represent the vowel that Morshed (1997) described as /æ/.
In a separate study, Hai (1964) analyzed the vowels of Standard Bangla using the concept of cardinal
vowels. He claimed that there are eight vowels in the Bangla language: ই/i/, এ/e/, অ্যা/æ/, আ/a/,
ও/o/, ও'/o'/, অ/ɔ/, and উ/u/. He categorizes ই/i/, এ/e/, and অ্যা/æ/ as front vowels and ও/o/, ও'/o'/,
অ/ɔ/, and উ/u/ as back vowels. In contrast to Morshed (1997), Hai did not classify the Bangla vowel
আ/a/ as occupying a central position. He explained that the Bangla আ/a/ sound differs from the
neutral quality of the English /a/ and is distinct from the Urdu close /ə/ sound. Instead, he
characterized it as an open vowel. Hai also pointed out the presence of an additional vowel in the
Bangla vowel system, denoted as /o'/. He explained that when producing the /o'/ sound, the lips are
slightly less rounded compared to the /o/ sound. However, there isn’t a significant difference in the
gap between the jaws, and the back of the tongue is not raised as much as it is when articulating the
/o/ sound. This led him to term it yotized o (oʸ), known in Bangla as অভিশ্রুত /obʱɪsɾut̪o/ ও /o/ or
ও' /o'/. This observation was supported by Huq (2002). An example provided for this distinction is
the contrast between বিয়ের ক'নে /bɪʲeɾ ko'ne/ and ঘরের কোণে /gʱɔɾeɾ kone/. Nevertheless, it’s worth
noting that there is limited empirical evidence to support this concept. On the contrary, the claim
that the number of vowels is seven is backed by Pobitro Sorkar (1992) and Puny Sloka Ray (1997), as
noted in Ali (2001).
1.2.2. Bangla Semi-Vowels
According to Chatterji (1921) and Sen (1993), there are two Bangla semivowels, namely অন্তস্থ ব/w/
and অন্তস্থ য়/y/. Hai (1964) contends that there are three semivowels: অন্তস্থ ব/w/, অন্তস্থ য়/y/, and
অন্তস্থ ই/i/. Morshed (1997) argues that while অন্তস্থ ব/w/ and অন্তস্থ য়/y/ are considered semivowels
in English, they do not possess similar status in Bangla. A different perspective was presented by
Ferguson and Chowdhury (1960), who claim that there are four semivowels: /i e o u/. It is noted in
Ali (2001) that this assertion was supported by Pobitro Sharker and Ghonesh Boshu (1998). Along with
ই/i̯/, উ/u̯/, and ও/o̯/, there is a fourth semi-vowel, এ/e̯/, which is found at the end of a word in
the form of ‘য়’, as in হয় /hɔ͡e̯/ and যায় /ja͡e̯/ (Ali, 2001).
1.2.3. Bangla Diphthongs
Sen (1993) noted that Bangla has two diphthongs: ঐ (oɪ) and ঔ (ou). These combinations of two sounds
do not fit the conventional definition of diphthongs but are represented in written form. In
linguistic terms, they are referred to as digraphs (Ali, 2001). On the contrary, Chatterji (1921)
claimed that there are 25 diphthongs in standard Bangla. Hai (1964) asserted that there are a total
of 31 diphthongs, categorizing them into 19 regular and 12 irregular ones. However, he also once
argued that there are only 18 diphthongs, as noted by Ali (2001), who in turn asserts that there are
17 diphthongs in Bangla. The government-approved IPA website acknowledges the 19 regular diphthongs,
but it lists the diphthong /ui̯/ twice and does not include the /eo̯/ diphthong.
1.2.4. Bangla Consonants
There have been numerous past studies, primarily rooted in articulatory phonetics, that have
examined the articulatory and acoustic characteristics of Bangla consonants. Hai (1964) describes
Bangla as having 20 stops, 7 fricatives, 4 nasals, 1 lateral, 1 trill, 2 flaps, and 1 glide,
totaling 36 consonants. Hai (1964) also claims that there is only one phone close to /ʃ/ in Bangla.
Huq (2002) presented a slightly different categorization with a total of 35 consonants: 21 stops, 5
fricatives, 3 nasals, 1 lateral, 1 trill, 2 flaps, and 2 glides. Morshed (1997) stated that Bangla
includes 20 stops, 4 nasals, 4 fricatives, 1 lateral, and 2 flaps, totaling 31 consonants. On the
other hand, Ali (2001) argued that Bangla has 20 stops, 3 nasals, 3 fricatives, 1 lateral, 2 flaps,
1 trill, and 2 glides, resulting in a total of 32 consonants.
1.3. Our Contribution
In this work, we present a comprehensive study of IPA transcription issues and challenges for
Bangla, a novel IPA transcription framework, DUAL-IPA, a sentence-level IPA-transcribed parallel
corpus of 150k samples, and DL-based benchmarking results. We open-source the dataset under the
CC BY-SA 4.0 license.
2. Bangla IPA Transcription
Despite the global use of the Bangla language, there’s a notable absence of a comprehensive IPA
transcription framework and modeling. While the government-endorsed IPA system exists, it doesn’t
always offer clear explanations for specific diacritic usage, nor does it provide consistent
reasoning for transcribing loaned words, accounting for morphological variations, or giving accurate
IPA transcriptions. Besides, there remain unresolved debates among linguists regarding the inventory
of vowels, semi-vowels, diphthongs, and consonants in Bangla. Scholars like Hai (1964) have observed
that the existence of long vowels in the language does not make a difference in meaning, and have
discussed specific tongue positions for the vowel /a/, which raises questions about how morphological
suffixes are articulated and how many pure vowels the language has.
Regional variations of the Bangla language further complicate matters, affecting not only
pronunciation among individual speakers but also how sounds are produced across regions and
dialects. Given all these difficulties, we propose an IPA framework that we’ve employed to create a
dataset of 70,000 words, alongside a modeling approach for accurate Bangla-to-IPA transcription. It’s
worth mentioning that our suggested phonetic representations may not be universally accepted, and
users are encouraged to substitute specific phonemes with alternatives that better align with their
linguistic preferences. With the readily available IPA chart, individuals can easily determine which
sounds best match the intended IPA representation.
2.1. Vowels
In our proposed IPA, we conducted a thorough review and made some revisions that were then
incorporated into our dataset. It’s important to note that the vowel sounds in Bangla are articulated
in a lax manner. After carefully listening to the IPA sounds provided by Ladefoged and Johnson
(2014), we devised a chart in which we recommend substituting /ɐ/ for /a/ when representing the
Bangla letter 'আ'. The /a/ is an open vowel produced towards the front of the mouth. On the other
hand, /ɐ/ is produced at the center of the mouth, and the mouth is slightly less open while
articulating it, which makes it more suitable than /a/ for the Bangla letter 'আ'. Similarly, we
propose representing the Bangla letter 'ই' as /ɪ/. The /ɪ/ is a near-high front vowel, whereas /i/ is
a high front vowel. While producing the /ɪ/ sound, the tongue remains slightly lower and further back
in the mouth than for /i/. The reason we propose /ɪ/ for the Bangla letter 'ই' is that /ɪ/ is a lax
vowel, and when we produce the 'ই' sound there is less muscular tension in the tongue. This
adjustment better aligns with the articulation of native Bangla speakers, for whom the /ɐ/ and /ɪ/
sounds are more appropriate. Regarding the অ্যা sound, both /æ/ and /ɛ/ are true equivalents. However,
for consistency in our dataset, we have chosen to use /ɛ/ exclusively.
            Front    Central    Back
High        ɪ                   u
High-mid    e                   o
Low-mid     æ/ɛ                 ɔ
Low                  ɐ
Table 1: Bangla Proposed Vowel Chart
2.2. Semi-vowel
Semi-vowels, often referred to as glides or semi-consonants, are phonetically similar to vowels but
function as the syllable’s boundary rather than as the nucleus, which is the central component of
the syllable. In the International Phonetic Alphabet (IPA), the arch diacritic ( ̯ ), an inverted
breve, is placed beneath semi-vowels to denote their dual nature, exhibiting features of both vowels
and consonants. We have proposed four semi-vowels that have been incorporated into the dataset. They
are given below in the (Bangla, /IPA/) template:
(ই, /ɪ̯/), (উ, /u̯/), (ও, /o̯/) and (এ, /e̯/)
2.3. Diphthongs
Hai (1964) provided a list of 31 Bangla diphthongs, among which 19 diphthongs (ɐɪ̯, ɐe̯, ɐu̯, ɐo̯,
ɛe̯, ɛo̯, ɔe̯, ɔo̯, eɪ̯, eu̯, oɪ̯, oe̯, ou̯, oo̯, ɪɪ̯, ɪu̯, uɪ̯, uu̯, eo̯) are commonly found in the
Bangla language. He further explores the Bangla diphthongs and claims that an extra 12 diphthongs
(ɪe̯, ɪa, ɪo̯, ea, eo̯, æa, oa, oe, ue, ua, uo) occur irregularly.
To maintain clarity, it’s wise to include all 31 diphthongs, especially considering the presence of
regional dialects that might feature words absent in standard Bangla. Moreover, accurately discerning
diphthongs requires an audio reference rather than relying solely on written text. It’s essential to
acknowledge irregular diphthongs, particularly those involving the /a/ sound, which lacks a
semi-vowel counterpart in Bangla. Therefore, determining whether a diphthong is rising or falling, as
well as whether a sequence is a vowel cluster or actually a diphthong, hinges on careful consideration.
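The diphthong inventory described above lends itself to a small consistency check. The Python sketch below is illustrative only: it encodes the 19 regular diphthongs attributed to Hai (1964) as nucleus plus glide, and confirms that each glide is one of the four proposed semi-vowels carrying the non-syllabic mark. The names `REGULAR_DIPHTHONGS`, `render` and `has_marked_glide` are hypothetical and are not part of any released tooling.

```python
import unicodedata

NON_SYLLABIC = "\u032f"  # combining inverted breve below (IPA non-syllabic mark)

# Bases of the four proposed semi-vowels, written without the diacritic.
SEMI_VOWEL_BASES = {"ɪ", "u", "o", "e"}

# The 19 regular diphthongs as two-character strings: nucleus, then glide base.
REGULAR_DIPHTHONGS = [
    "ɐɪ", "ɐe", "ɐu", "ɐo", "ɛe", "ɛo", "ɔe", "ɔo", "eɪ", "eu",
    "oɪ", "oe", "ou", "oo", "ɪɪ", "ɪu", "uɪ", "uu", "eo",
]


def render(pair: str) -> str:
    """Return the IPA string: nucleus, glide base, non-syllabic mark."""
    nucleus, glide = pair
    return nucleus + glide + NON_SYLLABIC


def has_marked_glide(ipa: str) -> bool:
    """True if `ipa` is one vowel followed by a proposed semi-vowel with the mark."""
    chars = unicodedata.normalize("NFD", ipa)
    return (
        len(chars) == 3
        and chars[2] == NON_SYLLABIC
        and chars[1] in SEMI_VOWEL_BASES
    )


if __name__ == "__main__":
    for pair in REGULAR_DIPHTHONGS:
        print(render(pair), has_marked_glide(render(pair)))
    # An irregular sequence with /a/ has no semi-vowel counterpart.
    print("ɪa", has_marked_glide("ɪa"))
```

Keeping the diacritic out of the literals and adding it in `render` avoids depending on how an editor stores combining characters; this is a design choice of the sketch, not a property of the dataset.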
Stops (by place and aspiration)
                   Bilabial           Dental             Alveolar           Palatal             Velar
                   Unasp.   Asp.      Unasp.   Asp.      Unasp.   Asp.      Unasp.     Asp.     Unasp.   Asp.
Voiceless stop     প/p/    ফ/pʰ/     ত/t̪/    থ/t̪ʰ/     ট/t/    ঠ/tʰ/     চ/c/      ছ/cʰ/    ক/k/    খ/kʰ/
Voiced stop        ব/b/    ভ/bʱ/     দ/d̪/    ধ/d̪ʱ/     ড/d/    ঢ/dʱ/     জ, য/ɟ/   ঝ/ɟʱ/    গ/g/    ঘ/gʱ/
Other manners (place columns of the chart: Bilabial, Dental, Alveolar, Post-Alveolar, Palatal, Velar, Glottal)
Nasal              ম/m/, ন, ণ/n/, ঙ, ◌ং/ŋ/
Tap                র/ɾ/
Flap               ড়/ɽ/, ঢ়/ɽʰ/
Fricatives         শ, স/s/, শ, ষ, স/ʃ/, *হ/h/
Lateral            ল/l/
Approximant        *য়/j/
Table 2: Proposed Consonant Chart. Here, Unasp. is used for unaspirated, and Asp. is used for aspirated
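The stop rows of the chart can be held in a small lookup table, which also makes the aspiration convention checkable: voiceless aspirated stops carry /ʰ/ and voiced aspirated stops carry /ʱ/. The following Python sketch assumes a hypothetical `STOPS` dictionary and helper names of our own choosing; it is not the format of the released dataset.

```python
VOICELESS_ASP = "\u02b0"  # ʰ
VOICED_ASP = "\u02b1"     # ʱ
DENTAL = "\u032a"         # combining bridge below, as in t̪ and d̪

# letter -> (IPA, is_voiced, is_aspirated), copied from the stop rows of the chart.
STOPS = {
    "প": ("p", False, False), "ফ": ("p" + VOICELESS_ASP, False, True),
    "ত": ("t" + DENTAL, False, False), "থ": ("t" + DENTAL + VOICELESS_ASP, False, True),
    "ট": ("t", False, False), "ঠ": ("t" + VOICELESS_ASP, False, True),
    "চ": ("c", False, False), "ছ": ("c" + VOICELESS_ASP, False, True),
    "ক": ("k", False, False), "খ": ("k" + VOICELESS_ASP, False, True),
    "ব": ("b", True, False), "ভ": ("b" + VOICED_ASP, True, True),
    "দ": ("d" + DENTAL, True, False), "ধ": ("d" + DENTAL + VOICED_ASP, True, True),
    "ড": ("d", True, False), "ঢ": ("d" + VOICED_ASP, True, True),
    "জ": ("ɟ", True, False), "য": ("ɟ", True, False), "ঝ": ("ɟ" + VOICED_ASP, True, True),
    "গ": ("g", True, False), "ঘ": ("g" + VOICED_ASP, True, True),
}


def aspiration_mark_ok(ipa: str, voiced: bool, aspirated: bool) -> bool:
    """Check that the aspiration mark matches the voicing of the stop."""
    has_voiceless = VOICELESS_ASP in ipa
    has_voiced = VOICED_ASP in ipa
    if not aspirated:
        return not (has_voiceless or has_voiced)
    if voiced:
        return has_voiced and not has_voiceless
    return has_voiceless and not has_voiced


def inconsistent_entries(table: dict) -> list:
    """Return the letters whose IPA breaks the aspiration convention."""
    return [k for k, (ipa, v, a) in table.items() if not aspiration_mark_ok(ipa, v, a)]


if __name__ == "__main__":
    print(inconsistent_entries(STOPS))
    # A voiced stop written with the voiceless mark is flagged.
    print(aspiration_mark_ok("b" + VOICELESS_ASP, True, True))
```

A check of this kind would flag a transcription such as /bʰ/ for ভ, which is the inconsistency the paper raises about the government chart.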
2.4. Consonants
In certain contexts, the 'হ' /h/ receives extra careful articulation. For example, the word ‘হ্রাস’
in normal conversation would be pronounced as /ɾɐʃ/, but a news presenter or a person reciting a poem
would articulate it with an aspiration sound in the initial position of the word, as /ʰɾɐʃ/,
following a more accepted canonical standard.
In the Bangla language, the য় /j/ is not articulated as a phoneme but is commonly used in
co-articulation. For example, in the three words দেউলিয়া /d̪eulɪʲɐ/, নিয়তি /nɪʲɔt̪ɪ/, and নিয়ম
/nɪʲom/, the Bangla letter ‘য়’ is pronounced as palatalized /ʲ/. In দাবায় /d̪ɐbɐ͡e̯/ and জয় /ɟɔ͡e̯/,
‘য়’ is pronounced as part of a diphthong.
There are a few disputes among linguists regarding Bangla consonants. We discuss these issues below
and provide the solution that we have followed in this consonant chart and in the curated dataset.
2.4.1. Plosive vs. Affricate Argument
             চ      ছ      জ      ঝ
Plosive      c      cʰ     ɟ      ɟʱ
Affricate    tʃ     tʃʰ    dʒ     dʒʱ
Table 3: Plosive vs Affricate in Bangla
There has been a longstanding dispute among linguists about whether certain Bangla sounds,
particularly those represented by চ c, ছ cʰ, জ ɟ, and ঝ ɟʱ, should be classified as affricates or
plosives (table 3). Hai (1964) took part in this discussion and sided with the view that these sounds
are best described as palatal plosives. In this proposal, we agree with this perspective: when we
consider how we articulate these sounds, they seem to align more closely with plosives than with
affricates.
2.4.2. ট - Alveolar or Retroflex
             ট      ঠ
Alveolar     t      tʰ
Retroflex    ʈ      ʈʰ
Table 4: Alveolar or Retroflex in Bengali
The ট sound in Bangla is produced with the alveolar ridge acting as the fixed point in the mouth
(table 4). The active part, which usually includes the tip of the tongue, interacts with this ridge
during articulation (Hai, 1964). Hai (1964) acknowledges that while articulating words, the tip of
the tongue curls up and back. This is why he categorizes it as an alveolar-retroflex-plosive sound
(Hai, 1964).
2.4.3. ফ - /pʰ/ and /f/
The pronunciation of the sound represented by ফ in Bangla can vary regionally. While it is generally
considered a plosive sound, in some regions it may be perceived as a labio-dental fricative /f/
(Hai, 1964).
Sometimes native speakers articulate words such as ফরি /foɾɪ/, ফাইজলামি /fɐɪɟlɐmɪ/, and ফরালেহা
/fɔɾɐlehɐ/ with the dialectal accent of a certain region. While producing the /pʰ/ sound, they tend
to bring the bottom lip close to the upper teeth, creating a narrow passage for the air to flow
through. This suggests that ফ can indeed resemble a labio-dental fricative sound /f/. However, it’s
important to note that this can still be a subject of debate, with variations observed from region
to region and from person to person. As for
written transcription, without the aid of audio from a regional speaker, accurately determining
whether ফ is pronounced as a plosive /pʰ/ or a labio-dental fricative /f/ can be challenging. But if
we have audio data from regional speakers, we can transcribe words that are pronounced with dialectal
accents with the /f/ sound (such as fɔɾɐlehɐ) and other words that are also found in standard Bangla
with /pʰ/ (such as pʰul, pʰɔʃol).
Another concern with the /pʰ/ sound is that borrowed foreign words introduce further variations in
pronunciation. A native speaker of standard Bangla uses the loaned word with a received
pronunciation. Hence, for loaned words, the labio-dental /f/ sound has been used for the
transcription of the Bangla letter ফ.
2.4.4. Trill r vs. Tap ɾ
The government website employs the trill ‘r’ sound, but in the Bangla language, for words such as
রাজা, রাজ্য, and রাগ, we don’t naturally produce the trill sound. To ensure better pronunciation, the
tap sound (ɾ) would be more suitable for Bangla.
2.4.5. Contextual Substitution of phoneme
The Bangla /ɟ/ is a voiced palatal stop, and in standard Bangla there is no voiced alveolar
fricative /z/. Furthermore, in the Bangla language, the closest phonemes to the labio-dental
fricatives /f/ and /v/ are the aspirated labial stops /pʰ/ and /bʱ/. However, many words in standard
Bangla are adapted from foreign languages such as English, Arabic, Farsi, and so on. When native
speakers articulate these loaned words, they do not pronounce them in the same way a native English
or native Arabic speaker does, but pronounce them with a native influence. Hence, for loaned words
where the speaker articulates these foreign phonemes in a certain word context, we consider these
phonemes (/z/, /f/, /v/) in the IPA transcription.
2.4.6. Voiced Aspiration
Aspiration is a distinctive feature of Bangla phonemes. It can be noted from the chart above that
ভ /bʱ/, ধ /d̪ʱ/, ঢ /dʱ/, ঝ /ɟʱ/, and ঘ /gʱ/ are voiced aspirated stops. Aspiration is about how much
air leaves the mouth while articulating the phoneme. If an unvoiced consonant is aspirated, an extra
puff of air leaves the mouth after the primary articulation is complete. For example, voiceless
aspiration occurs in /pʰ/, /t̪ʰ/, /cʰ/, /tʰ/, and /kʰ/; hence, for the secondary articulation of the
aspiration, we use /ʰ/, which is voiceless. On the other hand, /bʱ/, /d̪ʱ/, /dʱ/, /ɟʱ/ and /gʱ/ are
voiced stops, and for that reason it is suitable to use a voiced aspiration /ʱ/ for the secondary
articulation. In the govt-IPA, the aspiration suggestions for voiced stops include both voiceless
/ʰ/ aspiration and voiced /ʱ/ aspiration as their secondary articulation. For instance, they kept
both /bʰ/ and /bʱ/ for the transcription of the letter ‘ভ’, even though aspiration after a voiced
consonant should itself be voiced.
2.5. Diacritics
Our proposed diacritics for standard Bangla are /ʷ/ (Labialized), /ʲ/ (Palatalized) and /◌̃/
(Nasalized).
Labialized: The use of the labialized diacritic is found in Bangla words such as উপরওয়ালা
/upoɾoʷɐlɐ/, দেওয়া /d̪eoʷɐ/, নেওয়া /neoʷɐ/, etc., where the marked sounds are pronounced with rounded
lips. In certain cases, diphthongs are pronounced with simultaneous lip rounding, such as রওশন
/rɔ͡o̯ʷ.ʃon/.
Palatalized: To determine the use of palatalized ʲ, we have followed two phonological rules, which
decide whether the Bangla consonant য় (j) is palatalized or functions as part of a diphthong:
Case of coda য়: When য় is in syllable-final position, without a following vowel, it remains
unpalatalized. For example, in compound words like মামলায় /mɐmlɐ͡e̯/, নিরাপত্তায় /nɪɾɐpot̪t̪ɐ͡e̯/, etc.
Case of middle য়: Conversely, if a word with য় ends with a vowel in the syllable’s final position
and does not have য় in the word’s final position, the য় is pronounced as a palatalized ʲ. For
instance, this can be observed in words like ছেলেমেয়ে /cʰelemeʲe/, খায়রুল /kʰɐʲeɾul/, and নিয়ম
/nɪʲom/.
Nasalized: It was mentioned earlier that in Bangla, all seven oral vowels have nasal counterparts,
which are written using the nasalized diacritic, /ɪ̃ ẽ õ ã ɔ̃ ũ/. This nasalization of vowels is
consistently indicated in Bangla text by a diacritic known as ‘chandrabindu’ (◌ঁ) placed above the
relevant segment, and its occurrence is a common feature in Standard Bangla text.
2.6. Loan Words Consideration: Vowel and Consonant
In the Bangla language, it is quite common to use loaned words from foreign languages with a
pronunciation that differs from their native pronunciation. In the case of vowels, no foreign
phonemes are produced by native speakers. For example, the English words ‘foam’, ‘cloud’, and
‘flower’ are pronounced as /foʊm/, /klaʊd/, and /flaʊə/ by native English speakers. However, /ʊ/ and
/ə/ are not articulated by native Bengali speakers. Instead, they pronounce these words using the
existing vowel phonemes of the Bangla language.
On the contrary, there are a few cases where foreign words are pronounced using consonant phonemes
which do not exist in Bangla.
Labio-dental fricative sounds such as /f/ and /v/ do not exist in the Bangla language, but they are
articulated by native speakers when they produce
loaned words with these phonemes. Some examples in Bengali are: Plosive ফ (/pʰ/): ফড়িং (/pʰoɾɪŋ/);
Plosive ভ (/bʱ/): ভয় (/bʱoe̯/); Fricative (/f/): ফেইল (/fe͡ɪ̯l/); Fricative (/v/): ভিউ (/vɪu/).
The same holds for the alveolar fricative phoneme /z/. Loaned words from Arabic and English such as
মেরাজ /meɾɐz/, ম্যাগাজিন /mɛgɐzɪn/, and মোনাজাত /monɐzɐt̪/ are continuously used in Standard Bangla.
For example, the plosive sound /ɟ/ for জ, য is present in Bengali, whereas the fricative /z/ is found
in loan words such as ম্যাগাজিন (/mɛgɐzɪn/).
English words such as judge /dʒʌdʒ/ and justice /dʒʌstɪs/ have the voiced postalveolar affricate
/dʒ/, which is not used by native Bangla speakers. They turn this affricate sound into the plosive
sound /ɟ/ and articulate these words as /ɟudɟ/ and /ɟustɪs/.
The English language has a voiceless dental fricative sound /θ/, which is not found in the Bangla
language. Bangla speakers turn this phoneme into a voiceless aspirated dental plosive sound /t̪ʰ/.
So ‘think’ is pronounced as /t̪ʰiŋk/ in its Bangla adaptive form.
The /s/ is a voiceless alveolar fricative that is found in both Bangla and foreign languages such as
English.
2.7. Validation and Linguistic Challenges of Standard Bengali IPA
2.7.1. Morphological Variations in Words
The Bangla language exhibits an extensive array of morphological variations, presenting a challenge
in accurately contextualizing the meaning of words in light of their morphological alterations. It
is equally challenging to represent these subtle morphological variations accurately within the
framework of the International Phonetic Alphabet (IPA).
Consider the Bangla word আজকেই, transcribed as /ɐɟkeɪː/, or loaned words with Bangla morphological
extensions like মেক্সিকোতেও /meksɪkotoː/ and মেক্সিকোও /meksɪkooː/. While these all end with a vowel,
without a syllabic marker it may not be immediately clear that these suffixes are part of the base
word. However, by adding the lengthening diacritic after the word (the long vowel diacritic /ː/),
this distinction becomes more apparent to the reader.
The reason for utilizing this diacritic is rooted in certain linguistic contexts. In some cases,
when producing specific vowels, some individuals perceive a long iː as merely an extended version of
the short vowel, without any discernible difference in quality, i.e., without raising the tongue for
the long sound. For instance, Bangla eː is slightly higher than Bangla e, and Bangla e̯ (short) falls
midway between cardinal e and ɛ. This concept is supported in the work of Suniti Kumar Chatterji as
well. Furthermore, the long vowel diacritic also makes clear that no diphthong is present here
(মেক্সিকোও /meksɪkooː/).
Morphological suffixes may be confused with diphthongs. Consider the word গরুগুলোও /goɾugulooː/:
some might transcribe it as গরুগুলোও /goɾugulo͡o̯/ because there are two vowels together in the
word. But if we look carefully and break the word into syllables, /go.ɾu.gu.lo.oː/, the two vowels
belong to different syllables; even though they are beside each other, the last vowel o is pronounced
with a long sound. This is the reason we have annotated morphological variation in such cases with
long vowel marks. Some sample cases are শুটিংয়ে (ʃu.tɪŋʲ.eː), শুটিংও (ʃu.tɪŋ.oː) and গরুগুলোও
(goɾugulooː).
2.7.2. Diphthongs
Our dataset contains cases of Bangla diphthongs. To accurately transcribe them, it’s crucial to first
identify whether they are indeed diphthongs. Syllabification serves as a method to recognize
diphthongs, which makes the process easier. However, due to time constraints, we decided not to
syllabify each word just to identify diphthongs. Another significant aspect in distinguishing
diphthongs is the use of the glide. The upper diphthong glide ( ͡ ) describes the movement of the
articulatory vocal organs, particularly the tongue, from a higher position to a lower one during
diphthong production. This downward movement contributes to the distinct sound of the diphthong.
Each language possesses its own set of unique diphthongs. We’ve provided a diphthong chart, from
which standard Bangla focuses primarily on the regular diphthongs. Understanding the role of the
glide and using it accurately ensures the correct pronunciation of words in a given language. Some
examples are
পরিচর্যায় (poɾɪcɔɾɟɐ͡e̯), ভাই (bʱɐ͡ɪ̯), যাচাই (ɟɐ.cɐ͡ɪ̯), চাই (cɐ͡ɪ̯), দুই (du͡ɪ̯), বোঝাই (bo.ɟʱa͡ɪ̯)
Sometimes, cases are found in standard Bangla where it may be unclear to the reader whether a word
has a diphthong or a vowel cluster. For example, শিরোইলে is transcribed as /ʃɪɾoɪle/; here ɾoɪ
constitutes one single syllable, but the question remains whether it contains a vowel cluster or a
diphthong. Bangla native speakers articulate this word with a downward movement of tongue position
from o to ɪ. As a result, the o stays as a pure vowel and glides toward ɪ̯, which creates a
diphthong. Hence, the final transcribed text is /ʃɪɾo͡ɪ̯le/. If the pronunciation of the word were
something like /ʃɪ.ɾo.ɪ.le/, where each vowel is pronounced as a pure vowel in a separate syllable,
the final result might have been different.
2.7.3. Loan words
Native speakers of the Bangla language commonly integrate vocabulary from English, Arabic, Farsi,
and Portuguese into their speech. As a result, distinctive phonemes of these languages, which may
not be common in standard Bangla, are spoken
by native speakers. Due to their frequent usage, these phonemes may not be distinctly differentiated
from the standard Bangla phonetic inventory. This challenges IPA models in accurately recognizing
and transcribing these foreign phonetic elements.
In our dataset, we have a significant number of English and Arabic words. To transcribe these words,
we consider how native Bangla speakers, adhering to the standard Bangla form, would pronounce them.
Since standard Bangla users often employ a more received pronunciation when uttering these words, we
have annotated them accordingly. Hence, we have used the /z/, /f/, /v/, and /s/ phonemes for the
letters জ/য, ফ, ভ, and শ/স respectively. These sounds are not commonly present in the native Bangla
language, but we have employed them to transcribe the borrowed foreign words. Some examples are
ফেইক (feɪk), শিডিউল (ʃɪ.dɪ.ul), মোস্তাফিজ (most̪ɐfɪz), যারহাদ (zɐɾhɐd̪), ফজর (fɔzoɾ), রদ্রিগেজ (ɾɔd̪ɾɪgez)
2.7.4. English Diphthong and Triphthong in Bangla Adaptive Form
In English words with diphthongs, the presence of schwa /ə/ can influence the pronunciation. It
appears in unstressed syllables, as the neutral, unstressed vowel sound. This leads to subtle
variations in how diphthongs are articulated. For example, in the word ‘power’, the diphthong /aʊ/
is followed by the schwa sound in the unstressed syllable. Or, for the word ‘water’, the first
syllable may be reduced to a schwa sound, especially if it’s unstressed; it might sound like
“wuh-ter.” However, when these words are adapted by Bangla speakers, they are pronounced like
/pa.o̯ʷ͡aɾ/ and /o͡ɐ̯.teɾ/.
Bangla speakers adopt English diphthongs that do not contain schwa, and the pronunciation tends to
align with the native English pronunciation. For example, ‘high’ is transcribed in Bangla as /hɐ͡ɪ̯/,
‘boil’ as /bɔ͡ɪ̯l/, and ‘time’ as /tɐ͡ɪ̯m/.
The English language contains triphthongs, which are rare in the Bangla language. In the case of
English triphthongs, native Bangla speakers tend to avoid pronouncing the word as a triphthong and
instead convert it into a diphthong. For example, in English, the word ‘fire’ is pronounced as
/fʌɪə/, which in Bangla is transcribed as /fɐʲe͡ .̯ ɐɾ/. Cases like these are found in these [...]
nounced it as /faɪər/, where the diphthong /aɪ/ glides into schwa /ə/ in the second syllable.
However, the Bangla language does not have a schwa /ə/ sound. As a result, for this English word,
native Bangla speakers use existing sounds to produce the loaned word as /fɐʲe.ɐɾ/, which does not
have a diphthong in the adaptive form.
The pronunciation of words by Bangla speakers can vary based on regional accents and specific
contexts. Even a standard native speaker may pronounce certain words differently depending on the
situation, which could lead to variations in IPA transcription. Unless the transcription is based on
audio data, ensuring accurate contextual transcription can be a challenge.
2.7.5. Transcribing Numbers
In the dataset, numbers are represented in various forms: combinations of letters and numbers
("19টা" 19tɐ, "১ম" 1m), purely numeric strings such as "১৯৮৯" and "১০০০", and numbers in the context
of phone numbers and house numbers. To transcribe these, we followed an IPA transcription based on
how we naturally pronounce them. For instance, "২০৬" is transcribed as "d̪uɪ̯ʃo cʰoe̯". When numbers
are pronounced digit by digit, they are transcribed accordingly, for example, "২০৫০" as
"d̪uɪ̯ ʃunno pãc ʃunno".
2.7.6. Handling the cases of Abbreviations and Acronyms
To ensure dataset accuracy and disambiguate between abbreviations and acronyms, we established a
specific protocol. When transcribing an abbreviation like "ম." /M/, we consider the context to
identify its full form, which in this case was "মহাম্মদ" /Mɔhɐmmɔd̪/. We then proceeded to transcribe
the entire word. In the case of acronyms like "মূসক" /muʃɔk/, we applied IPA notation directly for
accurate representation. Handling these types of transcriptions poses certain challenges. Sometimes
মহাম্মদ /Mɔhɐmmɔd̪/ might be spelled and pronounced as মহাম্মাদ /Mɔhɐmmɐd̪/, or only স. is given in a
sentence and the transcriber has to infer the word when the sentence gives no proper indication. So,
with a large number of acronyms and abbreviations in a language, the IPA transcription of these may
be incorrect. Some examples are
এসএসসি (esessɪ), পিডিডি (pɪdɪdɪ), মূসক (muʃɔk)
words as well - ‘hour’ /aər/, which is pronounced Some abbreviation examples are given below in
as /ɐ.oʷ͡ɐɾ̯ /, ‘prayer’ /preɪər/, pronounced as /pɾe.ɐɾ/, (Abbreviation, Bangla Word (IPA)) template,
‘pure’ /pjʊr/ pronounced as /pɪo̯͡ ɾ/. (ম., মহাম্মদ (mɔhɐmmɔd̪)), (�মা., �মাহাম্মদ (mohɐmmɔd̪)),
Hence the only concern while transcribing these (ডা., ডাক্তার (dɐktɐɾ))
words is how a native speaker pronounces them.
Some examples are ফায়ার (fɐʲe.ɐɾ), ফাইনাল (fɐ͡ɪ ̯.nal), 2.7.7. Orthographic Challenges
শুটআউেট (ʃut.ɐu͡ t̯ eː) Bangla orthography may not always align perfectly
In the first example, /fɐʲe.ɐɾ/ is transcribed for the with phonetic transcription, requiring careful inter-
English word ’fire’. The native English speaker pro- pretation. Our dataset has been curated from writ-
ten texts, based on the specific annotator’s pronun- variant of T5 that was pre-trained on a new Com-
ciation intuition, as pronunciation sometimes varies mon Crawl-based dataset covering 101 languages.
from individual to individual. In spite of this, the The model was trained for 10 epochs and a 3e-4
pronunciation of a word might match word to word learning rate. Our model obtained a WER of 0.1
in the IPA transcription. Such as হ্রাসমান /ɾɐʃmɐn/, on the test dataset.
the হ letter here is not pronounced the way it is While evaluating the network, we have chosen
pronounced in the word হলু দ /holud̪/. Also in the Word Error Rate(WER) as a metric, to capture the
spelling of the word হলু দ/holud̪/, there is not any ‘ও’ sentence-level overall performance of the IPA tran-
visible but while articulating the word an /o/ sound scription network. The obtained high score can
has been produced and that’s how the word has be attributed to having a smaller number of homo-
been transcribed. graphs and OOV cases where the words from the
inferences dataset are familiar to the network.
2.7.8. Placement of Diacritics
IPA transcription involves a meticulous and time- 5. Conclusion and Future Work
consuming manual process. Accurate placement
of diacritics and special characters is critical for In this work, we presented a comprehensive study
correctly representing sounds. For instance, if we of the IPA standard of Bangla and discussed all
were to transcribe the Bengali word �দােয়ল as /doel/ the existing points of debate in the literature. We
or /d̪o͡el̯ /, rather than /d̪oʲel/, it would lead to an in- propose a consistent IPA transcription framework
accurate pronunciation. for Bangla texts and discuss the nuances in detail.
We also present a novel 150k sentence dataset for
sequence-to-sequence NLP modeling. This work
3. DUAL-IPA Dataset has the potential to contribute to the field of linguis-
3.1. Dataset Construction tic theory, NLP dataset creation(the first large-scale
sentence-level dataset for Bangla), and also facili-
Following the proposed IPA framework, we con-
tating LLM downstream tasks.
structed the DUAL-IPA dataset, containing 150k
Bangla sentences along with their linguist-validated
IPA transcription. We collected the sentences from References
two sources: Bangla online newspapers(33%) and Zeenat Imtiaz Ali. 2001. Dhanibijnaner bhumika
literature/books(66%). The sentences have been (introduction to linguistics).
equally distributed among 4 linguists with a gradu-
ate degree in linguistics, along with the above IPA International Phonetic Association. 1999. Hand-
transcription protocol. An independent evaluator book of the International Phonetic Association:
has meticulously evaluated all the data to ensure A guide to the use of the International Phonetic
consistency and correctness of annotation. It took Alphabet. Cambridge University Press.
a month for the curation of the dataset. The anno-
Suniti Kumar Chatterji. 1921. Bengali phonetics.
tation process was expedited using i) Preannota-
Bulletin of the School of Oriental and African
tion: A rule (and later, a weak model)-based noisy
Studies, 2(1):1–25.
pre-annotation. ii) Validation: Word(whitespace
separated tokens)-level transcription correction iii) Charles A Ferguson and Munier Chowdhury.
Mapping the word level transcription with the sen- 1960. The phonemes of bengali. Language,
tences. iv) Sentence level validation to fix the 36(1):22–59.
transcription for fixing the homograph cases, nu-
merals, and alignment errors. Abdul Hai. 1964. Dhwonibijnan O Bangla Dhwoni-
tottwo, 3rd edition. Bornomichil.
3.2. Dataset statistics (EDA)
Daniul Huq. 2002. Bhasha bigganer katha (facts
The dataset contains 150k sentences, with an aver-
about linguistics). Dhaka Mowla Brothers.
age of The train split contains 100k sentences and
the test split contains 50k sentences. There are Daniel Jones. 1922. An outline of English phonetics.
about 130k unique words in the training data and BG Teubner.
35k out of vocabulary words in the test dataset.
Peter Ladefoged and Keith Johnson. 2014. A
4. Benchmarking course in phonetics. Cengage learning.
We trained a simple LLM-based seq2seq model for Abul Kalam Manzur Morshed. 1997. Adhunik
benchmarking IPA transcription for Bengali using Bhashatatwa, 2nd edition. Noya Udyog.
the proposed Dual-IPA dataset. Here we used the
’small’ variant of the MT5 model from Google (Xue Sukumar Sen. 1993. Bhasar Itibritta. Ananda Pub-
et al., 2020) for benchmarking. It is a multilingual lishers Private Limited.
Linting Xue, Noah Constant, Adam Roberts, Mi-
hir Kale, Rami Al-Rfou, Aditya Siddhant, Aditya
Barua, and Colin Raffel. 2020. mt5: A massively
multilingual pre-trained text-to-text transformer.
arXiv preprint arXiv:2010.11934.
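As an illustration of the Word Error Rate metric used in the Benchmarking section, the following is a minimal sketch of one common way to compute it: the word-level edit distance between reference and predicted IPA transcriptions, summed over all sentences and divided by the total number of reference words. The paper does not state whether WER was aggregated over the corpus or averaged per sentence, so the corpus-level aggregation here is an assumption, and the function names `edit_distance` and `word_error_rate` are hypothetical helpers introduced only for this example.

```python
from typing import Sequence


def edit_distance(ref: Sequence[str], hyp: Sequence[str]) -> int:
    """Levenshtein distance between two token sequences (unit costs)."""
    # prev[j] holds the distance between the processed prefix of ref and hyp[:j].
    prev = list(range(len(hyp) + 1))
    for i, ref_tok in enumerate(ref, start=1):
        curr = [i]
        for j, hyp_tok in enumerate(hyp, start=1):
            substitution = prev[j - 1] + (0 if ref_tok == hyp_tok else 1)
            deletion = prev[j] + 1        # reference token missing from hypothesis
            insertion = curr[j - 1] + 1   # extra token in hypothesis
            curr.append(min(substitution, deletion, insertion))
        prev = curr
    return prev[-1]


def word_error_rate(references: Sequence[str], hypotheses: Sequence[str]) -> float:
    """Corpus-level WER over whitespace-separated tokens (assumed aggregation)."""
    if len(references) != len(hypotheses):
        raise ValueError("references and hypotheses must have the same length")
    total_errors = 0
    total_words = 0
    for ref, hyp in zip(references, hypotheses):
        ref_tokens = ref.split()
        total_errors += edit_distance(ref_tokens, hyp.split())
        total_words += len(ref_tokens)
    if total_words == 0:
        raise ValueError("references contain no words")
    return total_errors / total_words


if __name__ == "__main__":
    # Reference transcription of "২০৫০" from Section 2.7.5; the hypothesis
    # drops the nasalisation on one word, giving 1 error out of 4 words.
    refs = ["d̪uɪ̯ ʃunno pãc ʃunno"]
    hyps = ["d̪uɪ̯ ʃunno pac ʃunno"]
    print(word_error_rate(refs, hyps))  # 0.25
```

Because tokens are compared as whole strings, a single misplaced diacritic counts as a full word error, which is consistent with the emphasis on exact diacritic placement in Section 2.7.8.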