TECHNOLOGY
ILLUSTRATION BY NEGREIROS
SOFTWARE
The computer’s voice
Linguists and engineers from Unicamp formulate a speech system with a Brazilian accent
TÂNIA MARQUES
Published in May 2003
If a good number of machines now “speak” well enough to carry out simple tasks, and many people have for some years been “talking” to automated telephone answering systems and automatic teller machines, the synthetic voice resources in commercial use still have difficulty reproducing human speech naturally. And their vocabulary is very limited. But there are indications that computers will soon lose their digital accent and expand their linguistic universe. Big companies are starting to get results that are more natural and agreeable to the ear. This quest for perfection in computer-generated sound began early at the State University of Campinas (Unicamp).

A joint project, started in 1991, between the linguistics and electrical engineering areas produced software that today is capable of reading aloud any text written in Portuguese, without the characteristic English accent of the systems produced outside Brazil. The Brazilian program bears the name Aiuruetê, which means “true parrot” in Tupi (the most common Brazilian indigenous language).

Right from the beginning, the development of the system has been subordinated to scientific ends, but the project has also produced some technological results. “We wanted to create a speech synthesis system in Brazilian Portuguese, starting with basic research and focused on it”, recalls Professor Eleonora Cavalcante Albano, from the Phonetics and Psycholinguistics Laboratory of the Language Studies Institute (Lafape/IEL), who is coordinating the work. Keeping to the original target and taking a broad view of the phonetic-acoustic description of language, the venture included studies of problems of articulatory development and disorders, phonological theory, phonostylistics and the analysis and synthesis of speech.

Swift evolution - In 1992, Professor Fábio Violaro, the coordinator of the Digital Speech Processing Laboratory of the Faculty of Electrical Engineering (LPDF/Feec), and his group of researchers embraced Lafape’s project. “We were already working with speech synthesis, but the results of our efforts were limited, precisely for lack of linguistic knowledge”, says Violaro. At the time, personal computers were evolving apace, and their processing and memory resources were already making it possible to develop voice synthesis programs. Today, Aiuruetê runs on any computer with a Windows operating system.

Speech synthesis programs, which can make a big contribution to distance learning and to the education of the visually impaired, besides a series of commercial applications, are usually based on the conversion of text to speech. Like similar foreign software, Aiuruetê works with textual information, which, in the preprocessing stage, is analyzed to identify its grammatical characteristics (acronyms, abbreviations and graphic symbols) and rewritten in full, the way it is read. Afterwards, it undergoes a phonetic transcription. Then the software looks in its database for utterances compatible with the transcribed material and takes care of stringing together the phonetic elements that make up the words, also giving them information on the intonation and rhythm of Brazilian Portuguese. Does it seem easy? Well, it isn’t: so much so that since the beginning of the so-called digital age speech synthesis has been a challenge to researchers from all over the world, who have attained a level that is no more than reasonable.

Several factors contribute to the complexity of the process, in any language. In the first place, the writing systems of different languages have varying degrees of phoneticity: only up to a certain point does the spelling of words determine their pronunciation. English, for example, has an orthography that is far from phonetic. Words spelt differently, such as rite, write, right and wright, are pronounced exactly the same way and therefore have the same phonetic transcription: RIT. The orthography of Portuguese has medium phoneticity, but even so offers no fewer difficulties. To stay with just one example, suffice it to remember that the letter “x” may have the sound of “sh”, “s”, “ks” or “z”. “Portuguese is nice, but Spanish is much better”, Eleonora jokes.

Addressing the question, a layman might imagine that the construction of a database with all the words of the language is the solution. But an enterprise of this kind, besides being monumental, would be fated to failure: language is dynamic, and new words arise every day. Furthermore, the pronunciation of one and the same word varies with the context, which would imply the need to record the same word several times. There simply could not be a dictionary of such a size. Even widely used words may not be in any dictionary, nor may verbal inflections or diminutive and superlative forms. What the software chiefly needs is parameters to guide the machine’s pronunciation.

“We opted for limiting ourselves to some 2,500 excerpts from recordings”, says Eleonora. The number is not very high, but the excerpts were submitted to a strict selection. In making it, the researchers did not work with the traditional concept in linguistics that defines the phoneme as the smallest mental unit corresponding to sound. Since the start of the work, the team has maintained the theoretical position that the phoneme is an abstraction influenced by alphabetical writing. One of the points studied was how the various phonemes are influenced by those that precede them and by those that follow them. “Many factors are combined in the articulation of sounds, and a ‘p’ followed by an ‘a’ is pronounced differently from a ‘p’ followed by an ‘i’ or a ‘u’”, Eleonora observes.

Another problem in developing a speech system is the difference between the graphic representation of the text and the way it is expressed in speech. Abbreviations, for example, can be read differently, even when they have the same number of characters and are equally pronounceable. In this regard, it is worth comparing USA, which is spelled out letter by letter, with NASA, which is read as a word. Reading a telephone number is different from reading a numerical expression: nobody would read 32220000 as 32 million, two hundred and twenty thousand. In Portuguese, measurements of length are abbreviated in the same way in the singular and in the plural: 1 m and 100 m. All this calls for complex algorithms.

Emotion and subtleties - “Although it can already be used in a series of applications, Aiuruetê is still under development”, explains Violaro. Among the improvements is the assimilation of the subtleties of the rhythms of Brazilian speech. “In future, we want Aiuruetê to express even the tonal differentials of emotion”, says Eleonora. According to Violaro, the program is beginning to arouse the interest of some companies specialized in information technology. One of them was to use Aiuruetê in a self-service system aimed at medical clinics, with the booking of appointments and other functional features. Furthermore, the work will also result in the building of a public database of knowledge of the phonic aspects of the Portuguese spoken in Brazil. The software is therefore getting closer to one of the most appreciated properties of true parrots: being the most talkative of the Psittacidae family. •

THE PROJECT
Processing Text and Acoustic Signals in Brazilian Portuguese: A Linguistic–Engineering Interface for the Science and Technology of Speech

MODALITY
Thematic project

COORDINATOR
ELEONORA CAVALCANTE ALBANO – Language Studies Institute at Unicamp

INVESTMENT
R$ 9,528.00 and US$ 58,672.00
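The preprocessing problems the article mentions (USA versus NASA, a telephone number versus a quantity, "1 m" versus "100 m") can be made concrete with a small sketch. It is not Aiuruetê's code, which the article does not show. The word list, the unit table, the function names and the rule that any eight-digit run is a telephone number are all assumptions made for illustration.

```python
import re

# Hypothetical mini-lexicon: acronyms read as ordinary words rather than spelled out.
READ_AS_WORD = {"NASA"}

# Portuguese names of the digits 0 to 9.
DIGITS = ["zero", "um", "dois", "três", "quatro",
          "cinco", "seis", "sete", "oito", "nove"]

# Singular and plural spoken forms of a unit abbreviation (one illustrative entry).
UNITS = {"m": ("metro", "metros")}


def expand_acronym(token: str) -> str:
    """Keep the acronym if it is read as a word, otherwise spell it letter by letter."""
    if token in READ_AS_WORD:
        return token
    return " ".join(token)


def expand_phone(digits: str) -> str:
    """Read a telephone number digit by digit instead of as a quantity."""
    return " ".join(DIGITS[int(d)] for d in digits)


def expand_measure(number: str, unit: str) -> str:
    """Choose the singular or plural spoken form of the unit from the number."""
    singular, plural = UNITS[unit]
    return f"{number} {singular if int(number) == 1 else plural}"


def normalize(text: str) -> str:
    # Measurements: the written unit "m" is the same in singular and plural.
    text = re.sub(r"\b(\d+) (m)\b",
                  lambda m: expand_measure(m.group(1), m.group(2)), text)
    # Assumption of this sketch: any run of exactly eight digits is a telephone number.
    text = re.sub(r"\b\d{8}\b", lambda m: expand_phone(m.group(0)), text)
    # Runs of two or more capital letters are treated as acronyms.
    text = re.sub(r"\b[A-Z]{2,}\b", lambda m: expand_acronym(m.group(0)), text)
    return text
```

With these rules, `normalize("USA")` yields the letters separated for spelling, while `normalize("NASA")` is left whole. Even this toy version needs a hand-made exception list, which hints at why the article says that such cases call for complex algorithms.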
PESQUISA FAPESP ■ SPECIAL ISSUE DEC 2002 / FEB 2004