
Interactive Telecommunication Systems and Services

Ing. Stanislav Ondáš, PhD.

Sound, speech and their computer processing
“Life depends on the ability to communicate, to share information”
(Uhlíř J. et al.: Technológie hlasových komunikací)
Sound

• Sound is a mechanical oscillation that

– arises in an elastic medium by displacing particles of the medium from their equilibrium position;
– propagates through the medium in the form of sound waves (energy is passed on by oscillation between neighboring particles);
– is perceptible to humans, animals and other living creatures, or detectable by special instruments.

• It is therefore often referred to as sound (acoustic) waves.

Speech

• One of the most natural communication media

• Speech is the spoken form of language
– Language is related to thinking; it enables us to share ideas, opinions, emotions, desires, intentions, …
• Speech can substitute for other senses in the information transmission process.
• Speech carries verbal content as well as non-verbal signals (about emotions, mental state and so on).
• Speech makes it possible to identify the speaker (speaker verification).
• Computer models are often inspired by human speech production and hearing.
Models of speech production and perception
Digital speech processing steps
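A common front end for digital speech processing can be sketched as: pre-emphasis, splitting the signal into short overlapping frames, windowing each frame, and computing a per-frame feature. This is a minimal pure-Python sketch under assumed settings (16 kHz sampling, pre-emphasis coefficient 0.97, 25 ms Hamming-windowed frames with a 10 ms hop); short-time energy stands in for the richer spectral features a real system would compute, and all function names are illustrative.

```python
import math

def pre_emphasis(signal, alpha=0.97):
    """First-order high-pass filter y[n] = x[n] - alpha * x[n-1]."""
    if not signal:
        return []
    return [signal[0]] + [signal[n] - alpha * signal[n - 1] for n in range(1, len(signal))]

def frame_signal(signal, frame_len, hop):
    """Split the signal into overlapping frames; a trailing partial frame is dropped."""
    return [signal[start:start + frame_len]
            for start in range(0, len(signal) - frame_len + 1, hop)]

def hamming(frame_len):
    """Hamming window w[n] = 0.54 - 0.46 * cos(2 * pi * n / (N - 1))."""
    return [0.54 - 0.46 * math.cos(2 * math.pi * n / (frame_len - 1))
            for n in range(frame_len)]

def short_time_energy(signal, sample_rate=16000, frame_ms=25, hop_ms=10):
    """Pre-emphasis, framing and windowing, then the energy of each windowed frame."""
    frame_len = sample_rate * frame_ms // 1000   # 400 samples at 16 kHz
    hop = sample_rate * hop_ms // 1000           # 160 samples at 16 kHz
    window = hamming(frame_len)
    frames = frame_signal(pre_emphasis(signal), frame_len, hop)
    return [sum((x * w) ** 2 for x, w in zip(frame, window)) for frame in frames]

# 0.5 s of silence followed by 0.5 s of a 440 Hz tone, sampled at 16 kHz.
fs = 16000
signal = [0.0] * (fs // 2) + [math.sin(2 * math.pi * 440 * n / fs) for n in range(fs // 2)]
energies = short_time_energy(signal, sample_rate=fs)
print(len(energies), energies[0], energies[-1] > 0)   # 98 0.0 True
```

The energy contour is near zero in the silent part and large in the voiced part, which is why such per-frame features are a typical first step before recognition.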
Digital speech processing applications
Speech technologies
• Enable the use of speech in human-computer interaction
• Human-human communication uses the following human skills:
– Hearing
– Language understanding
– Participation in dialogue interaction
– Generating an appropriate reaction, i.e. an answer (this requires thinking …)
– Speech production
– Using the other senses in communication: sight, smell, taste, touch

• Scientists want to give machines such abilities as well, to make communication with machines more intuitive and natural for humans.
Human-machine speech communication chain
Systems classification
• Spoken Dialogue Systems (SDS)
– Automatic interactive systems that enable human-computer interaction through spoken dialogue.
– They can be seen as a Voice User Interface (VUI).
– Task-oriented systems.
• Multimodal dialogue systems (MDS)
– A multimodal interactive system is a system that enables the use of at least two different modalities in human-computer interaction, e.g. a combination of speech, gestures and touches on a touchscreen…
• Embodied Conversational Agents
– Animated virtual avatars with "human-like" behaviour serving as an interface to computer systems
• Robots and humanoid robots
Speech technologies for HMI
• Automatic Speech Recognition (ASR)
• Natural language processing (NLP)
• Dialogue/interaction management
• Text-to-speech synthesis (TTS)
• Information acquisition, processing and representation
• Modality interpretation, fusion and fission

Automatic speech recognition
Introduction
Automatic Speech Recognition

• Transforms a speech signal into a sequence of words

• Makes computers able to hear
• The main principle: acoustic matching of the incoming speech with patterns stored in memory.
• ASR technology is language-dependent: each language requires its own set of resources (databases, models).
• Applications: automated telephony services, voice search, smart home apps, robotics, speaker identification, security apps, healthcare, …
ASR classification
• By complexity
– Isolated word recognition
– Connected word recognition
– Dictated speech recognition
– Spontaneous natural speech recognition (Large Vocabulary Continuous Speech Recognition systems – LVCSR)
• By speaker dependency
– Speaker-dependent: understand only one person or a narrow group of people
– Speaker-independent: trained to understand anyone
• By vocabulary size
– ASR with a small vocabulary: 1 to tens of words
– ASR with a medium vocabulary: 100 to thousands of words
– ASR with a large vocabulary: 10,000 words and more
• By principle
– Based on the Dynamic Time Warping method (DTW)
– Based on Hidden Markov Models (HMM)
– Based on Deep Neural Networks (DNN)
– Hybrid systems
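The "acoustic matching of incoming speech with patterns in memory" and the DTW-based approach can be illustrated with a tiny isolated-word recognizer. This is a minimal sketch, assuming that both the stored word templates and the input utterance are already sequences of feature vectors (toy one-dimensional vectors here; a real system would use e.g. MFCC vectors per frame). The names `dtw_distance` and `recognize` are illustrative. DTW aligns sequences of different lengths by letting frames repeat, so a slowly spoken word can still match its template.

```python
import math

def dtw_distance(a, b):
    """Dynamic Time Warping distance between two sequences of feature vectors."""
    n, m = len(a), len(b)
    inf = float("inf")
    # D[i][j] = cost of the best alignment of a[:i] with b[:j]
    D = [[inf] * (m + 1) for _ in range(n + 1)]
    D[0][0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            local = math.dist(a[i - 1], b[j - 1])      # Euclidean frame distance
            D[i][j] = local + min(D[i - 1][j],          # repeat b[j-1]
                                  D[i][j - 1],          # repeat a[i-1]
                                  D[i - 1][j - 1])      # advance both
    return D[n][m]

def recognize(utterance, templates):
    """Isolated-word recognition: pick the template word with the smallest DTW distance."""
    return min(templates, key=lambda word: dtw_distance(utterance, templates[word]))

# Toy 1-D feature vectors standing in for real acoustic features.
templates = {
    "yes": [(1.0,), (3.0,), (3.0,), (1.0,)],
    "no": [(0.0,), (0.0,), (2.0,), (2.0,)],
}
utterance = [(1.0,), (1.0,), (3.0,), (3.0,), (3.0,), (1.0,)]   # a slower "yes"
print(recognize(utterance, templates))   # yes
```

The quadratic table makes this practical only for small vocabularies, which matches the place of DTW among the simpler ASR approaches.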
Principal scheme of an ASR system
Natural language understanding
Introduction
Natural Language Understanding
• Transforms an incoming word sequence into a meaning representation.
• Spoken sentences carry a surface representation of meaning, but meaning itself is more structured (hierarchical).
• The simpler approach: keyword spotting
– Key words related to the key information are identified in the sentences.
• The more complex approach: linguistic analysis
– Preprocessing: tokenization, lemmatization
– Morphological analysis
– Syntactic analysis
– Semantic analysis
– Pragmatic analysis
• NLU requires a database with information about entities and relations in the real world.
• Task-oriented systems allow simplifications in the form of "domain models".
• In the case of a robotic interface, the task is to find a mapping between spoken commands and the robot's functions.
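Keyword spotting combined with a small domain model is the simplest way to map spoken commands onto a robot's functions. The sketch below assumes the ASR output is already available as a text string; the keyword table and robot function names such as `turn_left` are hypothetical.

```python
import re

# Hypothetical domain model: keyword -> name of a robot function.
COMMAND_KEYWORDS = {
    "forward": "move_forward",
    "back": "move_backward",
    "left": "turn_left",
    "right": "turn_right",
    "stop": "stop",
}

def spot_command(utterance):
    """Return the robot function for the first known keyword, or None if none is found."""
    for token in re.findall(r"\w+", utterance.lower()):
        if token in COMMAND_KEYWORDS:
            return COMMAND_KEYWORDS[token]
    return None

print(spot_command("Could you please turn LEFT now?"))   # turn_left
print(spot_command("Hello robot"))                       # None
```

Because it ignores word order and negation, an utterance such as "don't stop" would still trigger `stop`; cases like this are what motivate the deeper linguistic analysis steps.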
Dialogue management
Spoken dialogue
• Dialogue: a natural medium for information exchange between two or more participants
• Enables cooperation on solving a task through spoken interaction
• In SDSs we assume a "task-oriented" dialogue. Compared with "conversation", such dialogues are simpler to manage.
• In an SDS, the dialogue management block, i.e. the dialogue manager, is responsible for the dialogue functions.
• Terms:
– Dialogue turn
– Participant
– System prompt/utterance
– User's utterance/answer
Dialogue manager (DM)

• The main task of the DM is to find an appropriate reaction to the user's input, one that reflects the current state of the system, the user's inputs and the interaction history.
Dialogue management
• Two important problems:
– Dialogue management methods: how to control the flow of the dialogue, turns and logic.

– Dialogue models. They represent and model:

• the information the DM uses to interpret the user's input and to manage the dialogue flow
• a model of the user
• a model of the system
• the interaction history
DM systems classification
• Finite state machines
– Model the dialogue as a network of states and transitions between them. Each state represents a particular step in the dialogue; transitions represent possible (conditional) moves to another step.
• Frame-based systems
– The main idea is that the dialogue is like a form-filling task.
• Agent systems
– The dialogue is managed by a set of specialized agents (domain agent, dialogue flow agent, history agent, …)

• DDL (Dialogue Description Language)-based systems

– The dialogue and the control algorithm are encoded in a scripting language. The most popular is VoiceXML.
• Systems based on statistical modeling
– Control mechanisms are trained on large corpora of annotated dialogues
– The result is a statistical model of the dialogue
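A finite-state dialogue manager can be sketched as a table of states, each with a prompt and a conditional transition to the next step. This is an illustrative sketch only: the states, the station list, the hour-only time format and the helper names are assumptions, and the user's answers are supplied as a scripted list instead of coming from ASR.

```python
# Hypothetical finite-state dialogue for a train-timetable domain.
# Each state has a prompt, the slot it fills and the state reached on valid input.
STATES = {
    "ask_origin": {"prompt": "From where do you want to travel?", "slot": "origin", "next": "ask_time"},
    "ask_time": {"prompt": "What time do you want to leave?", "slot": "time", "next": "done"},
}
KNOWN_STATIONS = {"Žilina", "Košice", "Prague"}   # illustrative vocabulary

def is_valid(slot, answer):
    """Transition condition: accept only values the domain knows."""
    if slot == "origin":
        return answer in KNOWN_STATIONS
    if slot == "time":
        return answer.isdigit() and 0 <= int(answer) <= 23
    return False

def run_dialogue(user_answers):
    """Drive the state machine with a scripted list of user answers."""
    state, slots, transcript = "ask_origin", {}, []
    answers = iter(user_answers)
    while state != "done":
        node = STATES[state]
        transcript.append("S: " + node["prompt"])
        answer = next(answers, None)
        if answer is None:          # no more user input: end the session
            break
        transcript.append("U: " + answer)
        if is_valid(node["slot"], answer):
            slots[node["slot"]] = answer
            state = node["next"]    # conditional transition to the next step
        # otherwise stay in the same state, so the prompt is repeated
    return slots, transcript

slots, transcript = run_dialogue(["Mars", "Žilina", "5"])
print(slots)   # {'origin': 'Žilina', 'time': '5'}
```

The rigid prompt order is the main limitation of this approach: the user cannot give the departure time before the origin, which frame-based systems relax.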
Metacommunication in dialogue
Clarification
• The part of a dialogue in which the participants clarify the information obtained so far

• Example:
System: What time do you want to leave from Prague?
User: At five?
S: In the afternoon? (clarification prompt)
Confirmation
• Implicit: the confirmation is included in a prompt that also has another communicative function.

S: From where do you want to travel?

U: From Žilina.
S: What time do you want to leave from Žilina?
U: scenario a) At five. (implicit confirmation)
U: scenario b) Not from Žilina, from Žipov. (misunderstanding repair)

• Explicit: the confirmation is performed by a special prompt that has only one communicative function, to confirm previously obtained information.

S: From where do you want to travel?

U: From Žilina.
S: What time do you want to leave?
U: At five.
S: You selected Žilina station and departure time about 5:00 AM. Is it correct? (explicit confirmation)
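The two confirmation strategies can be combined with the frame-based (form-filling) approach: the system asks for the first empty slot and, once the frame is complete, issues an explicit confirmation. A minimal sketch, assuming a hypothetical frame with only two slots, `origin` and `time`; the function name `next_prompt` and its `implicit` flag are illustrative.

```python
def next_prompt(frame, implicit=True):
    """Pick the next system prompt for a frame (form) with the slots 'origin' and 'time'."""
    if frame.get("origin") is None:
        return "From where do you want to travel?"
    if frame.get("time") is None:
        if implicit:
            # Implicit confirmation: the recognised origin is echoed inside the next question.
            return f"What time do you want to leave from {frame['origin']}?"
        return "What time do you want to leave?"
    # Every slot is filled: an explicit confirmation prompt whose only function is to confirm.
    return (f"You selected {frame['origin']} station and departure time "
            f"{frame['time']}. Is it correct?")

frame = {"origin": None, "time": None}
print(next_prompt(frame))   # From where do you want to travel?
frame["origin"] = "Žilina"
print(next_prompt(frame))   # What time do you want to leave from Žilina?
frame["time"] = "5:00 AM"
print(next_prompt(frame))   # You selected Žilina station and departure time 5:00 AM. Is it correct?
```

Implicit confirmation saves a turn but relies on the user noticing and correcting a wrong value (scenario b); explicit confirmation costs an extra turn but asks directly.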
Errors and misunderstanding repairs
• Types of errors
– Nomatch event: the input provided by the user does not match any acceptable value.
– Noinput event: the user does not answer the system prompt within the specified interval.
– Misunderstanding event: the system did not recognize the user's input correctly.
– Errors on the side of the user (hesitations, ...)
– Errors on the side of the system (system crash, ...)
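Nomatch and noinput events are usually handled by re-prompting, often with increasingly helpful prompts. In VoiceXML these events are caught by `<noinput>` and `<nomatch>` handlers, whose `count` attribute allows such escalation. The Python sketch below only mimics that idea; the prompt texts, the station vocabulary and the function names are assumptions.

```python
# Illustrative escalating re-prompts; the wording is an assumption.
REPROMPTS = {
    "noinput": ["Sorry, I did not hear you.",
                "Please say the name of your departure station."],
    "nomatch": ["Sorry, I did not understand.",
                "Please say a station name, for example Žilina."],
}

def classify_input(answer, acceptable_values):
    """Map one recognition result to a 'match', 'nomatch' or 'noinput' event."""
    if answer is None or not answer.strip():
        return "noinput"    # no answer within the specified interval (timeout)
    if answer.strip().lower() not in acceptable_values:
        return "nomatch"    # the input matches no acceptable value
    return "match"

def reprompt(event, count):
    """Return the re-prompt for the count-th occurrence (1-based) of an error event."""
    messages = REPROMPTS[event]
    return messages[min(count, len(messages)) - 1]

stations = {"žilina", "košice"}
print(classify_input("Paris", stations))   # nomatch
print(reprompt("nomatch", 2))              # Please say a station name, for example Žilina.
```

Misunderstandings cannot be detected this way, because the input did match an acceptable value; they are caught only through the confirmation strategies described earlier.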
Text-to-Speech synthesis
Text-to-Speech synthesis (TTS)
• TTS converts text into speech.

• Applications: apps for blind people, telephony apps (call centers, voice banking, ...), robotics, in-car navigation, smart home apps, smart glasses, ...

• TTS is interdisciplinary: signal processing, natural language processing, phonetics, databases

• TTS has to be able to model prosody (melody, tempo, rhythm, emphasis), perform lexical analysis, ...

• Classification:
– Concatenative approaches: diphone-based / corpus-based

– Statistical approaches based on vocoders and HMMs.
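In diphone-based concatenative synthesis, the phone sequence produced by text analysis is converted into diphones (units spanning the transition from one phone to the next), and prerecorded units are joined. The sketch below assumes the phone sequence is already known and uses a hypothetical `inventory` mapping unit names to sample lists; it deliberately omits the join smoothing and prosody modification a real system needs.

```python
def to_diphones(phones, silence="_"):
    """Turn a phone sequence into diphone unit names, padding with silence at both ends."""
    padded = [silence] + list(phones) + [silence]
    return [f"{left}-{right}" for left, right in zip(padded, padded[1:])]

def concatenate(units, inventory):
    """Join the recorded waveforms of the units (no smoothing or prosody modification)."""
    samples = []
    for unit in units:
        if unit not in inventory:
            raise KeyError(f"diphone not in inventory: {unit}")
        samples.extend(inventory[unit])
    return samples

units = to_diphones(["a", "h", "o", "j"])   # Slovak "ahoj"
print(units)   # ['_-a', 'a-h', 'h-o', 'o-j', 'j-_']
```

Cutting units in the stable middle of each phone, rather than at phone boundaries, is what makes the joins less audible than simple phone concatenation.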

TTS system structure
Multimodality in HMI
Multimodality in HMI
• Human-human communication involves all the senses
• We want to give such capabilities to machines as well (computers, robots, ...)
• Required capabilities:
– Recognition of inputs delivered through various modalities (visual, speech, touch, ...)
– Interpretation of input signals and extraction of meaning
– Multimodal fusion into one complex representation of meaning
– Multimodal fission, which enables machines to present information through several output modalities
– The ability to model the user's behaviour (desires, intentions, goals, emotions)
– The ability to model their own behaviour and internal state (desires, intentions, goals, emotions)
– The ability to work with a database of real-world data
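Multimodal fusion can be illustrated with a classic case: a spoken command containing deictic words ("move that there") combined with touches on a touchscreen. This is a minimal late-fusion sketch, assuming the recognizer delivers word timestamps and that each touch has already been resolved to an object name; the 1-second time window, the `Touch` type and the `fuse` function are all assumptions.

```python
from dataclasses import dataclass

@dataclass
class Touch:
    time_s: float   # when the touch happened
    target: str     # object under the finger on the touchscreen

DEICTIC_WORDS = {"this", "that", "here", "there"}

def fuse(words, touches, max_gap_s=1.0):
    """Replace deictic words with the target of the closest-in-time touch.

    words: list of (time_s, word) pairs from the speech recognizer.
    """
    fused = []
    for time_s, word in words:
        if word in DEICTIC_WORDS and touches:
            nearest = min(touches, key=lambda touch: abs(touch.time_s - time_s))
            if abs(nearest.time_s - time_s) <= max_gap_s:
                fused.append(nearest.target)
                continue
        fused.append(word)   # non-deictic word, or no touch close enough in time
    return fused

words = [(0.0, "move"), (0.4, "that"), (1.2, "there")]
touches = [Touch(0.5, "photo.jpg"), Touch(1.3, "folder:Holidays")]
print(fuse(words, touches))   # ['move', 'photo.jpg', 'folder:Holidays']
```

Neither modality alone carries the full command: speech gives the action, touch gives the arguments, and only their time-aligned combination yields one complete meaning representation.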

• Typical input modalities
– Speech
– Gestures (hand and head movements, facial gestures)
– Touches on a touchscreen
– Writing
– Using keys on a keyboard
– Joystick

• Typical output modalities
– Speech
– Graphics (text, maps)
– Gestures (hand and head movements, facial gestures, articulation)
– System actions
– Virtual avatars
– Human-like behaviour
• Applications:
– Information kiosks
– Receptions
– Education
– Applications for elderly and disabled people supporting "independent living"

Virtual conversational agents and humanoid robots
http://www.youtube.com/watch?v=zruOPSSWVXw

http://www.youtube.com/watch?v=munqOlj3mNw

http://www.youtube.com/watch?v=xRR33WDFi_k

http://www.youtube.com/watch?v=cy7xGwYdRk0

http://www.youtube.com/watch?v=nFZ9sUbbfe8

http://www.youtube.com/watch?v=wOzw71j4b78
Applications and demos designed in our lab
Automatic transcription of video material

Dictation system for Slovak courts

Speech interface for the service robot SCORPIO

Intelligent speech communication interface

Virtual agent SIMONA

HMM-based speech synthesis for the Slovak language
