Hindi Intent Classification
Aditi Jha (19BEC002)
Batchu Satvick (19BEC009)
Sahana VA (19BEC037)
Pothu Lalitanjali (19BCS087)
Supervisor
Dr. Deepak K T
Department of Electronics and Communication Engineering
Indian Institute of Information Technology Dharwad
(IIIT Dharwad) Hindi Intent Classification November 14, 2022 1 / 12
Overview
Introduction
Research Problem Formulation
Literature Review
Methodology
Dataset
Future Work
Conclusion
References
Introduction
Intent Classification
• Natural Language Processing (NLP)
• Natural Language Understanding (NLU)
• Intent vs Entity
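To make the intent versus entity distinction concrete, here is a minimal Python sketch. The utterance, the intent name `book_appointment`, the entity labels and the `Entity` / `ParsedUtterance` classes are all illustrative assumptions and are not taken from the project's dataset. The intent is the goal of the whole sentence, while entities are the spans that carry its details.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Entity:
    label: str  # kind of detail, e.g. "time" (hypothetical label)
    text: str   # span of the utterance that carries the detail


@dataclass
class ParsedUtterance:
    text: str    # the raw Hindi sentence
    intent: str  # one label for the whole sentence
    entities: List[Entity] = field(default_factory=list)


# Hypothetical example: "Schedule a time to meet the doctor tomorrow morning"
example = ParsedUtterance(
    text="कल सुबह डॉक्टर से मिलने का समय तय करो",
    intent="book_appointment",
    entities=[Entity("time", "कल सुबह"), Entity("person", "डॉक्टर")],
)

# An utterance has exactly one intent but may carry several entities,
# and each entity is a substring of the utterance.
spans_found = all(e.text in example.text for e in example.entities)
```

Intent classification, the subject of these slides, predicts only the `intent` field; extracting the entities is a separate task.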
Research problem formulation
• This project aims to classify Hindi sentences by intent using
machine learning, for healthcare, geriatric care and home
assistance applications.
• State-of-the-art results are expected using the novel dataset of
nearly 9,300 sentences classified into 3 domains and 94 intents.
Literature Review
Hwang et al. (2020) demonstrated a social robot system acting as a hospital
receptionist, using recurrent neural networks [1].
Larson et al. (2019) evaluated intent classifiers on around 23,000 sentences
covering 150 intents; BERT yielded the best in-scope accuracy [2].
Xia et al. (2018) proposed zero-shot user intent detection using capsule neural
networks [3].
Jay Alammar (2018), in his article "The Illustrated BERT, ELMo, and co.", gave a
detailed explanation of BERT [4].
Methodology
• Reviewing standard datasets and building a new Hindi dataset for intent
classification.
• Applying machine learning techniques to classify intent on the new dataset
and comparing the results with those on the standard datasets.
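The slides do not name the model used for classification, so the following is only a minimal sketch of what "applying machine learning to classify intent" can look like. It assumes a multinomial Naive Bayes classifier with add-one smoothing over whitespace tokens as a stand-in baseline. The class name `NaiveBayesIntentClassifier`, the training sentences and the intent labels are hypothetical and are not drawn from the project's dataset.

```python
import math
from collections import Counter, defaultdict


def tokenize(sentence):
    # Whitespace splitting is enough for this toy example.
    return sentence.split()


class NaiveBayesIntentClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, sentences, intents):
        self.word_counts = defaultdict(Counter)  # intent -> token counts
        self.intent_counts = Counter(intents)    # intent -> sentence count
        self.vocab = set()
        for sentence, intent in zip(sentences, intents):
            tokens = tokenize(sentence)
            self.word_counts[intent].update(tokens)
            self.vocab.update(tokens)
        return self

    def predict(self, sentence):
        total = sum(self.intent_counts.values())
        best_intent, best_score = None, -math.inf
        for intent, n in self.intent_counts.items():
            counts = self.word_counts[intent]
            denom = sum(counts.values()) + len(self.vocab)
            # log prior + sum of smoothed log likelihoods
            score = math.log(n / total)
            for token in tokenize(sentence):
                score += math.log((counts[token] + 1) / denom)
            if score > best_score:
                best_intent, best_score = intent, score
        return best_intent


# Hypothetical home-assistance style training sentences.
train_sentences = [
    "बत्ती जला दो",         # "turn the light on"
    "कमरे की बत्ती जलाओ",   # "switch on the room light"
    "पंखा बंद करो",         # "turn the fan off"
    "पंखा बंद कर दो",       # "turn the fan off"
]
train_intents = ["turn_on_light", "turn_on_light", "turn_off_fan", "turn_off_fan"]

clf = NaiveBayesIntentClassifier().fit(train_sentences, train_intents)
prediction = clf.predict("बत्ती जलाओ")
```

A baseline like this is useful mainly as a reference point: the same train and predict interface can be kept while the model is swapped for a stronger one, such as the BERT-based classifiers discussed in the literature review.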
Dataset
The dataset we introduce was created by brainstorming among the team members
and by drawing on existing datasets available on the internet. It has 3 domains
with 94 intents:
1. Healthcare (15 intents)
2. Geriatric Care (15 intents)
3. Home Assistance (64 intents)
In addition to these 3 domains, out-of-scope sentences are included as a
separate domain.
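As a quick consistency check on the counts above, the label space can be sketched in Python. The dictionary layout and the label name `out_of_scope` are assumptions made for illustration. The slides do not say how out-of-scope sentences are labelled, so treating them as a single extra class is also an assumption.

```python
# Intents per domain, as listed on this slide.
INTENTS_PER_DOMAIN = {
    "Healthcare": 15,
    "Geriatric Care": 15,
    "Home Assistance": 64,
}

# Hypothetical label for sentences that match none of the in-scope intents.
OUT_OF_SCOPE = "out_of_scope"

# The three domains together account for the 94 intents quoted in the slides.
total_intents = sum(INTENTS_PER_DOMAIN.values())

# Assumption: out-of-scope is modelled as one additional class.
label_space_size = total_intents + 1
```

Under that assumption, a classifier trained on the dataset would choose among 95 labels: the 94 in-scope intents plus the out-of-scope class.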
Future Work
• Increasing the size of the dataset.
• Reviewing the dataset for corrections.
• Developing an intent classification system for the Hindi dataset.
Conclusion
To summarize, the team developed a novel dataset for Hindi intent
classification. It covers 3 domains with 94 intents in total and also includes
out-of-scope sentences.
References
[1] Hwang, E. J., Ahn, B. K., Macdonald, B. A., & Ahn, H. S. (2020, May).
Demonstration of hospital receptionist robot with extended hybrid code network
to select responses and gestures. In 2020 IEEE International Conference on
Robotics and Automation (ICRA) (pp. 8013-8018). IEEE.
[2] Larson, S., Mahendran, A., Peper, J. J., Clarke, C., Lee, A., Hill, P., ... Mars,
J. (2019). An evaluation dataset for intent classification and out-of-scope
prediction. arXiv preprint arXiv:1909.02027.
[3] Xia, C., Zhang, C., Yan, X., Chang, Y., & Yu, P. S. (2018). Zero-shot user
intent detection via capsule neural networks. arXiv preprint arXiv:1809.00385.
[4] Alammar, J. (2018). The Illustrated BERT, ELMo, and co. (How NLP Cracked
Transfer Learning). https://jalammar.github.io/illustrated-bert/
Thank you
Thank you! Your suggestions are
welcome.