Organizational Matters
Curriculum
Block 1. Foundations of NLP
● basics of structural linguistics
● working with data
● the full cycle of an NLP project
● rule-based problem solving
Block 2. Classical NLP + 2 guest sessions
● text as a bag of words
● text as a sequence
● text as a tree
● text as a graph
Block 3. Deep NLP
● unsupervised NLP and distributed representations
● plain and recurrent neural networks for NLP
● language modeling, generation, and machine translation
● modern neural network architectures
Schedule
Classes
● Thursday, 19:30-21:30 (lecture)
● Saturday, 15:00-18:00 (practical session)
Homework
● assigned every Thursday
● due period: from Saturday to Saturday
● credited if submitted within:
○ 1 week - 100%
○ 2 weeks - 75%
○ 1 month - 50%
Diploma
To receive a diploma, you need to:
● score >= 50% on the homework
● complete a course project and present a demo
Scholarships
● one from Grammarly for the best student
● a second one is still to be confirmed
Course Project
Contents
● working with data
● building metrics for quality evaluation
● building baseline solutions
● building a smart solution and comparing the results with the state of the art
Two graduation events
● internal at Projector (11.06)
● open at Grammarly (13.06)
* You can choose the course project topic from the list of suggested topics or come up with your own.
Tooling
● Slack for communication
● GitHub for materials and homework
https://github.com/vseloved/prj-nlp-2020
● Any programming language
○ but Python will be the easiest
● Any NLP libraries
● Any machine learning libraries
● Handy tools:
○ Jupyter Notebooks
○ Google Colab
● English and Ukrainian languages
Intro to Natural Language Processing
Mariana Romanyshyn, Grammarly, Inc.
Vsevolod Dyomkin, Franz Inc.
Contents
1. About us
2. About you
3. Overview of NLP
4. NLP applications in the real world
1. About Us
Seva
Lisp programmer
5+ years of NLP work at Grammarly
Currently work at Franz on AllegroGraph
Occasional writer & speaker
http://lisp-univ-etc.blogspot.com
https://vseloved.github.io
http://twitter.com/vseloved
http://facebook.com/vseloved
Programming Experience
Grammarly
Franz
Consulting
Startup projects
Open Source
NLP Experience
Grammarly
Franz projects
lang.org.ua
Consulting projects
cl-nlp
NLP in agraph
● Building an ML/NLP pipeline inside the graph DB
● Entity extraction
● Various classification projects (sales chats, company industries, customer types, tweets, etc.)
Teaching Experience
7+ years at KPI: SPOS course
3 prj-algo courses
UCU lecturer and supervisor on NLP
3rd prj-nlp course
Various workshops and conference talks
Writing
https://leanpub.com/lisphackers
Mariana
● Computational linguist
● 9 years in NLP
● Struggling reformer of university syllabuses 💪
● Active conference speaker
AI Ukraine (x6), ODSC London (x2), ODSC Kyiv, DataScienceUA (x2)…
Morning@Lohika, Grammarly AI club, Kharkiv AI club...
Sorry, no FB :)
https://www.linkedin.com/in/mariana-romanyshyn-b5896529/
NLP Experience
● Grammarly
● Zoral Labs
● Brainglass
● Brown-uk
● Consulting projects
Teaching Experience
● Grammarly CompLing Summer School (2018-2020)
● NLP course at ESSCASS (2019)
● Lectures and workshops at Ukrainian universities
KNU, KPI, KhPI, UCU, LPNU, DonUN...
● NLP course at Projector (2018-2020)
2. About You
Please introduce yourself
● What is your name?
● What do you do?
● Why are you here?
● Is there a particular NLP task you’re interested in?
● What is your favorite language?
3. WTF NLP or NLP FTW?
The Goal of NLP
Goal:
have computers understand natural language in order to
perform useful tasks
How:
transform free-form text into structured data and back
Natural Language
● Ambiguous
● Noisy
● Evolving
Image from https://nlp.stanford.edu/projects/histwords/
Position of NLP
[Diagram: NLP positioned among Computational linguistics, Computer Science, and Statistics & Machine Learning]
Expertise in NLP
[Diagram labels: Linguistics, Theory, Research, Rules & Stats, ML, DL, Software development]
An NLP Project
Image from https://medium.com/@neal_lathia/five-lessons-from-building-machine-learning-systems-d703162846ad
A Classic NLP Pipeline
Image from http://blog.aylien.com/
A Modern NLP Pipeline
Image from http://blog.aylien.com/
NLP & AI
NLP & CV, DSP, …
NLP vs NLU
Are we there yet?
- http://nlpprogress.com/
- https://www.eff.org/ai/metrics
4. NLP Applications
in the Real World
Q: What NLP applications do you know?
A: https://github.com/Kyubyong/nlp_tasks
Types of NLP Applications
• Linguistic
• Analysis
• Transformation
• Generation
• Multi-modal
Linguistically-Motivated NLP Applications
● Segmentation
● Part of speech tagging
● Named-entity recognition
● Syntactic parsing
● Coreference resolution
● Semantic parsing
● Discourse parsing
● ...
Analytical Applications
● Whole-text classification
● Segmentation (& classification of parts)
● Extraction of useful data
● Comparative Analysis
● Large-scale data analysis and visualization
Analytical NLP Applications
Seva’s experience building classifiers
● Language identification
● Email dissection
● Product catalog with 1k categories
● Identifying subjective statements
● Identifying conversation intents
● Identifying the industry of a client
Analytical NLP Applications
Sentiment Analysis
• sentiment scale or classes
• type of emotion
• object of the sentiment
• subjectivity
• manipulation
• sentiment maps
• ...
Sentiment maps
Image from http://www.dialogueearth.org/
Targeted Sentiment Analysis
“I won’t give this product a full five bc this
stuff just isn’t good for you. No soda is. But
with that being said Dr Pepper is one of my
favorites compared to coke and Pepsi. Like this
much more than coke. It has like a sweet tang
to it that I like. Also not super duper
carbonated as coke either.”
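To make the idea of targeted sentiment concrete, here is a minimal sketch that credits each sentence's polarity to the targets mentioned in that sentence. The lexicon, the negation list, and the function names are hypothetical and chosen only for illustration; real systems rely on much richer models.

```python
import re
from collections import defaultdict

# Hypothetical toy resources (assumptions for illustration only).
LEXICON = {"good": 1, "great": 1, "like": 1, "love": 1,
           "bad": -1, "awful": -1, "hate": -1}
NEGATIONS = {"not", "no", "never", "isn't", "don't", "won't"}

def tokenize(sentence):
    # Lowercase word tokens; normalize the curly apostrophe so "isn’t" matches.
    return re.findall(r"[a-z']+", sentence.lower().replace("’", "'"))

def sentence_score(tokens):
    score = 0
    for i, tok in enumerate(tokens):
        polarity = LEXICON.get(tok, 0)
        # Flip the polarity if a negation occurs among the two preceding tokens.
        if polarity and NEGATIONS & set(tokens[max(0, i - 2):i]):
            polarity = -polarity
        score += polarity
    return score

def targeted_sentiment(text, targets):
    # Credit each sentence's score to every target mentioned in that sentence.
    scores = defaultdict(int)
    for sentence in re.split(r"(?<=[.!?])\s+", text.strip()):
        tokens = tokenize(sentence)
        padded = " " + " ".join(tokens) + " "
        for target in targets:
            if " " + target + " " in padded:
                scores[target] += sentence_score(tokens)
    return dict(scores)

print(targeted_sentiment("I love Dr Pepper. Coke is not good.",
                         ["dr pepper", "coke"]))
```

Sentence-level attribution is deliberately naive: in a comparison such as "Like this much more than coke", the positive word "like" would be credited to coke, which is the kind of error that makes the targeted variant of the task hard.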
Analytical NLP Applications
Abusive / Toxic / Insincere / Non-inclusive Language
• Quora: Insincere Questions (2019)
• Jigsaw: Toxic Comments (2018)
• Workshop on Abusive Language Online (2017-2020)
• Last year - trolling in German news
• This year - a project from Biasless
Analytical NLP Applications
Sarcasm / Humor / Irony Detection
Cognitive features for sarcasm detection
Image from Mishra A. et al. (2016)
Memotion Analysis
Last year’s project
Shared task 2020:
http://www.amitavadas.com/Memotion.html
Analytical NLP Applications
Good vs. Evil Characters
Phonological features
Good vs. Evil Characters
Image from Papantoniou K. and Konstantopoulos S. (2016)
Analytical NLP Applications
Text Grading
● vocabulary
● grammar
Analytical NLP Applications
Text mining
Image from www.restaurantlechristine.com/
Analytical NLP Applications
Fact Extraction
Image from www.bloomberg.com/
Automated Fact-Checking
Picture © https://www.slideshare.net/isabelleaugenstein/learning-to-read-for-automated-fact-checking
Transformational :) NLP Applications
Transformational NLP Applications
Machine Translation
Transformations in MT
Image from Kyunghyun Cho (2015)
Transformational NLP Applications
Error correction
An average non-native speaker makes one mistake in every ten words.
Image from www.writing.com/
Transformational NLP Applications
Error correction
Spelling, Grammar, Punctuation
• I cutted your fnger didn’t I?
• I cut your finger, didn’t I?
• In daytime, he stayed in room.
• In the daytime, he stayed in the room.
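The spelling part of error correction can be sketched in the spirit of Peter Norvig's essay listed in the references: generate all strings one edit away from the input and pick the most frequent known word. This is a simplified variant limited to edit distance 1, and the vocabulary with its counts is made up for illustration; a real corrector would estimate counts from a large corpus.

```python
import string

# Toy vocabulary with made-up counts (an assumption for illustration).
WORD_COUNTS = {"i": 500, "your": 200, "cut": 80, "finger": 50,
               "stayed": 30, "room": 60}

def edits1(word):
    # All strings one deletion, transposition, replacement, or insertion away.
    letters = string.ascii_lowercase
    splits = [(word[:i], word[i:]) for i in range(len(word) + 1)]
    deletes = [l + r[1:] for l, r in splits if r]
    transposes = [l + r[1] + r[0] + r[2:] for l, r in splits if len(r) > 1]
    replaces = [l + c + r[1:] for l, r in splits if r for c in letters]
    inserts = [l + c + r for l, r in splits for c in letters]
    return set(deletes + transposes + replaces + inserts)

def correct(word):
    word = word.lower()
    if word in WORD_COUNTS:
        return word  # already a known word
    candidates = [w for w in edits1(word) if w in WORD_COUNTS]
    # Prefer the most frequent known candidate; otherwise leave the word as is.
    return max(candidates, key=WORD_COUNTS.get) if candidates else word

print(correct("fnger"), correct("cutted"))
```

Here "fnger" is one insertion away from "finger", so it gets fixed, while "cutted" stays unchanged: repairing it requires knowing the irregular past tense of "cut", which is a grammar problem rather than a spelling one.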
Error Correction at Sciworth
From a Spellchecker to an Error-Correction Framework
LanguageTool
Image from https://languagetool.org/uk/
Grammarly
Image from www.writing.com/
…
Transformational NLP Applications
Paraphrasing
• Joey came racing at a very fast speed.
• Joey came racing at a breakneck speed.
• A very fast train runs through the city of Urumqi.
• A high-speed train runs through the city of Urumqi.
Transformational NLP Applications
Text Simplification
● for non-experts
● for children
● for people with aphasia
● for non-natives
Transformational NLP Applications
Text Simplification
They are humid, prepossessing
Homo Sapiens with full-sized
aortic pumps.
They are warm, nice people
with big hearts.
Transformational NLP Applications
Data anonymization (& deanonymization 😈)
Original:
Jack and Jill Robinson bought a car at BimBom Industries for $400K on
May 13th, 2011.
Anonymized:
Boris and Althea Stephanopoulos bought a car at Acme Industries for
€120K on March 21st, 2001.
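A rough sketch of how such anonymization could start: detect sensitive spans and replace them. For simplicity, this version substitutes typed placeholders instead of the realistic surrogate values shown above. The regular expressions and the small name gazetteer are assumptions for illustration; a real system would use a named-entity recognizer.

```python
import re

MONTHS = (r"(?:January|February|March|April|May|June|July|August|"
          r"September|October|November|December)")

# Pattern-based entities (illustrative patterns, far from complete).
PATTERNS = [
    ("DATE", re.compile(MONTHS + r" \d{1,2}(?:st|nd|rd|th)?, \d{4}")),
    ("MONEY", re.compile(r"[$€]\d+(?:\.\d+)?[KM]?")),
]

# Hypothetical gazetteer standing in for a named-entity recognizer.
GAZETTEER = {"Jack": "PERSON", "Jill Robinson": "PERSON",
             "BimBom Industries": "ORG"}

def anonymize(text):
    for label, pattern in PATTERNS:
        text = pattern.sub(f"<{label}>", text)
    # Replace longer names first so that multi-word names are matched whole.
    for name in sorted(GAZETTEER, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(name) + r"\b",
                      f"<{GAZETTEER[name]}>", text)
    return text

original = ("Jack and Jill Robinson bought a car at BimBom Industries "
            "for $400K on May 13th, 2011.")
print(anonymize(original))
# <PERSON> and <PERSON> bought a car at <ORG> for <MONEY> on <DATE>.
```

Producing surrogates like those on the slide would additionally require sampling replacement values of the same type and keeping them consistent across the document.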
Transformational NLP Applications
Text Summarization
● extractive
● abstractive
Lots of projects,
not so much value… :(
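For intuition, the extractive variant can be sketched as sentence ranking: score every sentence by the average corpus frequency of its content words and keep the top ones in their original order. The stopword list and the scoring function below are simplifying assumptions, not a recommended method.

```python
import re
from collections import Counter

# Tiny illustrative stopword list (an assumption; real lists are much longer).
STOPWORDS = {"the", "a", "an", "of", "to", "and", "in", "is", "it", "on", "for"}

def content_words(sentence):
    return [w for w in re.findall(r"[a-z']+", sentence.lower())
            if w not in STOPWORDS]

def summarize(text, n_sentences=1):
    sentences = [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]
    freq = Counter(w for s in sentences for w in content_words(s))

    def score(sentence):
        # Average frequency of the sentence's content words.
        words = content_words(sentence)
        return sum(freq[w] for w in words) / len(words) if words else 0.0

    ranked = sorted(range(len(sentences)),
                    key=lambda i: score(sentences[i]), reverse=True)
    # Emit the selected sentences in their original order.
    return " ".join(sentences[i] for i in sorted(ranked[:n_sentences]))

text = "The train runs through the city. The train is fast. Tickets are cheap."
print(summarize(text, 1))
```

An abstractive summarizer would instead generate new sentences, which calls for a language generation model rather than a ranking function.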
Transformational NLP Applications
Text-to-Data
Destinations:
● Search queries
● Database queries
● Source Code
● Diagrams, blueprints, …
Attendify course project
Generative NLP Applications
… btw, there was a recent project on this:
https://www.reddit.com/r/aimeme/
Generative NLP Applications
Question Answering
● limited domain
● general-purpose
2 course projects
Image from www.ntt-review.jp
Types of queries
• Factoid: Who discovered America?
• Yes/No: Is Berlin the capital of Germany?
• Definition: What is leukemia?
• Cause/consequence: Why did the Iraq war start?
• Procedural: What are the steps for getting a Master's degree?
• Comparative: What is the difference between model A and model B?
• Queries with examples: What hard disks are similar to hard disk X?
• Queries about opinion: What is the opinion of the majority of
Americans about the Iraq war?
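A question answering system typically needs to recognize the query type before choosing an answering strategy. Below is a minimal rule-based sketch of such a classifier; the patterns, their order, and the label names are assumptions tuned to the example queries above and would not generalize without far more rules or a trained model.

```python
import re

# Ordered rules: more specific cues are checked before generic question words.
RULES = [
    ("opinion", r"\bopinion\b"),
    ("comparative", r"\bdifference between\b"),
    ("by example", r"\bsimilar to\b"),
    ("procedural", r"\b(steps|how (do|to|can))\b"),
    ("definition", r"^what (is|are) (an? |the )?\w+\?$"),
    ("cause/consequence", r"^why\b"),
    ("yes/no", r"^(is|are|was|were|do|does|did|can|will)\b"),
    ("factoid", r"^(who|when|where|which|what)\b"),
]

def query_type(question):
    q = question.strip().lower()
    for label, pattern in RULES:
        if re.search(pattern, q):
            return label
    return "unknown"

for q in ["Who discovered America?",
          "Is Berlin the capital of Germany?",
          "What is leukemia?",
          "Why did the Iraq war start?"]:
    print(q, "->", query_type(q))
```

The rule order matters: "What is the difference between model A and model B?" starts like a definition or factoid query, so the comparative cue has to be tested first.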
Generative NLP Applications
Conversational Agents
● social bots
● personal assistants
● customer support
● AI psychiatrists
The story of Tay
Siri
“I remember the first time we loaded these data sources into Siri. I typed ‘start over’ into the system, and Siri came back saying, ‘Looking for businesses named “Over” in Start, Louisiana.’”
— Adam Cheyer
Taken from https://medium.com/swlh/the-story-behind-siri-fbeb109938b0
Google’s chat bots
● Google Duplex (2018)
● Meena (2020):
○ $1.5mln
○ 30 days of training
○ TPUv3 (2048 TPU cores)
Taken from https://twitter.com/eturner303/status/1223976313544773634
Generative NLP Applications
Story Cloze
Tom and Sheryl have been together for two years. One day, they
went to a carnival. Tom won Sheryl several stuffed bears. When
they reached the Ferris wheel, he got down on one knee.
Which ending is more probable?
● Tom asked Sheryl to marry him.
● He wiped mud off of his boot.
Taken from Mostafazadeh N. et al. (2016)
Generative NLP Applications
Computer-Generated Text
Taken from Nick Montfort (2013)
Computer-Generated Text
OpenAI language model (2019)
Computer-Generated Text
GLTR by MIT-IBM Watson AI lab and HarvardNLP (2019)
Multi-Modal NLP Applications
Multi-Modal NLP Applications
Speech to Text / Text to Speech
● WaveNet
● https://www.speech.com.ua/index.html
● The era of Open-source solutions
Franz project
Multi-Modal NLP Applications
Image Captioning
Multi-Modal NLP Applications
NNSE
“Interpretable Semantic Vectors from a Joint Model of Brain- and Text-Based Meaning”
https://www.cs.cmu.edu/~afyshe/papers/acl2014/jnnse_acl2014.pdf
Multi-Modal NLP Applications
Language Learning: Duolingo, Babbel, etc.
Questions?
Interesting References
• HistWords: Word Embeddings for Historical Text
• Peter Eckersley and Yomna Nasser, Measuring the Progress of AI Research (ongoing)
• Peter Norvig, How to Write a Spelling Corrector (2007)
• Mishra A. et al., Harnessing Cognitive Features for Sarcasm Detection (2016)
• Papantoniou K. and Konstantopoulos S., Unravelling Names of Fictional Characters (2016)
• Kyunghyun Cho, Introduction to Neural Machine Translation with GPUs (2015)
• Vaswani A. et al., Attention Is All You Need (2017)
• DeepMind, WaveNet: A Generative Model for Raw Audio (2016)
• Mostafazadeh N. et al., A Corpus and Cloze Evaluation for Deeper Understanding of Commonsense Stories (2016)
• Nick Montfort, World Clock (2013)
• Microsoft reaches a historic milestone, using AI to match human performance in translating news from Chinese to English (2018)
• Better Language Models and Their Implications (2019)