
Commit 4887b0e

antmarakis authored and norvig committed

NLP Notebook + Tests: Chomsky Normal Form (aimacode#607)

* add cnf_rules to grammar
* Update nlp.ipynb
* Update test_nlp.py
* add more to CNF section

1 parent 790213a commit 4887b0e

File tree

3 files changed: +144 -0 lines changed

nlp.ipynb

Lines changed: 111 additions & 0 deletions
@@ -81,6 +81,25 @@
     "Now we know it is more likely for `S` to be replaced by `aSb` than by `e`."
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "### Chomsky Normal Form\n",
+    "\n",
+    "A grammar is in Chomsky Normal Form (or **CNF**, not to be confused with *Conjunctive Normal Form*) if each of its rules takes one of the following three forms:\n",
+    "\n",
+    "* `X -> Y Z`\n",
+    "* `A -> a`\n",
+    "* `S -> ε`\n",
+    "\n",
+    "where *X*, *Y*, *Z* and *A* are non-terminals, *a* is a terminal, *ε* is the empty string and *S* is the start symbol (the start symbol should not appear on the right-hand side of any rule). Note that there can be multiple rules for each left-hand-side non-terminal, as long as each one follows the forms above. For example, the rules for *X* might be: `X -> Y Z | A B | a | b`.\n",
+    "\n",
+    "Of course, we can also have a *CNF* grammar with probabilities attached to its rules.\n",
+    "\n",
+    "This type of grammar may seem restrictive, but it can be proven that any context-free grammar can be converted to CNF."
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -275,6 +294,52 @@
     "print(\"Is 'here' a noun?\", grammar.isa('here', 'Noun'))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If the grammar is in Chomsky Normal Form, we can call the class method `cnf_rules` to get all the rules as `(X, Y, Z)` tuples, one for each `X -> Y Z` rule. Since the above grammar is not in *CNF*, though, we first have to create one that is."
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 2,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "E_Chomsky = Grammar('E_Prob_Chomsky',  # A Grammar in Chomsky Normal Form\n",
+    "                    Rules(\n",
+    "                        S='NP VP',\n",
+    "                        NP='Article Noun | Adjective Noun',\n",
+    "                        VP='Verb NP | Verb Adjective',\n",
+    "                    ),\n",
+    "                    Lexicon(\n",
+    "                        Article='the | a | an',\n",
+    "                        Noun='robot | sheep | fence',\n",
+    "                        Adjective='good | new | sad',\n",
+    "                        Verb='is | say | are'\n",
+    "                    ))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 4,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[('NP', 'Article', 'Noun'), ('NP', 'Adjective', 'Noun'), ('VP', 'Verb', 'NP'), ('VP', 'Verb', 'Adjective'), ('S', 'NP', 'VP')]\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(E_Chomsky.cnf_rules())"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},
@@ -428,6 +493,52 @@
     "print(\"Is 'here' a noun?\", grammar.isa('here', 'Noun'))"
    ]
   },
+  {
+   "cell_type": "markdown",
+   "metadata": {},
+   "source": [
+    "If we have a grammar in *CNF*, we can get a list of all its rules. Let's create a grammar in that form and print its *CNF* rules:"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 6,
+   "metadata": {
+    "collapsed": true
+   },
+   "outputs": [],
+   "source": [
+    "E_Prob_Chomsky = ProbGrammar('E_Prob_Chomsky',  # A Probabilistic Grammar in CNF\n",
+    "                             ProbRules(\n",
+    "                                 S='NP VP [1]',\n",
+    "                                 NP='Article Noun [0.6] | Adjective Noun [0.4]',\n",
+    "                                 VP='Verb NP [0.5] | Verb Adjective [0.5]',\n",
+    "                             ),\n",
+    "                             ProbLexicon(\n",
+    "                                 Article='the [0.5] | a [0.25] | an [0.25]',\n",
+    "                                 Noun='robot [0.4] | sheep [0.4] | fence [0.2]',\n",
+    "                                 Adjective='good [0.5] | new [0.2] | sad [0.3]',\n",
+    "                                 Verb='is [0.5] | say [0.3] | are [0.2]'\n",
+    "                             ))"
+   ]
+  },
+  {
+   "cell_type": "code",
+   "execution_count": 9,
+   "metadata": {},
+   "outputs": [
+    {
+     "name": "stdout",
+     "output_type": "stream",
+     "text": [
+      "[('NP', 'Article', 'Noun', 0.6), ('NP', 'Adjective', 'Noun', 0.4), ('VP', 'Verb', 'NP', 0.5), ('VP', 'Verb', 'Adjective', 0.5), ('S', 'NP', 'VP', 1.0)]\n"
+     ]
+    }
+   ],
+   "source": [
+    "print(E_Prob_Chomsky.cnf_rules())"
+   ]
+  },
   {
    "cell_type": "markdown",
    "metadata": {},

nlp.py

Lines changed: 25 additions & 0 deletions
@@ -52,6 +52,16 @@ def isa(self, word, cat):
         """Return True iff word is of category cat"""
         return cat in self.categories[word]
 
+    def cnf_rules(self):
+        """Return a list of tuples (X, Y, Z) for rules of the form:
+        X -> Y Z"""
+        cnf = []
+        for X, rules in self.rules.items():
+            for (Y, Z) in rules:
+                cnf.append((X, Y, Z))
+
+        return cnf
+
     def generate_random(self, S='S'):
         """Replace each token in S by a random entry in grammar (recursively)."""
         import random
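The new `cnf_rules` method is a straightforward flattening of the rules dictionary into triples. A self-contained sketch of the same flattening over a plain dict (the dict layout shown mirrors how `Rules()` stores rewrites, which is an assumption here):

```python
# Flatten a rules dict {X: [[Y, Z], ...]} into (X, Y, Z) triples,
# mirroring what the cnf_rules method does with a grammar's self.rules.
rules = {'S': [['NP', 'VP']],
         'NP': [['Article', 'Noun'], ['Adjective', 'Noun']],
         'VP': [['Verb', 'NP'], ['Verb', 'Adjective']]}

def cnf_rules(rules):
    """Return (X, Y, Z) for every binary rule X -> Y Z."""
    return [(X, Y, Z) for X, rewrites in rules.items() for (Y, Z) in rewrites]

print(cnf_rules(rules))
```

Note the unpacking `for (Y, Z) in rewrites` raises a `ValueError` on any rewrite that is not exactly two symbols long, so both this sketch and the committed method implicitly assume the grammar is already in CNF.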
@@ -229,6 +239,21 @@ def __repr__(self):
                        Digit="0 [0.35] | 1 [0.35] | 2 [0.3]"
                        ))
 
+
+
+E_Chomsky = Grammar('E_Prob_Chomsky',  # A Grammar in Chomsky Normal Form
+                    Rules(
+                        S='NP VP',
+                        NP='Article Noun | Adjective Noun',
+                        VP='Verb NP | Verb Adjective',
+                    ),
+                    Lexicon(
+                        Article='the | a | an',
+                        Noun='robot | sheep | fence',
+                        Adjective='good | new | sad',
+                        Verb='is | say | are'
+                    ))
+
 E_Prob_Chomsky = ProbGrammar('E_Prob_Chomsky',  # A Probabilistic Grammar in CNF
                              ProbRules(
                                  S='NP VP [1]',

tests/test_nlp.py

Lines changed: 8 additions & 0 deletions
@@ -32,6 +32,10 @@ def test_grammar():
     assert grammar.rewrites_for('A') == [['B', 'C'], ['D', 'E']]
     assert grammar.isa('the', 'Article')
 
+    grammar = nlp.E_Chomsky
+    for rule in grammar.cnf_rules():
+        assert len(rule) == 3
+
 
 def test_generation():
     lexicon = Lexicon(Article="the | a | an",
@@ -77,6 +81,10 @@ def test_prob_grammar():
     assert grammar.rewrites_for('A') == [(['B', 'C'], 0.3), (['D', 'E'], 0.7)]
     assert grammar.isa('the', 'Article')
 
+    grammar = nlp.E_Prob_Chomsky
+    for rule in grammar.cnf_rules():
+        assert len(rule) == 4
+
 
 def test_prob_generation():
     lexicon = ProbLexicon(Verb="am [0.5] | are [0.25] | is [0.25]",

0 commit comments