|
81 | 81 | "Now we know it is more likely for `S` to be replaced by `aSb` than by `e`."
|
82 | 82 | ]
|
83 | 83 | },
|
| 84 | + { |
| 85 | + "cell_type": "markdown", |
| 86 | + "metadata": {}, |
| 87 | + "source": [ |
| 88 | + "### Chomsky Normal Form\n", |
| 89 | + "\n", |
| 90 | + "A grammar is in Chomsky Normal Form (or **CNF**, not to be confused with *Conjunctive Normal Form*) if every one of its rules has one of the following three forms:\n", |
| 91 | + "\n", |
| 92 | + "* `X -> Y Z`\n", |
| 93 | + "* `A -> a`\n", |
| 94 | + "* `S -> ε`\n", |
| 95 | + "\n", |
| 96 | + "Here *X*, *Y*, *Z* and *A* are non-terminals, *a* is a terminal, *ε* is the empty string and *S* is the start symbol (the start symbol must not appear on the right-hand side of any rule). Note that a left-hand-side non-terminal can have multiple rules, as long as each of them has one of the forms above. For example, the rules for *X* might be: `X -> Y Z | A B | a | b`.\n", |
| 97 | + "\n", |
| 98 | + "A probabilistic grammar can also be in *CNF*; its rules then simply carry a probability as well.\n", |
| 99 | + "\n", |
| 100 | + "This type of grammar may seem restrictive, but it can be proven that every context-free grammar can be converted into an equivalent grammar (one generating the same language) in CNF." |
| 101 | + ] |
| 102 | + }, |
84 | 103 | {
|
85 | 104 | "cell_type": "markdown",
|
86 | 105 | "metadata": {},
|
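The three rule forms in the CNF definition can be checked mechanically. The sketch below is not part of the notebook's `Grammar` class: it assumes a hypothetical plain-dict representation in which each non-terminal maps to a list of right-hand-side tuples, non-terminals start with an uppercase letter and terminals are lowercase. `is_cnf` tests the definition, and `binarize` illustrates only one step of the standard conversion to CNF (splitting right-hand sides longer than two symbols); the other steps, such as removing ε-rules and unit rules, are omitted. Both helper names and the `X_1`, `X_2`, ... naming scheme for fresh non-terminals are assumptions, and the sketch assumes those fresh names do not clash with existing ones.

```python
# Hypothetical representation (not the notebook's Grammar class):
# non-terminal -> list of right-hand sides, each a tuple of symbols.


def is_nonterminal(symbol):
    # Assumption: non-terminals are capitalised, terminals are lowercase.
    return symbol[:1].isupper()


def is_cnf(rules, start):
    """Return True if every rule is X -> Y Z, A -> a, or S -> ε."""
    for lhs, rhss in rules.items():
        for rhs in rhss:
            if len(rhs) == 2:
                # X -> Y Z: two non-terminals, neither of them the start symbol
                if not all(is_nonterminal(s) and s != start for s in rhs):
                    return False
            elif len(rhs) == 1:
                # A -> a: exactly one terminal
                if is_nonterminal(rhs[0]):
                    return False
            elif len(rhs) == 0:
                # S -> ε: only the start symbol may derive the empty string
                if lhs != start:
                    return False
            else:
                return False
    return True


def binarize(rules):
    """BIN step only: split X -> Y1 Y2 ... Yn (n > 2) into a chain of binary rules."""
    new_rules = {}
    counter = 0
    for lhs, rhss in rules.items():
        for rhs in rhss:
            head, rest = lhs, list(rhs)
            while len(rest) > 2:
                counter += 1
                fresh = f"{lhs}_{counter}"  # hypothetical fresh non-terminal name
                new_rules.setdefault(head, []).append((rest[0], fresh))
                head, rest = fresh, rest[1:]
            new_rules.setdefault(head, []).append(tuple(rest))
    return new_rules


grammar = {
    "S": [("NP", "VP")],
    "NP": [("Article", "Adjective", "Noun")],
    "VP": [("Verb", "NP")],
    "Article": [("the",)],
    "Adjective": [("good",)],
    "Noun": [("robot",)],
    "Verb": [("is",)],
}

print(is_cnf(grammar, "S"))     # False: NP has three symbols on its right-hand side
binary = binarize(grammar)
print(binary["NP"])             # [('Article', 'NP_1')]
print(binary["NP_1"])           # [('Adjective', 'Noun')]
print(is_cnf(binary, "S"))      # True
```

Splitting a long rule into a right-branching chain keeps the generated language unchanged, because the fresh non-terminal appears nowhere else and can only be expanded into the remainder of the original right-hand side.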
|
275 | 294 | "print(\"Is 'here' a noun?\", grammar.isa('here', 'Noun'))"
|
276 | 295 | ]
|
277 | 296 | },
|
| 297 | + { |
| 298 | + "cell_type": "markdown", |
| 299 | + "metadata": {}, |
| 300 | + "source": [ |
| 301 | + "If the grammar is in Chomsky Normal Form, we can call its method `cnf_rules` to get every binary rule as a tuple `(X, Y, Z)`, one for each `X -> Y Z` rule. Since the grammar above is not in *CNF*, we have to create a new one." |
| 302 | + ] |
| 303 | + }, |
| 304 | + { |
| 305 | + "cell_type": "code", |
| 306 | + "execution_count": 2, |
| 307 | + "metadata": { |
| 308 | + "collapsed": true |
| 309 | + }, |
| 310 | + "outputs": [], |
| 311 | + "source": [ |
| 312 | + "E_Chomsky = Grammar('E_Chomsky', # A Grammar in Chomsky Normal Form\n", |
| 313 | + " Rules(\n", |
| 314 | + " S='NP VP',\n", |
| 315 | + " NP='Article Noun | Adjective Noun',\n", |
| 316 | + " VP='Verb NP | Verb Adjective',\n", |
| 317 | + " ),\n", |
| 318 | + " Lexicon(\n", |
| 319 | + " Article='the | a | an',\n", |
| 320 | + " Noun='robot | sheep | fence',\n", |
| 321 | + " Adjective='good | new | sad',\n", |
| 322 | + " Verb='is | say | are'\n", |
| 323 | + " ))" |
| 324 | + ] |
| 325 | + }, |
| 326 | + { |
| 327 | + "cell_type": "code", |
| 328 | + "execution_count": 4, |
| 329 | + "metadata": {}, |
| 330 | + "outputs": [ |
| 331 | + { |
| 332 | + "name": "stdout", |
| 333 | + "output_type": "stream", |
| 334 | + "text": [ |
| 335 | + "[('NP', 'Article', 'Noun'), ('NP', 'Adjective', 'Noun'), ('VP', 'Verb', 'NP'), ('VP', 'Verb', 'Adjective'), ('S', 'NP', 'VP')]\n" |
| 336 | + ] |
| 337 | + } |
| 338 | + ], |
| 339 | + "source": [ |
| 340 | + "print(E_Chomsky.cnf_rules())" |
| 341 | + ] |
| 342 | + }, |
278 | 343 | {
|
279 | 344 | "cell_type": "markdown",
|
280 | 345 | "metadata": {},
|
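To illustrate what the `(X, Y, Z)` tuples look like, here is a minimal sketch of a `cnf_rules`-style function over a hypothetical dict of rules mirroring `E_Chomsky`. It is not the actual implementation in the notebook's library. Unpacking each right-hand side into `Y, Z` only works because every rule in a CNF grammar's rule set is binary; a right-hand side with one or three symbols would raise a `ValueError`.

```python
from collections import defaultdict

# Hypothetical storage (not the notebook's Rules object): each non-terminal
# maps to a list of right-hand sides, mirroring the rules of E_Chomsky.
rules = {
    "S": [["NP", "VP"]],
    "NP": [["Article", "Noun"], ["Adjective", "Noun"]],
    "VP": [["Verb", "NP"], ["Verb", "Adjective"]],
}


def cnf_rules(rules):
    """Return an (X, Y, Z) tuple for every binary rule X -> Y Z.

    Unpacking into Y, Z raises ValueError if a right-hand side is not binary,
    so this only makes sense for a grammar in CNF.
    """
    return [(X, Y, Z) for X, rhss in rules.items() for Y, Z in rhss]


print(cnf_rules(rules))

# Reverse index: which non-terminals can produce the adjacent pair (Y, Z)?
by_children = defaultdict(list)
for X, Y, Z in cnf_rules(rules):
    by_children[(Y, Z)].append(X)

print(by_children[("Verb", "NP")])  # ['VP']
```

Indexing the tuples by their children `(Y, Z)` is the kind of lookup a chart parser such as CYK performs when it combines two adjacent constituents. The tuples here follow dict insertion order, so their order can differ from the notebook's printed output.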
|
428 | 493 | "print(\"Is 'here' a noun?\", grammar.isa('here', 'Noun'))"
|
429 | 494 | ]
|
430 | 495 | },
|
| 496 | + { |
| 497 | + "cell_type": "markdown", |
| 498 | + "metadata": {}, |
| 499 | + "source": [ |
| 500 | + "If a probabilistic grammar is in *CNF*, we can likewise get a list of its binary rules, this time with each rule's probability attached. Let's create a grammar in this form and print its *CNF* rules:" |
| 501 | + ] |
| 502 | + }, |
| 503 | + { |
| 504 | + "cell_type": "code", |
| 505 | + "execution_count": 6, |
| 506 | + "metadata": { |
| 507 | + "collapsed": true |
| 508 | + }, |
| 509 | + "outputs": [], |
| 510 | + "source": [ |
| 511 | + "E_Prob_Chomsky = ProbGrammar('E_Prob_Chomsky', # A Probabilistic Grammar in CNF\n", |
| 512 | + " ProbRules(\n", |
| 513 | + " S='NP VP [1]',\n", |
| 514 | + " NP='Article Noun [0.6] | Adjective Noun [0.4]',\n", |
| 515 | + " VP='Verb NP [0.5] | Verb Adjective [0.5]',\n", |
| 516 | + " ),\n", |
| 517 | + " ProbLexicon(\n", |
| 518 | + " Article='the [0.5] | a [0.25] | an [0.25]',\n", |
| 519 | + " Noun='robot [0.4] | sheep [0.4] | fence [0.2]',\n", |
| 520 | + " Adjective='good [0.5] | new [0.2] | sad [0.3]',\n", |
| 521 | + " Verb='is [0.5] | say [0.3] | are [0.2]'\n", |
| 522 | + " ))" |
| 523 | + ] |
| 524 | + }, |
| 525 | + { |
| 526 | + "cell_type": "code", |
| 527 | + "execution_count": 9, |
| 528 | + "metadata": {}, |
| 529 | + "outputs": [ |
| 530 | + { |
| 531 | + "name": "stdout", |
| 532 | + "output_type": "stream", |
| 533 | + "text": [ |
| 534 | + "[('NP', 'Article', 'Noun', 0.6), ('NP', 'Adjective', 'Noun', 0.4), ('VP', 'Verb', 'NP', 0.5), ('VP', 'Verb', 'Adjective', 0.5), ('S', 'NP', 'VP', 1.0)]\n" |
| 535 | + ] |
| 536 | + } |
| 537 | + ], |
| 538 | + "source": [ |
| 539 | + "print(E_Prob_Chomsky.cnf_rules())" |
| 540 | + ] |
| 541 | + }, |
431 | 542 | {
|
432 | 543 | "cell_type": "markdown",
|
433 | 544 | "metadata": {},
|
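The probabilistic tuples make it easy to score a derivation: in a probabilistic context-free grammar, the probability of a parse tree is the product of the probabilities of all the rules used in it. The sketch below copies the `(X, Y, Z, p)` tuples printed above, plus the few lexicon probabilities it needs, into plain Python structures. The nested-tuple tree encoding and the `tree_prob` helper are hypothetical and not part of the notebook's `ProbGrammar` class.

```python
import math
from collections import defaultdict

# (X, Y, Z, p) tuples copied from the printed output of E_Prob_Chomsky.cnf_rules()
cnf = [('NP', 'Article', 'Noun', 0.6), ('NP', 'Adjective', 'Noun', 0.4),
       ('VP', 'Verb', 'NP', 0.5), ('VP', 'Verb', 'Adjective', 0.5),
       ('S', 'NP', 'VP', 1.0)]
# Lexical probabilities copied from the ProbLexicon above (only the words used here)
lexicon = {('Article', 'the'): 0.5, ('Noun', 'robot'): 0.4,
           ('Verb', 'is'): 0.5, ('Adjective', 'good'): 0.5}

binary_prob = {(X, Y, Z): p for X, Y, Z, p in cnf}

# Sanity check: the probabilities of all rules sharing a left-hand side sum to 1
totals = defaultdict(float)
for X, _, _, p in cnf:
    totals[X] += p
assert all(math.isclose(t, 1.0) for t in totals.values())


def tree_prob(tree):
    """Product of the probabilities of the rules used in a parse tree.

    Hypothetical encoding: (label, word) for A -> a,
    (label, left_subtree, right_subtree) for X -> Y Z.
    """
    if len(tree) == 2:
        label, word = tree
        return lexicon[(label, word)]
    label, left, right = tree
    p = binary_prob[(label, left[0], right[0])]
    return p * tree_prob(left) * tree_prob(right)


tree = ('S',
        ('NP', ('Article', 'the'), ('Noun', 'robot')),
        ('VP', ('Verb', 'is'), ('Adjective', 'good')))
print(tree_prob(tree))
```

For "the robot is good" this evaluates 1 × (0.6 × 0.5 × 0.4) × (0.5 × 0.5 × 0.5) = 0.015, up to floating-point rounding.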
|