|
22 | 22 | "from learning import *"
|
23 | 23 | ]
|
24 | 24 | },
|
| 25 | + { |
| 26 | + "cell_type": "markdown", |
| 27 | + "metadata": {}, |
| 28 | + "source": [ |
| 29 | + "## Contents\n", |
| 30 | + "\n", |
| 31 | + "* Review\n", |
| 32 | + "* Explanations of learning module\n", |
| 33 | + "* Practical Machine Learning Task\n", |
| 34 | + " * MNIST handwritten digits classification\n", |
| 35 | + " * Loading and Visualising digits data\n", |
| 36 | + " * kNN classifier\n", |
| 37 | + " * Review\n", |
| 38 | + " * Native implementation from Learning module\n", |
| 39 | + " * Faster implementation using NumPy\n", |
| 40 | + " * Overfitting and how to avoid it\n", |
| 41 | + " * Train-Test split\n", |
| 42 | + "    * Cross-validation\n", |
| 43 | + " * Regularisation\n", |
| 44 | + " * Sub-sampling\n", |
| 45 | + " * Fine tuning parameters to get better results\n", |
| 46 | + " * Email spam detector" |
| 47 | + ] |
| 48 | + }, |
25 | 49 | {
|
26 | 50 | "cell_type": "markdown",
|
27 | 51 | "metadata": {},
|
|
50 | 74 | "\n",
|
51 | 75 | "In Reinforcement Learning the agent learns from a series of reinforcements—rewards or punishments.\n",
|
52 | 76 | "\n",
|
53 |
| - "**Example**: Let's talk about an agent to play the popular Atari game—[Pong](http://www.ponggame.org). We will reward a point for every correct move and deduct a point for every wrong move from the agent. Eventually, the agent will figure out its actions prior to reinforcement were most responsible for it.\n", |
54 |
| - "\n", |
55 |
| - "## Contents\n", |
56 |
| - "\n", |
57 |
| - "* Explanations of learning module\n", |
58 |
| - "* Practical Machine Learning Task\n", |
59 |
| - " * MNIST handwritten digits classification\n", |
60 |
| - " * Loading and Visualising digits data\n", |
61 |
| - " * Naive kNN classifier\n", |
62 |
| - " * Overfitting and how to avoid it\n", |
63 |
| - " * Train-Test split\n", |
64 |
| - " * Crossvalidation\n", |
65 |
| - " * Regularisation\n", |
66 |
| - " * Email spam detector" |
| 77 | + "**Example**: Let's consider an agent learning to play the popular Atari game [Pong](http://www.ponggame.org). We reward the agent a point for every correct move and deduct a point for every wrong move. Eventually, the agent will figure out which of its actions prior to a reinforcement were most responsible for it." |
67 | 78 | ]
|
68 | 79 | },
|
69 | 80 | {
|
|
92 | 103 | "* Single-hidden-layer Neural Network classifier\n",
|
93 | 104 | "* SVMs (Support Vector Machines)\n",
|
94 | 105 | "\n",
|
95 |
| - "It is estimates that humans have an error rate of about **0.2%** on this problem. Let's see how our algorithms perform!\n", |
| 106 | + "It is estimated that humans have an error rate of about **0.2%** on this problem. Let's see how our algorithms perform!" |
| 107 | + ] |
| 108 | + }, |
| 109 | + { |
| 110 | + "cell_type": "markdown", |
| 111 | + "metadata": {}, |
| 112 | + "source": [ |
| 113 | + "## Loading MNIST digits data\n", |
96 | 114 | "\n",
|
97 | 115 | "Let's start by loading MNIST data into numpy arrays."
|
98 | 116 | ]
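As a rough, self-contained sketch of what loading those arrays involves (the `parse_idx_images` helper and the tiny synthetic buffer below are illustrative, not the notebook's actual loader; the 16-byte header layout and the 2051 magic number come from the public MNIST IDX file format):

```python
import struct
import numpy as np

def parse_idx_images(raw):
    # IDX image files begin with a 16-byte big-endian header:
    # magic number (2051), image count, rows, cols
    magic, n, rows, cols = struct.unpack(">IIII", raw[:16])
    assert magic == 2051, "not an IDX image file"
    # The remaining bytes are the pixels; flatten each image to one row
    data = np.frombuffer(raw, dtype=np.uint8, offset=16)
    return data.reshape(n, rows * cols)

# A tiny fake "file": 2 images of 3x3 pixels each
header = struct.pack(">IIII", 2051, 2, 3, 3)
pixels = bytes(range(18))
imgs = parse_idx_images(header + pixels)
print(imgs.shape)  # (2, 9)
```

For the real dataset the same idea yields a (60000, 784) training array and a (10000, 784) testing array.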
|
|
220 | 238 | "cell_type": "markdown",
|
221 | 239 | "metadata": {},
|
222 | 240 | "source": [
|
| 241 | + "## Visualizing MNIST digits data\n", |
| 242 | + "\n", |
223 | 243 | "To get a better understanding of the dataset, let's visualize some random images for each class from training & testing datasets."
|
224 | 244 | ]
|
225 | 245 | },
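A minimal sketch of this kind of visualisation (the random stand-in arrays below replace the real `train_img`/`train_lbl` loaded earlier, and the headless `Agg` backend is only needed outside Jupyter):

```python
import numpy as np
import matplotlib
matplotlib.use("Agg")  # headless backend; unnecessary inside Jupyter
import matplotlib.pyplot as plt

# Random stand-ins for the real MNIST arrays: 10 flat 28x28 "images"
rng = np.random.default_rng(0)
train_img = rng.integers(0, 256, size=(10, 784), dtype=np.uint8)
train_lbl = np.arange(10)

fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for ax, img, lbl in zip(axes, train_img, train_lbl):
    ax.imshow(img.reshape(28, 28), cmap="gray")  # reshape flat row to 28x28
    ax.set_title(str(lbl))
    ax.axis("off")
fig.savefig("digit_samples.png")
```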
|
|
442 | 462 | "\n",
|
443 | 463 | "Similarly if we put **k = 5**, you can observe that there are 4 yellow points, which is majority. So, we classify our test point as **yellow- Class A**.\n",
|
444 | 464 | "\n",
|
445 |
| - "In practical tasks, we iterate through a bunch of values for k (like [1, 2, 5, 10, 20, 50, 100]) and see how it performs and select the best one.\n", |
| 465 | + "In practical tasks, we iterate over a range of values for k (like [1, 2, 5, 10, 20, 50, 100]), see how each performs, and select the best one." |
| 466 | + ] |
| 467 | + }, |
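The voting idea above can be sketched on toy 2-D points (the `knn_predict` helper below is illustrative, not the learning module's actual classifier):

```python
import numpy as np
from collections import Counter

def knn_predict(train_X, train_y, x, k):
    # Manhattan distance from x to every training point
    dists = np.abs(train_X - x).sum(axis=1)
    nearest = train_y[np.argsort(dists)[:k]]      # labels of the k closest points
    return Counter(nearest).most_common(1)[0][0]  # majority vote

# Toy data: class 0 clustered near the origin, class 1 near (10, 10)
train_X = np.array([[0, 0], [1, 0], [0, 1], [10, 10], [9, 10], [10, 9]])
train_y = np.array([0, 0, 0, 1, 1, 1])

for k in [1, 3, 5]:
    # the point (1, 1) sits in the class-0 cluster, so every k predicts 0 here
    print("k =", k, "->", knn_predict(train_X, train_y, np.array([1, 1]), k))
```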
| 468 | + { |
| 469 | + "cell_type": "markdown", |
| 470 | + "metadata": {}, |
| 471 | + "source": [ |
| 472 | + "### Native implementation from Learning module\n", |
446 | 473 | "\n",
|
447 | 474 | "Let's classify MNIST data with this method. Like those points, the images in the MNIST data also have **features**. Each point above has two features, such as (2, 3), representing its coordinates in the 2-dimensional plane. Each of our images has 28x28 pixel values, which we treat as the **features** for this particular task. \n",
|
448 | 475 | "\n",
|
449 |
| - "Next couple of cells help you understand some useful definitions from learning module. " |
| 476 | + "The next couple of cells help you understand some useful definitions from the learning module." |
450 | 477 | ]
|
451 | 478 | },
|
452 | 479 | {
|
|
629 | 656 | "source": [
|
630 | 657 | "Hurray! We've got it correct. Don't worry if our algorithm predicted the wrong class; with this technique we get only ~97% accuracy on this dataset. Let's try a different test image and hope we get it right this time.\n",
|
631 | 658 | "\n",
|
632 |
| - "You might have recognized that our algorithm took ~20 seconds to predict a single image. How would we even predict all 10,000 test images? Yeah, the implementations we have in our learning module are not optimized to run on this particular dataset. We will have an optimised version below in NumPy which is nearly ~50-100 times faster than this implementation." |
| 659 | + "You might have noticed that our algorithm took ~20 seconds to predict a single image. How would we ever predict all 10,000 test images? The implementations in our learning module are simply not optimized for a dataset of this size. Below we build an optimised NumPy version that is roughly 50-100 times faster than our native implementation." |
633 | 660 | ]
|
634 | 661 | },
|
635 | 662 | {
|
636 | 663 | "cell_type": "markdown",
|
637 | 664 | "metadata": {},
|
638 | 665 | "source": [
|
639 |
| - "### Faster kNN classifier implementation" |
| 666 | + "### Faster implementation using NumPy\n", |
| 667 | + "\n", |
| 668 | + "Here we calculate the Manhattan distance between two images much faster than in our native implementation, which in turn makes predicting labels for the test images far more efficient." |
640 | 669 | ]
|
641 | 670 | },
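The speed-up comes from NumPy broadcasting: distances from every test image to every training image are computed in one vectorised operation instead of nested Python loops. A rough sketch on toy data (`predict_all` is illustrative, not the notebook's actual code):

```python
import numpy as np

def predict_all(train_X, train_y, test_X, k=1):
    # Broadcasting: (n_test, 1, d) - (1, n_train, d) -> (n_test, n_train, d)
    dists = np.abs(test_X[:, None, :] - train_X[None, :, :]).sum(axis=2)
    idx = np.argsort(dists, axis=1)[:, :k]   # k nearest training rows per test row
    votes = train_y[idx]
    # majority vote along each row
    return np.array([np.bincount(v).argmax() for v in votes])

train_X = np.array([[0, 0], [1, 1], [10, 10], [9, 9]])
train_y = np.array([0, 0, 1, 1])
test_X  = np.array([[0, 1], [10, 9]])
print(predict_all(train_X, train_y, test_X, k=2))  # [0 1]
```

Note that for the full dataset the intermediate (10000, 60000, 784) tensor would not fit in memory, so in practice the test images are processed in batches; the broadcasting idea is the same.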
|
642 | 671 | {
|
|
682 | 711 | " "
|
683 | 712 | ]
|
684 | 713 | },
|
| 714 | + { |
| 715 | + "cell_type": "markdown", |
| 716 | + "metadata": {}, |
| 717 | + "source": [ |
| 718 | + "Let's print the shapes of data to make sure everything's on track." |
| 719 | + ] |
| 720 | + }, |
| 721 | + { |
| 722 | + "cell_type": "code", |
| 723 | + "execution_count": 2, |
| 724 | + "metadata": { |
| 725 | + "collapsed": false |
| 726 | + }, |
| 727 | + "outputs": [ |
| 728 | + { |
| 729 | + "ename": "NameError", |
| 730 | + "evalue": "name 'train_img' is not defined", |
| 731 | + "output_type": "error", |
| 732 | + "traceback": [ |
| 733 | + "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m", |
| 734 | + "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)", |
| 735 | + "\u001b[0;32m<ipython-input-2-bcdb30eb7f90>\u001b[0m in \u001b[0;36m<module>\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Training images size:\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtrain_img\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 2\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Training labels size:\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtrain_lbl\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 3\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Testing images size:\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtest_img\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 4\u001b[0m \u001b[0mprint\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Training labels size:\"\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0mtest_lbl\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mshape\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n", |
| 736 | + "\u001b[0;31mNameError\u001b[0m: name 'train_img' is not defined" |
| 737 | + ] |
| 738 | + } |
| 739 | + ], |
| 740 | + "source": [ |
| 741 | + "print(\"Training images size:\", train_img.shape)\n", |
| 742 | + "print(\"Training labels size:\", train_lbl.shape)\n", |
| 743 | + "print(\"Testing images size:\", test_img.shape)\n", |
| 744 | + "print(\"Testing labels size:\", test_lbl.shape)" |
| 745 | + ] |
| 746 | + }, |
685 | 747 | {
|
686 | 748 | "cell_type": "code",
|
687 | 749 | "execution_count": 21,
|
|