
Commit cd08bec

antmarakis authored and norvig committed
Notebook + Implementation: Perceptron (#512)
* Update learning.ipynb
* Delete perceptron.png
* Add new Perceptron image
* Update Perceptron Implementation
1 parent 856e8d9 commit cd08bec

File tree

3 files changed: +74 −74 lines


images/perceptron.png (binary image, −1.45 KB)

learning.ipynb

Lines changed: 69 additions & 60 deletions
@@ -14,7 +14,7 @@
   },
   {
    "cell_type": "code",
-   "execution_count": 6,
+   "execution_count": 1,
    "metadata": {
     "collapsed": true,
     "deletable": true,
@@ -582,7 +582,10 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "deletable": true,
+    "editable": true
+   },
    "source": [
     "## Distance Functions\n",
     "\n",
@@ -597,7 +600,9 @@
    "cell_type": "code",
    "execution_count": 2,
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "deletable": true,
+    "editable": true
    },
    "outputs": [
     {
@@ -619,7 +624,10 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "deletable": true,
+    "editable": true
+   },
    "source": [
     "### Euclidean Distance (`euclidean_distance`)\n",
     "\n",
@@ -630,7 +638,9 @@
    "cell_type": "code",
    "execution_count": 13,
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "deletable": true,
+    "editable": true
    },
    "outputs": [
     {
@@ -652,7 +662,10 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "deletable": true,
+    "editable": true
+   },
    "source": [
     "### Hamming Distance (`hamming_distance`)\n",
     "\n",
@@ -663,7 +676,9 @@
    "cell_type": "code",
    "execution_count": 4,
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "deletable": true,
+    "editable": true
    },
    "outputs": [
     {
@@ -685,7 +700,10 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "deletable": true,
+    "editable": true
+   },
    "source": [
     "### Mean Boolean Error (`mean_boolean_error`)\n",
     "\n",
@@ -696,7 +714,9 @@
    "cell_type": "code",
    "execution_count": 9,
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "deletable": true,
+    "editable": true
    },
    "outputs": [
     {
@@ -718,7 +738,10 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "deletable": true,
+    "editable": true
+   },
    "source": [
     "### Mean Error (`mean_error`)\n",
     "\n",
@@ -729,7 +752,9 @@
    "cell_type": "code",
    "execution_count": 10,
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "deletable": true,
+    "editable": true
    },
    "outputs": [
     {
@@ -751,7 +776,10 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "deletable": true,
+    "editable": true
+   },
    "source": [
     "### Mean Square Error (`ms_error`)\n",
     "\n",
@@ -762,7 +790,9 @@
    "cell_type": "code",
    "execution_count": 11,
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "deletable": true,
+    "editable": true
    },
    "outputs": [
     {
@@ -784,7 +814,10 @@
   },
   {
    "cell_type": "markdown",
-   "metadata": {},
+   "metadata": {
+    "deletable": true,
+    "editable": true
+   },
    "source": [
     "### Root of Mean Square Error (`rms_error`)\n",
     "\n",
@@ -795,7 +828,9 @@
    "cell_type": "code",
    "execution_count": 12,
    "metadata": {
-    "collapsed": false
+    "collapsed": false,
+    "deletable": true,
+    "editable": true
    },
    "outputs": [
     {
@@ -1062,8 +1097,17 @@
    "\n",
    "The Perceptron is a linear classifier. It works the same way as a neural network with no hidden layers (just input and output). First it trains its weights given a dataset and then it can classify a new item by running it through the network.\n",
    "\n",
-   "You can think of it as a single neuron. It has *n* synapses, each with its own weight. Each synapse corresponds to one item feature. Perceptron multiplies each item feature with the corresponding synapse weight and then adds them together (aka, the dot product) and checks whether this value is greater than the threshold. If yes, it returns 1. It returns 0 otherwise.\n",
+   "Its input layer consists of the item features, while the output layer consists of nodes (also called neurons). Each node in the output layer has *n* synapses (one for every item feature), each with its own weight. Each node computes the dot product of the item features and its synapse weights, and the result then passes through an activation function (usually a sigmoid). Finally, we pick the largest of the output values and return its index.\n",
+   "\n",
+   "Note that in classification problems each node represents a class. The final classification is the class/node with the maximum output value.\n",
    "\n",
+   "Below you can see a single node/neuron in the output layer. With *f* we denote the item features and with *w* the synapse weights; inside the node we have the dot product and the activation function, *g*."
+  ]
+ },
+ {
+  "cell_type": "markdown",
+  "metadata": {},
+  "source": [
    "![perceptron](images/perceptron.png)"
   ]
  },
@@ -1076,50 +1120,20 @@
   "source": [
    "### Implementation\n",
    "\n",
-   "First, we train (calculate) the weights given a dataset, using the `BackPropagationLearner` function of `learning.py`. We then return a function, `predict`, which we will use in the future to classify a new item. The function computes the (algebraic) dot product of the item with the calculated weights. If the result is greater than a predefined threshold (usually 0.5, 0 or 1), it returns 1. If it is less than the threshold, it returns 0.\n",
-   "\n",
-   "NOTE: The current implementation of the algorithm classifies an item into one of two classes. It is a binary classifier and will not work well for multi-class datasets."
+   "First, we train (calculate) the weights given a dataset, using the `BackPropagationLearner` function of `learning.py`. We then return a function, `predict`, which we will use in the future to classify a new item. The function computes the (algebraic) dot product of the item with the calculated weights for each node in the output layer. It then picks the greatest value and classifies the item in the corresponding class."
   ]
  },
  {
   "cell_type": "code",
-  "execution_count": 6,
+  "execution_count": 2,
   "metadata": {
    "collapsed": true,
    "deletable": true,
    "editable": true
   },
   "outputs": [],
   "source": [
-   "def PerceptronLearner(dataset, learning_rate=0.01, epochs=100):\n",
-   "    \"\"\"Logistic Regression, NO hidden layer\"\"\"\n",
-   "    i_units = len(dataset.inputs)\n",
-   "    o_units = 1  # As of now, dataset.target gives only one index.\n",
-   "    hidden_layer_sizes = []\n",
-   "    raw_net = network(i_units, hidden_layer_sizes, o_units)\n",
-   "    learned_net = BackPropagationLearner(dataset, raw_net, learning_rate, epochs)\n",
-   "\n",
-   "    def predict(example):\n",
-   "        # Input nodes\n",
-   "        i_nodes = learned_net[0]\n",
-   "\n",
-   "        # Activate input layer\n",
-   "        for v, n in zip(example, i_nodes):\n",
-   "            n.value = v\n",
-   "\n",
-   "        # Forward pass\n",
-   "        for layer in learned_net[1:]:\n",
-   "            for node in layer:\n",
-   "                inc = [n.value for n in node.inputs]\n",
-   "                in_val = dotproduct(inc, node.weights)\n",
-   "                node.value = node.activation(in_val)\n",
-   "\n",
-   "        # Hypothesis\n",
-   "        o_nodes = learned_net[-1]\n",
-   "        pred = [o_nodes[i].value for i in range(o_units)]\n",
-   "        return 1 if pred[0] >= 0.5 else 0\n",
-   "\n",
-   "    return predict"
+   "%psource PerceptronLearner"
   ]
  },
  {
@@ -1129,11 +1143,9 @@
    "editable": true
   },
   "source": [
-   "The weights are trained from the `BackPropagationLearner`. Note that the perceptron is a one-layer neural network, without any hidden layers. So, in `BackPropagationLearner`, we will pass no hidden layers. From that function we get our network, which is just one node, with the weights calculated.\n",
-   "\n",
-   "`PerceptronLearner` returns `predict`, a function that can be used to classify a new item.\n",
+   "Note that the Perceptron is a one-layer neural network, without any hidden layers. So, in `BackPropagationLearner`, we will pass no hidden layers. From that function we get our network, which is just one layer, with the weights calculated.\n",
    "\n",
-   "That function passes the input/example through the network, calculating the dot product of the input and the weights. If that value is greater than or equal to 0.5, it returns 1. Otherwise it returns 0."
+   "The `predict` function passes the input/example through the network, calculating the dot product of the input and the weights for each node, and returns the class with the maximum dot product."
   ]
  },
  {
@@ -1145,14 +1157,12 @@
   "source": [
    "### Example\n",
    "\n",
-   "We will train the Perceptron on the iris dataset. Because, though, the algorithm is a binary classifier (which means it classifies an item in one of two classes) and the iris dataset has three classes, we need to transform the dataset into a proper form, with only two classes. Therefore, we will remove the third and final class of the dataset, *Virginica*.\n",
-   "\n",
-   "Then, we will try and classify the item/flower with measurements of 5,3,1,0.1."
+   "We will train the Perceptron on the iris dataset. Since the `BackPropagationLearner` works with integer indexes and not strings, we need to convert class names to integers first. Then, we will try to classify the item/flower with measurements of 5, 3, 1, 0.1."
   ]
  },
  {
   "cell_type": "code",
-  "execution_count": 7,
+  "execution_count": 10,
   "metadata": {
    "collapsed": false,
    "deletable": true,
@@ -1169,11 +1179,10 @@
   ],
   "source": [
    "iris = DataSet(name=\"iris\")\n",
-   "iris.remove_examples(\"virginica\")\n",
    "iris.classes_to_numbers()\n",
    "\n",
    "perceptron = PerceptronLearner(iris)\n",
-   "print(perceptron([5,3,1,0.1]))"
+   "print(perceptron([5, 3, 1, 0.1]))"
   ]
  },
  {
@@ -1183,7 +1192,7 @@
    "editable": true
   },
   "source": [
-   "The output is 0, which means the item is classified in the first class, *setosa*. This is indeed correct. Note that the Perceptron algorithm is not perfect and may produce false classifications."
+   "The output is 0, which means the item is classified in the first class, \"Setosa\". This is indeed correct. Note that the Perceptron algorithm is not perfect and may produce false classifications."
   ]
  },
  {

learning.py

Lines changed: 5 additions & 14 deletions
@@ -653,24 +653,15 @@ def PerceptronLearner(dataset, learning_rate=0.01, epochs=100):
     learned_net = BackPropagationLearner(dataset, raw_net, learning_rate, epochs)
 
     def predict(example):
-        # Input nodes
-        i_nodes = learned_net[0]
-
-        # Activate input layer
-        for v, n in zip(example, i_nodes):
-            n.value = v
+        o_nodes = learned_net[1]
 
         # Forward pass
-        for layer in learned_net[1:]:
-            for node in layer:
-                inc = [n.value for n in node.inputs]
-                in_val = dotproduct(inc, node.weights)
-                node.value = node.activation(in_val)
+        for node in o_nodes:
+            in_val = dotproduct(example, node.weights)
+            node.value = node.activation(in_val)
 
         # Hypothesis
-        o_nodes = learned_net[-1]
-        prediction = find_max_node(o_nodes)
-        return prediction
+        return find_max_node(o_nodes)
 
     return predict
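The simplification above relies on two things: with no hidden layers the network is just `[input_layer, output_layer]`, so `learned_net[1]` is the output layer, and `find_max_node` returns the index of the highest-valued output node. A rough stand-in for that helper, assuming only that nodes expose a numeric `.value` set by the forward pass (the real one lives in `learning.py`):

```python
def find_max_node(nodes):
    """Index of the node whose activation value is largest."""
    return max(range(len(nodes)), key=lambda i: nodes[i].value)

# Tiny check with a hypothetical stand-in node class:
class FakeNode:
    def __init__(self, value):
        self.value = value

print(find_max_node([FakeNode(0.2), FakeNode(0.9), FakeNode(0.4)]))  # -> 1
```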
