
Commit 935822e

adds method to load MNIST data in learning notebook

1 parent cb0895a
File tree: 1 file changed, +139 -0 lines


learning.ipynb

Lines changed: 139 additions & 0 deletions
@@ -53,6 +53,145 @@
Context (unchanged, from the end of the preceding markdown cell):

**Example**: Let's talk about an agent that plays the popular Atari game [Pong](http://www.ponggame.org). We will award the agent a point for every correct move and deduct a point for every wrong move. Eventually, the agent will figure out that the actions it took just before a reinforcement were most responsible for it.
Added cells:

## Practical Machine Learning Task

### MNIST handwritten digit classification

The MNIST database, available from [this page](http://yann.lecun.com/exdb/mnist/), is a large database of handwritten digits that is commonly used for training and testing/validation in machine learning.

The dataset has **60,000 training images** and **10,000 testing images**, each 28x28 pixels and each with a label.

In this section, we will use this database to compare the performance of three learning algorithms:

* kNN (k-Nearest Neighbour) classifier
* Single-hidden-layer Neural Network classifier
* SVMs (Support Vector Machines)

It is estimated that humans have an error rate of about **0.2%** on this problem. Let's see how our algorithms perform!

Let's start by loading the MNIST data into numpy arrays.
```python
import os, struct
import array
import numpy as np

def load_MNIST(path="aima-data/MNIST"):
    """Helper function to load the MNIST data into numpy arrays."""
    train_img_file = open(os.path.join(path, "train-images-idx3-ubyte"), "rb")
    train_lbl_file = open(os.path.join(path, "train-labels-idx1-ubyte"), "rb")
    test_img_file = open(os.path.join(path, "t10k-images-idx3-ubyte"), "rb")
    test_lbl_file = open(os.path.join(path, "t10k-labels-idx1-ubyte"), "rb")

    # Image files start with a 16-byte big-endian header:
    # magic number, image count, rows, columns.
    magic_nr, tr_size, tr_rows, tr_cols = struct.unpack(">IIII", train_img_file.read(16))
    tr_img = array.array("B", train_img_file.read())
    train_img_file.close()
    # Label files start with an 8-byte header: magic number, label count.
    magic_nr, tr_size = struct.unpack(">II", train_lbl_file.read(8))
    tr_lbl = array.array("b", train_lbl_file.read())
    train_lbl_file.close()

    magic_nr, te_size, te_rows, te_cols = struct.unpack(">IIII", test_img_file.read(16))
    te_img = array.array("B", test_img_file.read())
    test_img_file.close()
    magic_nr, te_size = struct.unpack(">II", test_lbl_file.read(8))
    te_lbl = array.array("b", test_lbl_file.read())
    test_lbl_file.close()

    # Uncomment to sanity-check the raw sizes:
    # print(len(tr_img), len(tr_lbl), tr_size)
    # print(len(te_img), len(te_lbl), te_size)

    # Copy each flattened 28x28 image into one row of a (size, rows*cols) array.
    train_img = np.zeros((tr_size, tr_rows*tr_cols), dtype=np.uint8)
    train_lbl = np.zeros((tr_size,), dtype=np.int8)
    for i in range(tr_size):
        train_img[i] = np.array(tr_img[i*tr_rows*tr_cols : (i+1)*tr_rows*tr_cols])
        train_lbl[i] = tr_lbl[i]

    test_img = np.zeros((te_size, te_rows*te_cols), dtype=np.uint8)
    test_lbl = np.zeros((te_size,), dtype=np.int8)
    for i in range(te_size):
        test_img[i] = np.array(te_img[i*te_rows*te_cols : (i+1)*te_rows*te_cols])
        test_lbl[i] = te_lbl[i]

    return train_img, train_lbl, test_img, test_lbl
```
The function `load_MNIST()` loads the MNIST data from the files saved in `aima-data/MNIST`. It returns four numpy arrays that we are going to use to train and test classifiers of handwritten digits with various learning approaches.
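For reference, the `struct.unpack(">IIII", ...)` calls above parse the IDX headers: image files begin with four big-endian 32-bit integers (magic number 2051, image count, rows, columns) followed by raw `uint8` pixels, and label files begin with two such integers (magic number 2049, label count) followed by raw byte labels. As an aside, the per-image copy loop can be replaced by a single `np.frombuffer` call; a minimal sketch, where `load_idx_images` is a hypothetical helper and not part of this commit:

```python
import os, struct
import numpy as np

def load_idx_images(filename, path="aima-data/MNIST"):
    """Vectorized IDX image reader (illustrative alternative to the copy loop)."""
    with open(os.path.join(path, filename), "rb") as f:
        # 16-byte big-endian header: magic number, count, rows, columns.
        magic, n, rows, cols = struct.unpack(">IIII", f.read(16))
        assert magic == 2051, "not an IDX image file"
        # Interpret the remaining bytes directly as an (n, rows*cols) uint8 array.
        return np.frombuffer(f.read(), dtype=np.uint8).reshape(n, rows * cols)
```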
```python
train_img, train_lbl, test_img, test_lbl = load_MNIST()
```
Check the shapes of these numpy arrays to make sure we have loaded the database correctly.

Each 28x28 pixel image is flattened to a 784-element array, and we should have 60,000 of them in the training data. Similarly, we should have 10,000 such 784-element arrays in the testing data.
```python
print("Training images size:", train_img.shape)
print("Training labels size:", train_lbl.shape)
print("Testing images size:", test_img.shape)
print("Testing labels size:", test_lbl.shape)
```

Output:

```
Training images size: (60000, 784)
Training labels size: (60000,)
Testing images size: (10000, 784)
Testing labels size: (10000,)
```
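Purely as an illustration of the comparison promised earlier, here is a minimal sketch of how one might sanity-check the first listed algorithm (kNN) on the freshly loaded arrays. It assumes scikit-learn is available and is not part of this commit; the notebook itself may use the aima-python learner implementations instead.

```python
# Hypothetical sanity check, not part of this commit: fit scikit-learn's kNN
# classifier on a subset of the training data and score it on part of the
# test set. Subsets keep the pairwise distance computations fast.
from sklearn.neighbors import KNeighborsClassifier

knn = KNeighborsClassifier(n_neighbors=3)
knn.fit(train_img[:5000], train_lbl[:5000])
print("kNN accuracy on 1000 test images:",
      knn.score(test_img[:1000], test_lbl[:1000]))
```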
Let's visualize some of the images from the training and testing datasets.
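The code cell that follows in the notebook is truncated by this diff, so here is a minimal sketch of one way to visualize a few training digits, assuming matplotlib is available; the notebook's own cell may differ.

```python
# Illustrative sketch, not part of this commit: show the first five training
# digits with their labels, assuming matplotlib is installed.
import matplotlib.pyplot as plt

fig, axes = plt.subplots(1, 5, figsize=(10, 2))
for ax, img, lbl in zip(axes, train_img[:5], train_lbl[:5]):
    ax.imshow(img.reshape(28, 28), cmap="gray")  # un-flatten the 784-pixel row
    ax.set_title("label: %d" % lbl)
    ax.axis("off")
plt.show()
```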
(The diff context ends at the start of the next, pre-existing code cell, whose contents are not shown.)
