Commit b1f4820

Merge pull request #7645 from QuLogic/sample-data-cleanup
Clean up stock sample data.
2 parents 4d390fa + 9ba1fd5 commit b1f4820

15 files changed, 80 additions and 6167 deletions
Lines changed: 13 additions & 0 deletions
@@ -0,0 +1,13 @@
+Cleanup of stock sample data
+````````````````````````````
+
+The sample data of stocks has been cleaned up to remove redundancies and
+increase portability. The ``AAPL.dat.gz``, ``INTC.dat.gz`` and ``aapl.csv``
+files have been removed entirely and will also no longer be available from
+`matplotlib.cbook.get_sample_data`. If a CSV file is required, we suggest using
+the ``msft.csv`` that continues to be shipped in the sample data. If a NumPy
+binary file is acceptable, we suggest using one of the following two new files.
+The ``aapl.npy.gz`` and ``goog.npy`` files have been replaced by ``aapl.npz``
+and ``goog.npz``, wherein the first column's type has changed from
+`datetime.date` to `np.datetime64` for better portability across Python
+versions. Note that matplotlib does not fully support `np.datetime64` as yet.
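
Reviewer note: the loading pattern this entry describes can be tried without the shipped sample files. The sketch below is an assumption-laden stand-in, not the real data: it builds a small structured array with a `datetime64[D]` date column, stores it under the key `price_data` in an in-memory `.npz` archive (the key name is taken from the diffs in this commit; the field values are invented), and then converts the dates to `datetime.date` objects with `astype('O')`.

```python
import datetime
import io

import numpy as np

# Illustrative stand-in for the 'price_data' record in goog.npz.
# The two fields and their values are made up for this sketch.
sample = np.array(
    [('2004-08-19', 100.34), ('2004-08-20', 108.31), ('2004-08-23', 109.40)],
    dtype=[('date', 'datetime64[D]'), ('close', '<f8')],
)

# Write an .npz archive to memory instead of disk, then rewind it.
buf = io.BytesIO()
np.savez(buf, price_data=sample)
buf.seek(0)

# Same access pattern as the updated examples: pick the array by key,
# then view it as a record array so fields are attributes.
r = np.load(buf)['price_data'].view(np.recarray)

# Casting a day-unit datetime64 array to object yields datetime.date items.
date = r.date.astype('O')
print(r.date.dtype, type(date[0]).__name__)
```

Because the date column is a plain `datetime64` field rather than pickled Python objects, loading it does not depend on the Python version that wrote the file, which is the portability gain the entry refers to.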

doc/users/recipes.rst

Lines changed: 32 additions & 33 deletions
@@ -87,28 +87,38 @@ gracefully, and here are some tricks to help you work around them.
 We'll load up some sample date data which contains datetime.date
 objects in a numpy record array::

-    In [63]: datafile = cbook.get_sample_data('goog.npy')
+    In [63]: datafile = cbook.get_sample_data('goog.npz')

-    In [64]: r = np.load(datafile).view(np.recarray)
+    In [64]: r = np.load(datafile)['price_data'].view(np.recarray)

     In [65]: r.dtype
-    Out[65]: dtype([('date', '|O4'), ('', '|V4'), ('open', '<f8'),
+    Out[65]: dtype([('date', '<M8[D]'), ('', '|V4'), ('open', '<f8'),
                     ('high', '<f8'), ('low', '<f8'), ('close', '<f8'),
                     ('volume', '<i8'), ('adj_close', '<f8')])

     In [66]: r.date
     Out[66]:
-    array([2004-08-19, 2004-08-20, 2004-08-23, ..., 2008-10-10, 2008-10-13,
-           2008-10-14], dtype=object)
+    array(['2004-08-19', '2004-08-20', '2004-08-23', ..., '2008-10-10',
+           '2008-10-13', '2008-10-14'], dtype='datetime64[D]')

-The dtype of the numpy record array for the field ``date`` is ``|O4``
-which means it is a 4-byte python object pointer; in this case the
-objects are datetime.date instances, which we can see when we print
-some samples in the ipython terminal window.
+The dtype of the NumPy record array for the field ``date`` is ``datetime64[D]``
+which means it is a 64-bit `np.datetime64` in 'day' units. While this format is
+more portable, Matplotlib cannot plot this format natively yet. We can plot
+this data by changing the dates to `datetime.date` instances instead, which can
+be achieved by converting to an object array::
+
+    In [67]: r.date.astype('O')
+    array([datetime.date(2004, 8, 19), datetime.date(2004, 8, 20),
+           datetime.date(2004, 8, 23), ..., datetime.date(2008, 10, 10),
+           datetime.date(2008, 10, 13), datetime.date(2008, 10, 14)],
+          dtype=object)
+
+The dtype of this converted array is now ``object`` and it is filled with
+datetime.date instances instead.

 If you plot the data, ::

-    In [67]: plot(r.date, r.close)
+    In [67]: plot(r.date.astype('O'), r.close)
     Out[67]: [<matplotlib.lines.Line2D object at 0x92a6b6c>]

 you will see that the x tick labels are all squashed together.
@@ -117,18 +127,12 @@ you will see that the x tick labels are all squashed together.
    :context:

    import matplotlib.cbook as cbook
-   datafile = cbook.get_sample_data('goog.npy')
-   try:
-       # Python3 cannot load python2 .npy files with datetime(object) arrays
-       # unless the encoding is set to bytes. Hovever this option was
-       # not added until numpy 1.10 so this example will only work with
-       # python 2 or with numpy 1.10 and later.
-       r = np.load(datafile, encoding='bytes').view(np.recarray)
-   except TypeError:
-       # Old Numpy
-       r = np.load(datafile).view(np.recarray)
+   with cbook.get_sample_data('goog.npz') as datafile:
+       r = np.load(datafile)['price_data'].view(np.recarray)
+   # Matplotlib prefers datetime instead of np.datetime64.
+   date = r.date.astype('O')
    plt.figure()
-   plt.plot(r.date, r.close)
+   plt.plot(date, r.close)
    plt.title('Default date handling can cause overlapping labels')

 Another annoyance is that if you hover the mouse over the window and
@@ -149,7 +153,7 @@ a number of date formatters built in, so we'll use one of those.

    plt.close('all')
    fig, ax = plt.subplots(1)
-   ax.plot(r.date, r.close)
+   ax.plot(date, r.close)

    # rotate and align the tick labels so they look better
    fig.autofmt_xdate()
@@ -186,22 +190,17 @@ right.
    import matplotlib.cbook as cbook

    # load up some sample financial data
-   datafile = cbook.get_sample_data('goog.npy')
-   try:
-       # Python3 cannot load python2 .npy files with datetime(object) arrays
-       # unless the encoding is set to bytes. Hovever this option was
-       # not added until numpy 1.10 so this example will only work with
-       # python 2 or with numpy 1.10 and later.
-       r = np.load(datafile, encoding='bytes').view(np.recarray)
-   except TypeError:
-       r = np.load(datafile).view(np.recarray)
+   with cbook.get_sample_data('goog.npz') as datafile:
+       r = np.load(datafile)['price_data'].view(np.recarray)
+   # Matplotlib prefers datetime instead of np.datetime64.
+   date = r.date.astype('O')
    # create two subplots with the shared x and y axes
    fig, (ax1, ax2) = plt.subplots(1,2, sharex=True, sharey=True)

    pricemin = r.close.min()

-   ax1.plot(r.date, r.close, lw=2)
-   ax2.fill_between(r.date, pricemin, r.close, facecolor='blue', alpha=0.5)
+   ax1.plot(date, r.close, lw=2)
+   ax2.fill_between(date, pricemin, r.close, facecolor='blue', alpha=0.5)

    for ax in ax1, ax2:
        ax.grid(True)

examples/api/date_demo.py

Lines changed: 11 additions & 16 deletions
Original file line numberDiff line numberDiff line change
@@ -24,31 +24,26 @@
2424
months = mdates.MonthLocator() # every month
2525
yearsFmt = mdates.DateFormatter('%Y')
2626

27-
# load a numpy record array from yahoo csv data with fields date,
28-
# open, close, volume, adj_close from the mpl-data/example directory.
29-
# The record array stores python datetime.date as an object array in
30-
# the date column
31-
datafile = cbook.get_sample_data('goog.npy')
32-
try:
33-
# Python3 cannot load python2 .npy files with datetime(object) arrays
34-
# unless the encoding is set to bytes. However this option was
35-
# not added until numpy 1.10 so this example will only work with
36-
# python 2 or with numpy 1.10 and later.
37-
r = np.load(datafile, encoding='bytes').view(np.recarray)
38-
except TypeError:
39-
r = np.load(datafile).view(np.recarray)
27+
# Load a numpy record array from yahoo csv data with fields date, open, close,
28+
# volume, adj_close from the mpl-data/example directory. The record array
29+
# stores the date as an np.datetime64 with a day unit ('D') in the date column.
30+
with cbook.get_sample_data('goog.npz') as datafile:
31+
r = np.load(datafile)['price_data'].view(np.recarray)
32+
# Matplotlib works better with datetime.datetime than np.datetime64, but the
33+
# latter is more portable.
34+
date = r.date.astype('O')
4035

4136
fig, ax = plt.subplots()
42-
ax.plot(r.date, r.adj_close)
37+
ax.plot(date, r.adj_close)
4338

4439

4540
# format the ticks
4641
ax.xaxis.set_major_locator(years)
4742
ax.xaxis.set_major_formatter(yearsFmt)
4843
ax.xaxis.set_minor_locator(months)
4944

50-
datemin = datetime.date(r.date.min().year, 1, 1)
51-
datemax = datetime.date(r.date.max().year + 1, 1, 1)
45+
datemin = datetime.date(date.min().year, 1, 1)
46+
datemax = datetime.date(date.max().year + 1, 1, 1)
5247
ax.set_xlim(datemin, datemax)
5348

5449
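
Reviewer note: the switch from `r.date` to the converted `date` array in the `datemin`/`datemax` lines matters because `np.datetime64` scalars do not expose a `.year` attribute. A minimal sketch of the year-rounding step, using three invented dates rather than the real `goog.npz` contents:

```python
import datetime

import numpy as np

# Illustrative dates only; the actual example reads them from goog.npz.
date64 = np.array(['2004-08-19', '2006-03-01', '2008-10-14'],
                  dtype='datetime64[D]')

# Convert to an object array of datetime.date so that .year is available.
date = date64.astype('O')

# Round the axis limits outward to whole calendar years.
datemin = datetime.date(date.min().year, 1, 1)
datemax = datetime.date(date.max().year + 1, 1, 1)
print(datemin, datemax)
```

The resulting pair is what the example passes to `ax.set_xlim`.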

examples/api/date_index_formatter.py

Lines changed: 10 additions & 9 deletions
Original file line numberDiff line numberDiff line change
@@ -10,22 +10,23 @@
1010
from __future__ import print_function
1111
import numpy as np
1212
import matplotlib.pyplot as plt
13-
import matplotlib.mlab as mlab
1413
import matplotlib.cbook as cbook
1514
import matplotlib.ticker as ticker
1615

17-
datafile = cbook.get_sample_data('aapl.csv', asfileobj=False)
18-
print('loading %s' % datafile)
19-
r = mlab.csv2rec(datafile)
20-
21-
r.sort()
16+
# Load a numpy record array from yahoo csv data with fields date, open, close,
17+
# volume, adj_close from the mpl-data/example directory. The record array
18+
# stores the date as an np.datetime64 with a day unit ('D') in the date column.
19+
with cbook.get_sample_data('goog.npz') as datafile:
20+
r = np.load(datafile)['price_data'].view(np.recarray)
2221
r = r[-30:] # get the last 30 days
23-
22+
# Matplotlib works better with datetime.datetime than np.datetime64, but the
23+
# latter is more portable.
24+
date = r.date.astype('O')
2425

2526
# first we'll do it the default way, with gaps on weekends
2627
fig, axes = plt.subplots(ncols=2, figsize=(8, 4))
2728
ax = axes[0]
28-
ax.plot(r.date, r.adj_close, 'o-')
29+
ax.plot(date, r.adj_close, 'o-')
2930
ax.set_title("Default")
3031
fig.autofmt_xdate()
3132

@@ -36,7 +37,7 @@

 def format_date(x, pos=None):
     thisind = np.clip(int(x + 0.5), 0, N - 1)
-    return r.date[thisind].strftime('%Y-%m-%d')
+    return date[thisind].strftime('%Y-%m-%d')

 ax = axes[1]
 ax.plot(ind, r.adj_close, 'o-')
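
Reviewer note: `format_date` relies on `strftime`, which exists on `datetime.date` but not on `np.datetime64`, hence the change to index the converted `date` array. The sketch below isolates that function with three invented trading days; `N` and `date` stand in for the module-level names the example script defines, and no Matplotlib objects are involved.

```python
import numpy as np

# Illustrative trading days (a Friday, then the following Monday and Tuesday).
date = np.array(['2008-10-10', '2008-10-13', '2008-10-14'],
                dtype='datetime64[D]').astype('O')
N = len(date)


def format_date(x, pos=None):
    # Round the tick position to the nearest index, clamped to [0, N - 1].
    thisind = np.clip(int(x + 0.5), 0, N - 1)
    return date[thisind].strftime('%Y-%m-%d')


print(format_date(0.6), format_date(5), format_date(-3))
```

Plotting against the integer index and labelling ticks through a function like this is what removes the weekend gaps in the second subplot of the example.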

examples/misc/rec_groupby_demo.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
import matplotlib.mlab as mlab
44
import matplotlib.cbook as cbook
55

6-
datafile = cbook.get_sample_data('aapl.csv', asfileobj=False)
6+
datafile = cbook.get_sample_data('msft.csv', asfileobj=False)
77
print('loading', datafile)
88
r = mlab.csv2rec(datafile)
99
r.sort()

examples/misc/rec_join_demo.py

Lines changed: 1 addition & 1 deletion
Original file line numberDiff line numberDiff line change
@@ -3,7 +3,7 @@
33
import matplotlib.mlab as mlab
44
import matplotlib.cbook as cbook
55

6-
datafile = cbook.get_sample_data('aapl.csv', asfileobj=False)
6+
datafile = cbook.get_sample_data('msft.csv', asfileobj=False)
77
print('loading', datafile)
88
r = mlab.csv2rec(datafile)
99

examples/pylab_examples/centered_ticklabels.py

Lines changed: 7 additions & 12 deletions
Original file line numberDiff line numberDiff line change
@@ -19,20 +19,15 @@
1919
import matplotlib.pyplot as plt
2020

2121
# load some financial data; apple's stock price
22-
fh = cbook.get_sample_data('aapl.npy.gz')
23-
try:
24-
# Python3 cannot load python2 .npy files with datetime(object) arrays
25-
# unless the encoding is set to bytes. However this option was
26-
# not added until numpy 1.10 so this example will only work with
27-
# python 2 or with numpy 1.10 and later.
28-
r = np.load(fh, encoding='bytes')
29-
except TypeError:
30-
r = np.load(fh)
31-
fh.close()
22+
with cbook.get_sample_data('aapl.npz') as fh:
23+
r = np.load(fh)['price_data'].view(np.recarray)
3224
r = r[-250:] # get the last 250 days
25+
# Matplotlib works better with datetime.datetime than np.datetime64, but the
26+
# latter is more portable.
27+
date = r.date.astype('O')
3328

3429
fig, ax = plt.subplots()
35-
ax.plot(r.date, r.adj_close)
30+
ax.plot(date, r.adj_close)
3631

3732
ax.xaxis.set_major_locator(dates.MonthLocator())
3833
ax.xaxis.set_minor_locator(dates.MonthLocator(bymonthday=15))
@@ -46,5 +41,5 @@
     tick.label1.set_horizontalalignment('center')

 imid = len(r)//2
-ax.set_xlabel(str(r.date[imid].year))
+ax.set_xlabel(str(date[imid].year))
 plt.show()

examples/pylab_examples/scatter_demo2.py

Lines changed: 5 additions & 13 deletions
Original file line numberDiff line numberDiff line change
@@ -5,19 +5,11 @@
55
import matplotlib.pyplot as plt
66
import matplotlib.cbook as cbook
77

8-
# Load a numpy record array from yahoo csv data with fields date,
9-
# open, close, volume, adj_close from the mpl-data/example directory.
10-
# The record array stores python datetime.date as an object array in
11-
# the date column
12-
datafile = cbook.get_sample_data('goog.npy')
13-
try:
14-
# Python3 cannot load python2 .npy files with datetime(object) arrays
15-
# unless the encoding is set to bytes. However this option was
16-
# not added until numpy 1.10 so this example will only work with
17-
# python 2 or with numpy 1.10 and later
18-
price_data = np.load(datafile, encoding='bytes').view(np.recarray)
19-
except TypeError:
20-
price_data = np.load(datafile).view(np.recarray)
8+
# Load a numpy record array from yahoo csv data with fields date, open, close,
9+
# volume, adj_close from the mpl-data/example directory. The record array
10+
# stores the date as an np.datetime64 with a day unit ('D') in the date column.
11+
with cbook.get_sample_data('goog.npz') as datafile:
12+
price_data = np.load(datafile)['price_data'].view(np.recarray)
2113
price_data = price_data[-250:] # get the most recent 250 trading days
2214

2315
delta1 = np.diff(price_data.adj_close)/price_data.adj_close[:-1]
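
Reviewer note: this file needs no `astype('O')` conversion because it only uses the numeric columns. The unchanged `delta1` line computes day-over-day fractional changes; a small sketch with invented prices (not values from `goog.npz`) shows what it produces:

```python
import numpy as np

# Made-up adjusted closing prices for four consecutive trading days.
adj_close = np.array([100.0, 110.0, 99.0, 108.9])

# Difference between consecutive days, divided by the earlier day's price.
# The result has one element fewer than the input.
delta1 = np.diff(adj_close) / adj_close[:-1]
print(delta1)
```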
Binary file not shown.
-85.6 KB
Binary file not shown.