Here is a collection of notes about things we want to accomplish with
the testing infrastructure.

matplotlib.testing.image_comparison decorator
=============================================

Collect all the failures in the image_comparison() decorator and raise
one failure that describes all the failed images. Right now the loop
that iterates over the images raises an exception on the first
failure, which breaks out of the loop, so the remaining images are
never compared.
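
A minimal sketch of the collect-then-raise idea follows. It is not the
decorator's actual code: ``compare_all`` and ``compare_one`` are
hypothetical names, and ``ImageComparisonFailure`` is redefined here
only so the example is self-contained.

```python
class ImageComparisonFailure(AssertionError):
    """Hypothetical stand-in for the exception the decorator raises."""


def compare_all(image_names, compare_one):
    """Compare every image, then raise one failure listing all mismatches.

    compare_one is a hypothetical callable that returns None on success
    or an error message string when the image does not match.
    """
    failures = []
    for name in image_names:
        message = compare_one(name)
        if message is not None:
            # Record the failure and keep going instead of raising here.
            failures.append("%s: %s" % (name, message))
    if failures:
        raise ImageComparisonFailure(
            "%d of %d images failed:\n%s"
            % (len(failures), len(image_names), "\n".join(failures)))
```

The loop always runs to completion, so a single test run reports every
mismatched image rather than only the first one.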

Put image comparison results into a more hierarchical view to make it
easy to see which test module creates which image. (Also, revive John's
move_good.py script to deal with the new image locations.)
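
One possible shape for such a hierarchy is a directory per test module.
The sketch below is only an illustration: the ``result_dir_for`` helper,
the ``result_images`` root, and the choice to use just the last
component of the module name are all assumptions, not an existing
layout.

```python
import os


def result_dir_for(module_name, root="result_images"):
    """Map a test module name to a per-module directory for result images.

    The root name and the use of only the last component of the module
    name are assumptions made for this sketch.
    """
    # e.g. "matplotlib.tests.test_axes" -> "test_axes"
    leaf = module_name.split(".")[-1]
    return os.path.join(root, leaf)
```

With a layout like this, the directory an image sits in says which
test module produced it.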

Misc testing stuff
==================

Make ImageComparisonFailure cause nose to report "fail" (rather than
"error").

Recreate John's move_good.py script once the better hierarchy of test
result images is established.
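
For the ImageComparisonFailure item above, one likely route is to derive
the exception from AssertionError, since unittest counts AssertionError
(and its subclasses) as a failure and anything else as an error. nose
builds on that machinery, so it should report the same way, though that
has not been checked here. The sketch uses plain unittest, and the class
is redefined locally for illustration.

```python
import unittest


class ImageComparisonFailure(AssertionError):
    """Deriving from AssertionError makes unittest count this as a failure."""


def run_demo():
    """Run one failing test; return (number of failures, number of errors)."""

    class Demo(unittest.TestCase):
        def test_image(self):
            raise ImageComparisonFailure("images differ")

    suite = unittest.TestLoader().loadTestsFromTestCase(Demo)
    result = unittest.TestResult()
    suite.run(result)
    return len(result.failures), len(result.errors)
```

Calling ``run_demo()`` shows how the runner classifies the exception:
the raised ImageComparisonFailure lands in the failures list, not the
errors list.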

Buildbot infrastructure
=======================

Put "actual" images online from buildbots, even without failure.

Get a Windows XP 32-bit build slave.

Build nightly binaries for win32 and Mac OS X.

Build nightly source tarballs.

Build nightly docs and put online somewhere.

A note about the hack that is the failed image gathering mechanism
==================================================================

The trouble with the current actual/baseline/diff result gathering
mechanism is that it uses the filesystem as a means of communication
within the nose test running process, in addition to communication
with the buildbot process, through hard-coded assumptions about paths
and filenames. If the only concern were within nose, we could
presumably re-work some of the old MplNoseTester plugin to handle the
new case, but given the buildbot consideration it gets more difficult
to get these frameworks talking through supported API calls. Thus,
although the hard-coded path and filename stuff is a hack, it will
require some serious nose and buildbot learning to figure out how to
do it the "right" way. So I'm all for sticking with the hack right
now, and making it a bit nicer by doing things like having a better
directory hierarchy layout for the actual result images.