Description
`import sklearn` creates a `StreamHandler` and attaches it to the `sklearn` logger:
`scikit-learn/sklearn/__init__.py`, line 24 in 0eebade
I'm not sure what the motivation for this is, but it's a deviation from the normal "best practices" for logging, namely that libraries should restrict themselves to issuing log messages, but let the application do all logging configuration (setting up handlers, changing logger levels, and the like). There's lots written about this elsewhere, but here's one relevant blog post: http://pieces.openpolitics.com/2012/04/python-logging-best-practices/
In practice, this caused a hard-to-diagnose bug in our IPython- and sklearn-using application (actually, in more than one such application):
- At application start time, we start an IPython kernel. That kernel swaps out `sys.stdout` and `sys.stderr` for its own custom streams, which rely on a lot of fairly complicated machinery (extra threads, ZMQ streams, the asyncio event loop, etc.).
- `sklearn` was imported while that IPython kernel was running.
- The log handler created at import time then picked up IPython's custom `sys.stderr` stream instead of the usual one.
- At application stop time, the IPython kernel and associated machinery were stopped.
- At process exit time, the stream associated with the handler was flushed (by the `logging` module's `shutdown` function, which is registered as an `atexit` handler). Because the IPython machinery was no longer active, we got a hard-to-understand traceback.
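The key step in that sequence is that a `StreamHandler` created without an explicit stream keeps a reference to whatever `sys.stderr` was at construction time. Here is a minimal sketch of that behaviour, using an `io.StringIO` as a stand-in for IPython's replacement stream and a made-up logger name `demo_library`; it does not try to reproduce the IPython traceback itself.

```python
import io
import logging
import sys

# Stand-in for the custom stream an IPython kernel installs (assumption:
# the real stream is far more complex; StringIO is only for illustration).
fake_stderr = io.StringIO()

original_stderr = sys.stderr
sys.stderr = fake_stderr
try:
    # With no argument, StreamHandler binds to the *current* sys.stderr.
    handler = logging.StreamHandler()
finally:
    # Later, the original stream is restored (as when the kernel stops).
    sys.stderr = original_stderr

logger = logging.getLogger("demo_library")  # hypothetical library logger
logger.addHandler(handler)
logger.propagate = False

# The handler still writes to the replaced stream, not the restored one.
logger.warning("hello")
captured = fake_stderr.getvalue()
handler_uses_fake_stream = handler.stream is fake_stderr

# Clean up so the demo handler is not left attached.
logger.removeHandler(handler)
```

Because the handler holds on to the stream object rather than looking up `sys.stderr` on each write, anything that later touches the handler (including the flush at interpreter exit) goes through the stream that was active at import time.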
If the intent of the handler is to suppress the "No logger configured ..." messages from the standard library, perhaps a `logging.NullHandler` could be used for that purpose instead? I'm happy to create a PR for this if the proposed change sounds acceptable.
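For illustration, the `NullHandler` approach might look roughly like the sketch below. The package name `mylib` and the application-side setup are hypothetical and are not taken from sklearn's code; the point is only that the library attaches a handler that holds no stream, while the application decides where log output actually goes.

```python
import io
import logging

# Library side (e.g. in the package's __init__.py; "mylib" is hypothetical).
# NullHandler discards records and holds no reference to sys.stderr.
lib_logger = logging.getLogger("mylib")
lib_logger.addHandler(logging.NullHandler())

# Application side: the application chooses the handler and the stream.
app_stream = io.StringIO()  # stand-in for a file, console, etc.
app_handler = logging.StreamHandler(app_stream)
root_logger = logging.getLogger()
root_logger.addHandler(app_handler)

# Records from the library still propagate to the application's handler.
lib_logger.warning("something worth noting")
output = app_stream.getvalue()

# Clean up the demo handler.
root_logger.removeHandler(app_handler)
```

With this arrangement nothing is bound to `sys.stderr` at import time, so the order in which the library is imported and the streams are swapped no longer matters.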