AssertionError when enabling autolog for sklearn with mlflow #26992

Closed
NusretOzates opened this issue Aug 2, 2023 · 4 comments

@NusretOzates

NusretOzates commented Aug 2, 2023

Describe the bug

I came here from this issue: mlflow/mlflow#9173

MLflow autologging encountered a warning: "/home/nusret/miniconda3/envs/mlflow_training/lib/python3.11/site-packages/_distutils_hack/__init__.py:18: UserWarning: Distutils was imported before Setuptools, but importing Setuptools also replaces the `distutils` module in `sys.modules`. This may lead to undesirable behaviors or errors. To avoid these issues, avoid using distutils directly, ensure that setuptools is installed in the traditional way (e.g. not an editable install), and/or make sure that setuptools is always imported before distutils."

Judging by the traceback in the Actual Results section, the error seems to originate in scikit-learn.

Steps/Code to Reproduce

import mlflow
from sklearn.cluster import KMeans
from sklearn.datasets import load_iris

model = KMeans()
iris = load_iris()
X = iris.data[:, :2]
y = iris.target

with mlflow.start_run():
    model.fit(X, y)
    mlflow.sklearn.log_model(model, "model")

mlflow.sklearn.autolog()

Expected Results

No error is thrown

Actual Results

---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
Cell In[20], line 4
      1 from sklearn.ensemble import RandomForestRegressor, GradientBoostingRegressor, ExtraTreesRegressor
      2 from sklearn.svm import LinearSVR
----> 4 mlflow.sklearn.autolog()
      6 for model_class in (GradientBoostingRegressor,LinearSVR, RandomForestRegressor, ExtraTreesRegressor ):
      8     with mlflow.start_run():

File ~/miniconda3/envs/mlflow_training/lib/python3.11/site-packages/mlflow/utils/autologging_utils/__init__.py:424, in autologging_integration.<locals>.wrapper.<locals>.autolog(*args, **kwargs)
    405 with set_mlflow_events_and_warnings_behavior_globally(
    406     # MLflow warnings emitted during autologging setup / enablement are likely
    407     # actionable and relevant to the user, so they should be emitted as normal
   (...)
    420     disable_warnings=is_silent_mode,
    421 ):
    422     _check_and_log_warning_for_unsupported_package_versions(name)
--> 424     return _autolog(*args, **kwargs)

File ~/miniconda3/envs/mlflow_training/lib/python3.11/site-packages/mlflow/sklearn/__init__.py:1210, in autolog(log_input_examples, log_model_signatures, log_models, log_datasets, disable, exclusive, disable_for_unsupported_versions, silent, max_tuning_runs, log_post_training_metrics, serialization_format, registered_model_name, pos_label)
    927 @autologging_integration(FLAVOR_NAME)
    928 def autolog(
    929     log_input_examples=False,
   (...)
    941     pos_label=None,
    942 ):  # pylint: disable=unused-argument
    943     """
    944     Enables (or disables) and configures autologging for scikit-learn estimators.
    945 
   (...)
   1208                       be logged. If used for regression model, the parameter will be ignored.
   1209     """
-> 1210     _autolog(
   1211         flavor_name=FLAVOR_NAME,
   1212         log_input_examples=log_input_examples,
   1213         log_model_signatures=log_model_signatures,
   1214         log_models=log_models,
   1215         log_datasets=log_datasets,
   1216         disable=disable,
   1217         exclusive=exclusive,
   1218         disable_for_unsupported_versions=disable_for_unsupported_versions,
   1219         silent=silent,
   1220         max_tuning_runs=max_tuning_runs,
   1221         log_post_training_metrics=log_post_training_metrics,
   1222         serialization_format=serialization_format,
   1223         pos_label=pos_label,
   1224     )

File ~/miniconda3/envs/mlflow_training/lib/python3.11/site-packages/mlflow/sklearn/__init__.py:1773, in _autolog(flavor_name, log_input_examples, log_model_signatures, log_models, log_datasets, disable, exclusive, disable_for_unsupported_versions, silent, max_tuning_runs, log_post_training_metrics, serialization_format, pos_label)
   1771     allow_children_patch = True
   1772 else:
-> 1773     estimators_to_patch = _gen_estimators_to_patch()
   1774     patched_fit_impl = fit_mlflow
   1775     allow_children_patch = False

File ~/miniconda3/envs/mlflow_training/lib/python3.11/site-packages/mlflow/sklearn/__init__.py:97, in _gen_estimators_to_patch()
     91 def _gen_estimators_to_patch():
     92     from mlflow.sklearn.utils import (
     93         _all_estimators,
     94         _get_meta_estimators_for_autologging,
     95     )
---> 97     _, estimators_to_patch = zip(*_all_estimators())
     98     # Ensure that relevant meta estimators (e.g. GridSearchCV, Pipeline) are selected
     99     # for patching if they are not already included in the output of `all_estimators()`
    100     estimators_to_patch = set(estimators_to_patch).union(
    101         set(_get_meta_estimators_for_autologging())
    102     )

File ~/miniconda3/envs/mlflow_training/lib/python3.11/site-packages/mlflow/sklearn/utils.py:856, in _all_estimators()
    853 try:
    854     from sklearn.utils import all_estimators
--> 856     return all_estimators()
    857 except ImportError:
    858     return _backported_all_estimators()

File ~/miniconda3/envs/mlflow_training/lib/python3.11/site-packages/sklearn/utils/discovery.py:63, in all_estimators(type_filter)
     60 # Ignore deprecation warnings triggered at import time and from walking
     61 # packages
     62 with ignore_warnings(category=FutureWarning):
---> 63     for _, module_name, _ in pkgutil.walk_packages(path=[root], prefix="sklearn."):
     64         module_parts = module_name.split(".")
     65         if (
     66             any(part in _MODULE_TO_IGNORE for part in module_parts)
     67             or "._" in module_name
     68         ):

File ~/miniconda3/envs/mlflow_training/lib/python3.11/pkgutil.py:92, in walk_packages(path, prefix, onerror)
     90 if info.ispkg:
     91     try:
---> 92         __import__(info.name)
     93     except ImportError:
     94         if onerror is not None:

File ~/miniconda3/envs/mlflow_training/lib/python3.11/site-packages/sklearn/_build_utils/__init__.py:15
     13 from .._min_dependencies import CYTHON_MIN_VERSION
     14 from ..externals._packaging.version import parse
---> 15 from .openmp_helpers import check_openmp_support
     16 from .pre_build_helpers import basic_check_build
     18 DEFAULT_ROOT = "sklearn"

File ~/miniconda3/envs/mlflow_training/lib/python3.11/site-packages/sklearn/_build_utils/openmp_helpers.py:12
      9 import textwrap
     10 import warnings
---> 12 from .pre_build_helpers import compile_test_program
     15 def get_openmp_flag():
     16     if sys.platform == "win32":

File ~/miniconda3/envs/mlflow_training/lib/python3.11/site-packages/sklearn/_build_utils/pre_build_helpers.py:10
      7 import tempfile
      8 import textwrap
---> 10 from setuptools.command.build_ext import customize_compiler, new_compiler
     13 def compile_test_program(code, extra_preargs=None, extra_postargs=None):
     14     """Check that some C code can be compiled and run"""

File ~/miniconda3/envs/mlflow_training/lib/python3.11/site-packages/setuptools/__init__.py:7
      4 import os
      5 import re
----> 7 import _distutils_hack.override  # noqa: F401
      9 import distutils.core
     10 from distutils.errors import DistutilsOptionError

File ~/miniconda3/envs/mlflow_training/lib/python3.11/site-packages/_distutils_hack/override.py:1
----> 1 __import__('_distutils_hack').do_override()

File ~/miniconda3/envs/mlflow_training/lib/python3.11/site-packages/_distutils_hack/__init__.py:77, in do_override()
     75 if enabled():
     76     warn_distutils_present()
---> 77     ensure_local_distutils()

File ~/miniconda3/envs/mlflow_training/lib/python3.11/site-packages/_distutils_hack/__init__.py:64, in ensure_local_distutils()
     62 # check that submodules load as expected
     63 core = importlib.import_module('distutils.core')
---> 64 assert '_distutils' in core.__file__, core.__file__
     65 assert 'setuptools._distutils.log' not in sys.modules

AssertionError: /home/nusret/miniconda3/envs/mlflow_training/lib/python3.11/distutils/core.py

Versions

1.3.0

NusretOzates added the Bug and Needs Triage labels Aug 2, 2023
glemaitre changed the title from "[BUG] AssertionError when enabling autolog for sklearn with mlflow" to "AssertionError when enabling autolog for sklearn with mlflow" Aug 2, 2023

@glemaitre
Member

glemaitre commented Aug 2, 2023

I have never seen this one. I assume that walk_packages should not inspect _build_utils and other private modules, which do not contain estimators anyway.
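
A minimal sketch of that idea, assuming a hand-written walker built on `pkgutil.iter_modules` rather than `pkgutil.walk_packages`. The helper name `iter_public_modules` is hypothetical and this is not necessarily how scikit-learn addresses the problem. It lists module names without importing anything and never descends into packages whose name starts with an underscore:

```python
import pkgutil


def iter_public_modules(paths, prefix):
    """Yield dotted names of public modules found under `paths`.

    Hypothetical helper: private modules and packages (leading underscore)
    are skipped before anything is imported, so their import-time side
    effects never run.
    """
    for info in pkgutil.iter_modules(paths, prefix):
        short_name = info.name.rpartition(".")[2]
        if short_name.startswith("_"):
            continue  # e.g. a private package such as _build_utils
        yield info.name
        if info.ispkg:
            # Locate the sub-package directory without importing it.
            spec = info.module_finder.find_spec(info.name)
            if spec is not None and spec.submodule_search_locations:
                yield from iter_public_modules(
                    list(spec.submodule_search_locations), info.name + "."
                )
```

For scikit-learn itself the call would look something like `iter_public_modules(sklearn.__path__, "sklearn.")`. Unlike `walk_packages`, nothing is imported while walking, so the caller still has to import the modules it actually wants to inspect.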

@betatim
Member

betatim commented Aug 3, 2023

I'm not sure if this is a problem in scikit-learn or a problem that comes about when mlflow uses scikit-learn. The assertion is raised because distutils was already imported when scikit-learn attempts to import setuptools. When you directly run the scikit-learn code:

from sklearn.utils import all_estimators
estimators = all_estimators()

(I think this is the relevant bit that mlflow calls) then no warning is raised. As far as I can tell distutils isn't (directly) imported or referenced in the scikit-learn code base. So my guess is that mlflow (or some other library) imports it before the scikit-learn all_estimators gets called.
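
One way to check that guess is to look at which copy of distutils is already in `sys.modules` right before calling autolog. The sketch below mirrors the `'_distutils' in core.__file__` check from the traceback. The function name `distutils_origin` and its return strings are made up for illustration:

```python
import sys


def distutils_origin(modules=None):
    """Report which distutils, if any, is already loaded.

    Hypothetical diagnostic: `modules` defaults to sys.modules and is only a
    parameter so the function can be exercised with fake entries.
    """
    modules = sys.modules if modules is None else modules
    mod = modules.get("distutils")
    if mod is None:
        return "not imported"
    path = getattr(mod, "__file__", None) or ""
    # setuptools ships its copy as setuptools/_distutils, which is what the
    # failing assertion in _distutils_hack looks for.
    return "setuptools" if "_distutils" in path else "stdlib"
```

If this returns "stdlib" before `mlflow.sklearn.autolog()` is called, something imported the standard-library distutils earlier in the session, which matches the situation in the traceback.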

Unfortunately I have no good idea for instructions on how to track down who/what imports distutils :-/
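
One possible approach, offered as a sketch and not tested against this particular environment: put a finder at the front of `sys.meta_path` that records the call stack whenever the target module is looked up, then defers to the normal import machinery. `ImportTracer` and `trace_imports` are hypothetical names:

```python
import importlib.abc
import sys
import traceback


class ImportTracer(importlib.abc.MetaPathFinder):
    """Record who triggers the first import of `target` or its submodules."""

    def __init__(self, target):
        self.target = target
        self.hits = []  # list of (module name, formatted call stack)

    def find_spec(self, fullname, path=None, target=None):
        if fullname == self.target or fullname.startswith(self.target + "."):
            # Drop the last frame, which is this method itself.
            stack = "".join(traceback.format_stack()[:-1])
            self.hits.append((fullname, stack))
        return None  # let the regular finders perform the actual import


def trace_imports(target):
    """Install a tracer for `target`; call this before the suspect imports."""
    tracer = ImportTracer(target)
    sys.meta_path.insert(0, tracer)
    return tracer
```

To use it, call `tracer = trace_imports("distutils")` at the very top of the session, before `import mlflow`, then print the stacks in `tracer.hits` after the failure. A module that is already in `sys.modules` is not looked up again, so the tracer only sees the first import.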

Micky774 removed the Needs Triage label Aug 11, 2023

@thomasjpfan
Member

From mlflow/mlflow#9173 (comment), the original issue was resolved when setuptools was imported first. In our case, I agree with #26992 (comment) in that we should not walk private modules if we do not need to.

I also find it strange that all_estimators ends up importing setuptools when we do not list setuptools as a runtime dependency. (I know that setuptools is usually installed, but I prefer not to import it during runtime).

@lorentzenchr
Member

This seems solved.
