Intermittent HTTP 403 on fetch_california_housing and other Figshare hosted data on Azure CI #30761

Closed
@lesteve

Already noticed in #30636 (comment).

This seems to happen from time to time, either in doctests (build log) or in other places (build log).

Error in doctests
=================================== FAILURES ===================================
________________________ [doctest] getting_started.rst _________________________
167 the best set of parameters. Read more in the :ref:`User Guide
168 <grid_search>`::
169 
170   >>> from sklearn.datasets import fetch_california_housing
171   >>> from sklearn.ensemble import RandomForestRegressor
172   >>> from sklearn.model_selection import RandomizedSearchCV
173   >>> from sklearn.model_selection import train_test_split
174   >>> from scipy.stats import randint
175   ...
176   >>> X, y = fetch_california_housing(return_X_y=True)
UNEXPECTED EXCEPTION: <HTTPError 403: 'Forbidden'>
Traceback (most recent call last):
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/doctest.py", line 1395, in __run
    exec(compile(example.source, filename, "single",
    ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 compileflags, True), test.globs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<doctest getting_started.rst[33]>", line 1, in <module>
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/sklearn/utils/_param_validation.py", line 218, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/sklearn/datasets/_california_housing.py", line 177, in fetch_california_housing
    archive_path = _fetch_remote(
        ARCHIVE,
    ...<2 lines>...
        delay=delay,
    )
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/sklearn/datasets/_base.py", line 1513, in _fetch_remote
    urlretrieve(remote.url, temp_file_path)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 214, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
                            ~~~~~~~^^^^^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 189, in urlopen
    return opener.open(url, data, timeout)
           ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 495, in open
    response = meth(req, response)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 604, in http_response
    response = self.parent.error(
        'http', request, response, code, msg, hdrs)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 533, in error
    return self._call_chain(*args)
           ~~~~~~~~~~~~~~~~^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 466, in _call_chain
    result = func(*args)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 613, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
/home/vsts/work/1/s/doc/getting_started.rst:176: UnexpectedException
____________________________ [doctest] compose.rst _____________________________
285 the regressor that will be used for prediction, and the transformer that will
286 be applied to the target variable::
287 
288   >>> import numpy as np
289   >>> from sklearn.datasets import fetch_california_housing
290   >>> from sklearn.compose import TransformedTargetRegressor
291   >>> from sklearn.preprocessing import QuantileTransformer
292   >>> from sklearn.linear_model import LinearRegression
293   >>> from sklearn.model_selection import train_test_split
294   >>> X, y = fetch_california_housing(return_X_y=True)
UNEXPECTED EXCEPTION: <HTTPError 403: 'Forbidden'>
Traceback (most recent call last):
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/doctest.py", line 1395, in __run
    exec(compile(example.source, filename, "single",
    ~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
                 compileflags, True), test.globs)
                 ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "<doctest compose.rst[59]>", line 1, in <module>
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/sklearn/utils/_param_validation.py", line 218, in wrapper
    return func(*args, **kwargs)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/sklearn/datasets/_california_housing.py", line 177, in fetch_california_housing
    archive_path = _fetch_remote(
        ARCHIVE,
    ...<2 lines>...
        delay=delay,
    )
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/sklearn/datasets/_base.py", line 1513, in _fetch_remote
    urlretrieve(remote.url, temp_file_path)
    ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 214, in urlretrieve
    with contextlib.closing(urlopen(url, data)) as fp:
                            ~~~~~~~^^^^^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 189, in urlopen
    return opener.open(url, data, timeout)
           ~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 495, in open
    response = meth(req, response)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 604, in http_response
    response = self.parent.error(
        'http', request, response, code, msg, hdrs)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 533, in error
    return self._call_chain(*args)
           ~~~~~~~~~~~~~~~~^^^^^^^
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 466, in _call_chain
    result = func(*args)
  File "/usr/share/miniconda/envs/testvenv/lib/python3.13/urllib/request.py", line 613, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 403: Forbidden
/home/vsts/work/1/s/doc/modules/compose.rst:294: UnexpectedException
=========================== short test summary info ============================
FAILED ../1/s/doc/getting_started.rst::getting_started.rst
FAILED ../1/s/doc/modules/compose.rst::compose.rst
======= 2 failed, 39 passed, 2 skipped, 39 warnings in 86.94s (0:01:26) ========
Internal Pytest error (error in conftest.py when downloading all the datasets)
============================= test session starts ==============================
platform linux -- Python 3.13.1, pytest-8.3.4, pluggy-1.5.0
rootdir: /home/vsts/work/tmp_folder
configfile: setup.cfg
plugins: scipy_doctest-1.6, cov-6.0.0, xdist-3.6.1
created: 2/2 workers
2 workers [38211 items]

INTERNALERROR> def worker_internal_error(
INTERNALERROR>         self, node: WorkerController, formatted_error: str
INTERNALERROR>     ) -> None:
INTERNALERROR>         """
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/_pytest/main.py", line 283, in wrap_session
INTERNALERROR>     session.exitstatus = doit(config, session) or 0
INTERNALERROR>                          ~~~~^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/_pytest/main.py", line 337, in _main
INTERNALERROR>     config.hook.pytest_runtestloop(session=session)
INTERNALERROR>     ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/pluggy/_hooks.py", line 513, in __call__
INTERNALERROR>     return self._hookexec(self.name, self._hookimpls.copy(), kwargs, firstresult)
INTERNALERROR>            ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/pluggy/_manager.py", line 120, in _hookexec
INTERNALERROR>     return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
INTERNALERROR>            ~~~~~~~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/pluggy/_callers.py", line 182, in _multicall
INTERNALERROR>     return outcome.get_result()
INTERNALERROR>            ~~~~~~~~~~~~~~~~~~^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/pluggy/_result.py", line 100, in get_result
INTERNALERROR>     raise exc.with_traceback(exc.__traceback__)
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/pluggy/_callers.py", line 167, in _multicall
INTERNALERROR>     teardown.throw(outcome._exception)
INTERNALERROR>     ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/_pytest/logging.py", line 803, in pytest_runtestloop
INTERNALERROR>     return (yield)  # Run all the tests.
INTERNALERROR>             ^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/pluggy/_callers.py", line 167, in _multicall
INTERNALERROR>     teardown.throw(outcome._exception)
INTERNALERROR>     ~~~~~~~~~~~~~~^^^^^^^^^^^^^^^^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/_pytest/terminal.py", line 673, in pytest_runtestloop
INTERNALERROR>     result = yield
INTERNALERROR>              ^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/pluggy/_callers.py", line 103, in _multicall
INTERNALERROR>     res = hook_impl.function(*args)
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/xdist/dsession.py", line 138, in pytest_runtestloop
INTERNALERROR>     self.loop_once()
INTERNALERROR>     ~~~~~~~~~~~~~~^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/xdist/dsession.py", line 163, in loop_once
INTERNALERROR>     call(**kwargs)
INTERNALERROR>     ~~~~^^^^^^^^^^
INTERNALERROR>   File "/usr/share/miniconda/envs/testvenv/lib/python3.13/site-packages/xdist/dsession.py", line 218, in worker_workerfinished
INTERNALERROR>     self._active_nodes.remove(node)
INTERNALERROR>     ~~~~~~~~~~~~~~~~~~~~~~~~~^^^^^^
INTERNALERROR> KeyError: <WorkerController gw0>

I don't think this happens often enough to bother us right now, but if it becomes more frequent we should contact Figshare support and report it.

I did something similar the last time this happened (on Colab and Kaggle notebooks), see #28297 (comment), and in the end they fixed it.

My guess is that this is somehow triggering an anti-abuse mechanism ...
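
If the 403s do become more frequent, one possible stopgap on the CI side would be to wrap the download in a retry with backoff. The snippet below is only a rough sketch of that idea, not what scikit-learn does: `fetch_with_retries` and its parameters are hypothetical names invented for illustration, and it works independently of whatever retry logic `_fetch_remote` may already apply. It also assumes that a 403 from an anti-abuse mechanism is transient, which is just my guess above.

```python
import time
from urllib.error import HTTPError


def fetch_with_retries(fetch, n_retries=3, delay=1.0, retry_codes=(403,),
                       sleep=time.sleep):
    """Call ``fetch()`` and retry on selected HTTP error codes.

    Hypothetical helper (not part of scikit-learn). ``fetch`` is any
    zero-argument callable that may raise ``urllib.error.HTTPError``.
    The wait doubles after each failed attempt: delay, 2 * delay, ...
    """
    attempt = 0
    while True:
        try:
            return fetch()
        except HTTPError as exc:
            # Give up on codes we do not consider transient, or once
            # the retry budget is exhausted.
            if exc.code not in retry_codes or attempt >= n_retries:
                raise
            attempt += 1
            sleep(delay * 2 ** (attempt - 1))
```

Usage would look something like `fetch_with_retries(lambda: fetch_california_housing(return_X_y=True))`. The `sleep` argument is injectable only so the helper can be exercised without actually waiting.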
