Cannot access file stored in MinIO through s3fs #482

@tgaddair

Description

What happened:

Using the latest version of s3fs, we get the following error when attempting to open a remote file:

Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.7/site-packages/s3fs/core.py", line 233, in _call_s3
    out = await method(**additional_kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/aiobotocore/client.py", line 154, in _make_api_call
    raise error_class(parsed_response, operation_name)
botocore.exceptions.ClientError: An error occurred (PreconditionFailed) when calling the GetObject operation: At least one of the pre-conditions you specified did not hold

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  File "/home/ray/anaconda3/lib/python3.7/site-packages/s3fs/core.py", line 1834, in _fetch_range
    req_kw=self.req_kw,
  File "/home/ray/anaconda3/lib/python3.7/site-packages/s3fs/core.py", line 1975, in _fetch_range
    **req_kw,
  File "/home/ray/anaconda3/lib/python3.7/site-packages/fsspec/asyn.py", line 72, in wrapper
    return sync(self.loop, func, *args, **kwargs)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/fsspec/asyn.py", line 53, in sync
    raise result[0]
  File "/home/ray/anaconda3/lib/python3.7/site-packages/fsspec/asyn.py", line 20, in _runner
    result[0] = await coro
  File "/home/ray/anaconda3/lib/python3.7/site-packages/s3fs/core.py", line 252, in _call_s3
    raise translate_boto_error(err)
OSError: [Errno 22] At least one of the pre-conditions you specified did not hold

The above exception was the direct cause of the following exception:

Traceback (most recent call last):
  <redacted>
  File "/home/ray/anaconda3/lib/python3.7/site-packages/ludwig/utils/data_utils.py", line 109, in read_xsv
    dialect = csv.Sniffer().sniff(csvfile.read(1024 * 100),
  File "/home/ray/anaconda3/lib/python3.7/site-packages/fsspec/spec.py", line 1449, in read
    out = self.cache._fetch(self.loc, self.loc + length)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/fsspec/caching.py", line 376, in _fetch
    self.cache = self.fetcher(start, bend)
  File "/home/ray/anaconda3/lib/python3.7/site-packages/s3fs/core.py", line 1841, in _fetch_range
    ) from ex
s3fs.utils.FileExpired: [Errno 16] The remote file corresponding to filename <redacted> and Etag "<redacted>" no longer exists.

The file is being opened through the fsspec entrypoint:

of = fsspec.open(url, **storage_options)
with of as f:
    ...
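
For reference, the context manager above just dispatches to s3fs under the hood; the equivalent explicit form (a sketch, using the same url and storage_options) is:

import fsspec

fs, path = fsspec.core.url_to_fs(url, **storage_options)  # resolves to an S3FileSystem for s3:// URLs
with fs.open(path, 'rb') as f:
    ...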

The twist here is that the URL uses the s3:// scheme, but the file is actually stored in Azure Blob Storage: we're running the MinIO Azure Gateway to expose an S3-compatible layer over it. Judging by the traceback, newer s3fs sends the cached ETag as an IfMatch precondition on its ranged GetObject calls (the req_kw plumbing in s3fs/core.py), and the gateway rejects that precondition; a sketch of the suspected interaction is below.
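
Replaying roughly the same call directly with boto3 should show whether the gateway itself rejects the precondition (a sketch; the endpoint, credentials, and bucket/object names are placeholders matching the MCVE below):

import boto3

s3 = boto3.client(
    's3',
    endpoint_url='http://localhost:9000',
    aws_access_key_id='...',
    aws_secret_access_key='...',
)

# s3fs records the object's ETag when the file is opened and sends it back as
# IfMatch on every ranged GetObject; against the Azure gateway this appears to
# fail even though the object has not changed.
head = s3.head_object(Bucket='bucket', Key='object')
s3.get_object(
    Bucket='bucket',
    Key='object',
    Range='bytes=0-102399',  # a ranged read like the one in the traceback
    IfMatch=head['ETag'],    # expected here: ClientError (PreconditionFailed)
)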

Storage options are fairly straightforward for MinIO:

storage_options = {
    'key': '...',
    'secret': '...',
    'client_kwargs': {'endpoint_url': 'http://localhost:9000'},
}
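
The same options can be used to construct the filesystem directly, which is a quick way to confirm that the endpoint and credentials are fine outside of fsspec.open (a sketch; the bucket name is a placeholder):

import s3fs

fs = s3fs.S3FileSystem(**storage_options)
print(fs.ls('bucket'))  # a plain listing exercises auth and the endpoint without the ranged-read path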

What you expected to happen:

Using s3fs==2021.4.0, this operation works fine.

Minimal Complete Verifiable Example:

Set up MinIO, then provide an s3:// path to a bucket/object in MinIO:

url = 's3://bucket/object'
of = fsspec.open(url, **storage_options)
with of as f:
    print(f.read())

Anything else we need to know?:

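One thing that may be worth checking is whether the gateway reports a stable ETag across requests, since an ETag that changes between calls would also trip the IfMatch precondition (a sketch, same placeholder path as above):

import s3fs

fs = s3fs.S3FileSystem(**storage_options)
info1 = fs.info('bucket/object')
fs.invalidate_cache()  # drop the cached entry so the second call issues a fresh HEAD
info2 = fs.info('bucket/object')
print(info1.get('ETag'), info2.get('ETag'))  # a mismatch here would explain the failure
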
Environment:

  • Dask version: 2021.5.0
  • Python version: 3.7.7
  • Operating System: Linux (Ubuntu)
  • Install method (conda, pip, source): pip
