S3 artifact store: fix path resolution error when artifact root is bucket root #928

dbczumar · 2019-02-26T22:56:43Z

When attempting to download a directory artifact from an S3-based artifact repository, paths are truncated if the repository's artifact URI is the URI of the S3 bucket root. This PR fixes the issue by using relpath() rather than filename slicing.

The cause of the failure is identical to the one that affected the Azure store and was addressed by #769.
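To illustrate the failure mode, here is a minimal sketch. It assumes the previous code derived each artifact's relative name by slicing `len(artifact_path) + 1` characters off the S3 key; the helper names and sample keys are hypothetical and not taken from the MLflow source.

```python
import posixpath


def relative_name_by_slicing(key, artifact_path):
    # Assumed old behavior: skip the artifact path plus one separator character.
    # With an empty artifact path (bucket root) this drops the key's first character.
    return key[len(artifact_path) + 1:]


def relative_name_by_relpath(key, artifact_path):
    # relpath compares path components, so an empty (bucket-root) artifact
    # path leaves the key unchanged instead of truncating it.
    return posixpath.relpath(path=key, start=artifact_path or ".")


if __name__ == "__main__":
    print(relative_name_by_slicing("model/MLmodel", ""))
    print(relative_name_by_relpath("model/MLmodel", ""))
```

Both helpers agree when the artifact path is a non-empty prefix; they only diverge when the artifact root is the bucket root.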

…ct uri is bucket root

dbczumar · 2019-02-26T23:00:53Z

mlflow/store/s3_artifact_repo.py

-                infos.append(FileInfo(name, False, size))
+                file_path = obj.get("Key")
+                if not file_path.startswith(artifact_path):
+                    raise ValueError(


It is somewhat challenging to test that this is raised because the S3 client paginator is difficult to mock. If we think that this test case is particularly important, I can try to figure something out.
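One possible way to exercise this error path without a real S3 client is to stub the paginator with `unittest.mock`. The sketch below is only an assumption about how such a test could look, not the approach taken in this PR; `list_keys` and `make_fake_s3_client` are hypothetical stand-ins, and the error is a `ValueError` as in the diff above.

```python
from unittest import mock


def list_keys(s3_client, bucket, artifact_path):
    # Hypothetical stand-in for the listing loop under review.
    prefix = artifact_path + "/" if artifact_path else ""
    paginator = s3_client.get_paginator("list_objects_v2")
    keys = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for obj in page.get("Contents", []):
            key = obj.get("Key")
            if not key.startswith(artifact_path):
                raise ValueError(
                    "Listed key %r does not begin with %r" % (key, artifact_path))
            keys.append(key)
    return keys


def make_fake_s3_client(pages):
    # paginate() only needs to return an iterable of response pages,
    # so a plain list of dicts is enough for the stub.
    client = mock.Mock()
    client.get_paginator.return_value.paginate.return_value = pages
    return client
```

A test can then pass a page whose `Key` lies outside the artifact path and assert that the listing raises.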

mparkhe · 2019-02-27T01:25:39Z

mlflow/store/s3_artifact_repo.py

-                infos.append(FileInfo(subdir, True, None))
+                subdir_path = obj.get("Prefix")
+                if not subdir_path.startswith(artifact_path):
+                    raise ValueError(


Nit: why raise a ValueError? Can we raise an MlflowException instead?

Changed to MlflowException!

mparkhe · 2019-02-27T01:29:05Z

mlflow/store/s3_artifact_repo.py

+                if not file_path.startswith(artifact_path):
+                    raise ValueError(
+                        "The path of the listed S3 file does not begin with the specified"
+                        " artifact path. Artifact path: {artifact_path}. File path:"


Is there any way this duplicated code can be unified? I know the path is extracted differently in each case, since one is a file and the other a directory. In fact, if there is a way to share some of this between different blob stores, that would be awesome, so that any changes to internal methods don't require changes all over the place.

For now, I've deduped the code inside S3ArtifactRepository by defining a static _verify_listed_object_contains_artifact_path_prefix() method. It may be reasonable to create an abstract class for blob stores that follow this pattern, but it seems we agree that this may be a larger undertaking that requires a followup PR.
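As a rough sketch of what such a shared check could look like (an assumption, not the merged code): the method name comes from the comment above, while the parameter names, the message wording, the `_relative_path` wrapper, and the local `MlflowException` stand-in are all hypothetical so that the example runs without MLflow installed.

```python
import posixpath


class MlflowException(Exception):
    """Local stand-in so the sketch runs without MLflow installed."""


class S3ArtifactRepository(object):
    @staticmethod
    def _verify_listed_object_contains_artifact_path_prefix(
            listed_object_path, artifact_path):
        # One check shared by file keys ("Key") and subdirectory prefixes ("Prefix").
        if not listed_object_path.startswith(artifact_path):
            raise MlflowException(
                "The path of the listed S3 object does not begin with the specified"
                " artifact path. Artifact path: {artifact_path}. Object path:"
                " {object_path}.".format(
                    artifact_path=artifact_path, object_path=listed_object_path))

    @classmethod
    def _relative_path(cls, listed_object_path, artifact_path):
        # Hypothetical wrapper: verify the prefix, then strip it with relpath.
        cls._verify_listed_object_contains_artifact_path_prefix(
            listed_object_path, artifact_path)
        return posixpath.relpath(path=listed_object_path, start=artifact_path or ".")
```

Both the file branch and the directory branch of the listing loop could then call the same helper, which keeps the error handling in one place.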

mparkhe · 2019-02-27T01:32:49Z

mlflow/store/s3_artifact_repo.py

        s3_client = self._get_s3_client()
        paginator = s3_client.get_paginator("list_objects_v2")
        results = paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/')
        for result in results:


Let's write some sort of a checker to make sure that nesting is legit. For instance it is possible to have this sort of directory structure in S3

    dir_name                # is a true directory
    dir_name/sub_dir        # this one is a file
    dir_name/sub_dir/file   # this is also a file

After offline discussion, it appears that we don't currently have a mechanism for checking / enforcing that S3 object keys containing slashes are directories. We should definitely agree on a strategy for dealing with files whose keys contain slashes, but this is a bit outside the scope of the current PR.
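For reference, the kind of checker discussed here might look like the following sketch. It is not part of this PR; the function name is hypothetical, and it assumes the full list of object keys is already available in memory.

```python
def find_keys_used_as_directories(keys):
    # A key is ambiguous if it is an object itself and also acts as a
    # "directory" prefix of at least one other key.
    key_set = set(keys)
    ambiguous = set()
    for key in key_set:
        parts = key.split("/")
        # Walk every proper ancestor path of this key.
        for i in range(1, len(parts)):
            ancestor = "/".join(parts[:i])
            if ancestor in key_set:
                ambiguous.add(ancestor)
    return sorted(ambiguous)
```

Given the keys `dir_name/sub_dir` and `dir_name/sub_dir/file`, this reports `dir_name/sub_dir` as both a file and a directory.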

mparkhe · 2019-02-27T18:59:19Z

You'll have to resolve the merge conflict, but the changes LGTM. Thanks for deduping the code and switching the raised exception to MlflowException.

dbczumar added 2 commits February 26, 2019 13:14

Use relpath instead of slicing behavior to fix downloads where artifa…

34bd127

…ct uri is bucket root

S3 artifact repo download from root fix

f885952

dbczumar requested a review from mparkhe February 26, 2019 22:56

dbczumar commented Feb 26, 2019

mparkhe reviewed Feb 27, 2019

dbczumar added 2 commits February 26, 2019 18:14

Address comments

7802569

Lint

85e9c60

Merge master and fix conflicts

61ba8cf

dbczumar merged commit 6d07ede into mlflow:master Feb 28, 2019
