@dbczumar
Collaborator

When attempting to download a directory artifact from an S3-based artifact repository, paths are truncated if the repository's artifact URI is the URI of the S3 bucket root. This PR fixes the issue by using relpath() rather than filename slicing.

The cause of failure is identical to the one that affected the Azure store and was addressed by #769.
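
To illustrate the failure mode, here is a minimal sketch. The slicing expression and both helper names are assumptions made for illustration; the description above only says that filename slicing was replaced with relpath(), so this may not match the removed code exactly.

```python
import posixpath


def name_by_slicing(key, artifact_path):
    # Assumed pre-fix approach: skip the artifact path plus one "/" separator.
    # When artifact_path is empty (bucket root) there is no separator to skip,
    # so the first character of the key is dropped instead.
    return key[len(artifact_path) + 1:]


def name_by_relpath(key, artifact_path):
    # Approach named in the PR description: compute the relative path.
    return posixpath.relpath(path=key, start=artifact_path)


# Artifact path nested inside the bucket: both approaches agree.
nested_sliced = name_by_slicing("exp/1/artifacts/model/MLmodel", "exp/1/artifacts")
nested_rel = name_by_relpath("exp/1/artifacts/model/MLmodel", "exp/1/artifacts")

# Artifact path is the bucket root: slicing truncates, relpath does not.
root_sliced = name_by_slicing("model/MLmodel", "")
root_rel = name_by_relpath("model/MLmodel", "")

print(nested_sliced, nested_rel)
print(root_sliced, root_rel)
```

In this sketch the bucket-root case is the only one where the two helpers disagree, which is consistent with the truncation described above.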

@dbczumar dbczumar requested a review from mparkhe February 26, 2019 22:56
infos.append(FileInfo(name, False, size))
file_path = obj.get("Key")
if not file_path.startswith(artifact_path):
    raise ValueError(
Collaborator Author

It is somewhat challenging to test that this exception is raised because the S3 client paginator is difficult to mock. If we think this test case is particularly important, I can try to figure something out.
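
One possible way to exercise the check without a real S3 client is to fake the paginator with unittest.mock. This is only a sketch: list_names and make_fake_client are hypothetical helpers, not MLflow functions, and the page dictionaries imitate only the "Contents" and "Key" fields of a list_objects_v2 response.

```python
import posixpath
from unittest import mock


def list_names(s3_client, bucket, artifact_path):
    # Hypothetical listing helper that mirrors the diff context above.
    prefix = artifact_path + "/" if artifact_path else ""
    paginator = s3_client.get_paginator("list_objects_v2")
    names = []
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter="/"):
        for obj in page.get("Contents", []):
            file_path = obj.get("Key")
            if not file_path.startswith(artifact_path):
                raise ValueError(
                    "Listed key %r does not begin with %r" % (file_path, artifact_path)
                )
            names.append(posixpath.relpath(path=file_path, start=artifact_path))
    return names


def make_fake_client(pages):
    # MagicMock stands in for the S3 client; paginate() yields canned pages.
    client = mock.MagicMock()
    client.get_paginator.return_value.paginate.return_value = pages
    return client


good_client = make_fake_client([{"Contents": [{"Key": "run/artifacts/a.txt", "Size": 1}]}])
listed = list_names(good_client, "bucket", "run/artifacts")

bad_client = make_fake_client([{"Contents": [{"Key": "elsewhere/b.txt", "Size": 1}]}])
try:
    list_names(bad_client, "bucket", "run/artifacts")
    raised = False
except ValueError:
    raised = True
```

Whether a fake like this is faithful enough to the real paginator to be worth adding to the test suite is a judgment call.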

infos.append(FileInfo(subdir, True, None))
subdir_path = obj.get("Prefix")
if not subdir_path.startswith(artifact_path):
    raise ValueError(
Collaborator

nit: Why raise a ValueError? Can we raise MlflowException instead?

Collaborator Author

Changed to MlflowException!

if not file_path.startswith(artifact_path):
    raise ValueError(
        "The path of the listed S3 file does not begin with the specified"
        " artifact path. Artifact path: {artifact_path}. File path:"
Collaborator

Is there any way this duplicated code can be unified? I know the path is extracted differently in each case (one is a file, the other a directory). In fact, if there is a way to share some of this between the different blob stores, that would be awesome, so that any changes to internal methods don't require changes all over the place.

Collaborator Author

For now, I've deduped the code inside S3ArtifactRepository by defining a static _verify_listed_object_contains_artifact_path_prefix() method. It may be reasonable to create an abstract class for blob stores that follow this pattern, but it seems we agree that this may be a larger undertaking that requires a followup PR.
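
A rough sketch of what such a shared helper might look like follows. Only the method name comes from the comment above; the signature, the error message, the class name S3ArtifactRepositorySketch, and the stand-in MlflowException class (defined here so the example runs without MLflow installed) are assumptions.

```python
class MlflowException(Exception):
    """Stand-in for mlflow.exceptions.MlflowException (assumption for this sketch)."""


class S3ArtifactRepositorySketch:
    """Hypothetical class; only the static method name is taken from the PR."""

    @staticmethod
    def _verify_listed_object_contains_artifact_path_prefix(listed_object_path,
                                                            artifact_path):
        # One check shared by the file ("Key") and directory ("Prefix") branches.
        if not listed_object_path.startswith(artifact_path):
            raise MlflowException(
                "The path of the listed S3 object does not begin with the specified"
                " artifact path. Artifact path: {artifact_path}. Object path:"
                " {object_path}.".format(
                    artifact_path=artifact_path, object_path=listed_object_path))


verify = S3ArtifactRepositorySketch._verify_listed_object_contains_artifact_path_prefix

# Passes silently for a key and a prefix that sit under the artifact path.
verify("run/artifacts/model/MLmodel", "run/artifacts")
verify("run/artifacts/model/", "run/artifacts")

# Raises for an object outside the artifact path.
try:
    verify("other/location/file.txt", "run/artifacts")
    error_message = None
except MlflowException as exc:
    error_message = str(exc)
```

Because the helper takes plain strings, both listing branches can call it with whatever path they extracted, which is what removes the duplication.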

s3_client = self._get_s3_client()
paginator = s3_client.get_paginator("list_objects_v2")
results = paginator.paginate(Bucket=bucket, Prefix=prefix, Delimiter='/')
for result in results:
Collaborator

Let's write some sort of checker to make sure that the nesting is legitimate. For instance, it is possible to have this sort of directory structure in S3:

dir_name                # is a true directory
dir_name/sub_dir        # this one is a file
dir_name/sub_dir/file   # this is also a file

Collaborator Author

After offline discussion, it appears that we don't currently have a mechanism for checking / enforcing that S3 object keys containing slashes are directories. We should definitely agree on a strategy for dealing with files whose keys contain slashes, but this is a bit outside the scope of the current PR.
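
For reference, one way such a check could be sketched is shown below. It is not part of this PR; find_file_directory_conflicts is a hypothetical helper that operates on a plain list of object keys and flags any key that is stored as a file while also acting as a "directory" prefix of another key.

```python
def find_file_directory_conflicts(keys):
    """Return keys that are objects themselves and also parents of other keys."""
    key_set = set(keys)
    conflicts = set()
    for key in key_set:
        parts = key.split("/")
        # Walk every proper ancestor of the key, e.g. "a" and "a/b" for "a/b/c".
        for i in range(1, len(parts)):
            parent = "/".join(parts[:i])
            if parent in key_set:
                conflicts.add(parent)
    return sorted(conflicts)


# Keys modeled on the structure described in the review comment above.
example_keys = [
    "dir_name/sub_dir",        # stored as a file
    "dir_name/sub_dir/file",   # also a file, nested under the key above
    "dir_name/other_file",
]
conflicts = find_file_directory_conflicts(example_keys)
print(conflicts)
```

A check like this needs the full key listing, so it would not fit directly into a delimiter-based listing that only sees one level at a time.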

@mparkhe
Collaborator

mparkhe commented Feb 27, 2019

You'll have to resolve the conflict, but the changes LGTM. Thanks for deduping the code and migrating the raised exception.

@dbczumar dbczumar merged commit 6d07ede into mlflow:master Feb 28, 2019