This repository was archived by the owner on Jan 7, 2025. It is now read-only.

Conversation

@taltmans

  • Added functionality for integrating DIGITS with S3 endpoints, along with accompanying unit tests.

@gheinrich (Contributor) left a comment

This looks very good, thanks! I have restarted the Torch CI job, as it could have been a transient failure. Can you squash your commits and see if you can fix the lint errors on https://travis-ci.org/NVIDIA/DIGITS/jobs/295123107?

Thanks!

```python
)

def from_s3(job, form):
    print('from_s3 in progress...')
```
Contributor

can you remove this print statement?

```python
args.append('--compression=%s' % self.compression)
if self.backend == 'hdf5':
    args.append('--hdf5_dset_limit=%d' % 2**31)
if self.delete_files is not None and self.delete_files is True:
```
Contributor

why not just `if self.delete_files`?
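
A minimal before/after sketch of the suggested simplification (the variable is shown as a local so the snippet stands alone; the two forms are equivalent as long as the value is only ever `None` or a `bool`):

```python
delete_files = True  # e.g. None when unset, otherwise a bool

# verbose form from the diff:
if delete_files is not None and delete_files is True:
    print('flag set')

# idiomatic equivalent, assuming the value is only ever None or a bool:
if delete_files:
    print('flag set')
```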

@taltmans (Author) commented Nov 2, 2017

Hello, we made the changes you requested and fixed the lint warnings in the S3 integration files. Did you determine whether the CI job failure was transient?

@gheinrich (Contributor)

Thank you @taltmans, the CI failure was indeed transient. Good job fixing the lint errors; you got a pass on your last commit! I'll review the changes again and get back to you. Thanks!

@TimZaman (Contributor) commented Nov 3, 2017

Don't forget to squash all commits. Also, it would be great if the S3 readme could be extended a bit more.

```
@@ -0,0 +1,20 @@
# S3 Integration - Installing Boto
```
Contributor

why not add boto to requirements.txt file? Sounds easier than doing the installation from source, right?

Author

Good point, we made that modification and removed this S3Installation.md file entirely.


```python
print('host: ' + self.host)
print('is secure: ' + str(self.is_secure))
print('port: ' + str(self.port))
```
Contributor

These prints could be replaced with calls to a logger as in https://github.com/NVIDIA/DIGITS/blob/master/digits/tools/parse_folder.py#L462.
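
For reference, a minimal sketch of the suggested pattern, using the stdlib `logging` module in the style of `parse_folder.py` (the logger name and the constructor signature shown are illustrative, not the actual DIGITS code):

```python
import logging

# Illustrative module-level logger, following the parse_folder.py
# pattern instead of printing to stdout.
logger = logging.getLogger('digits.tools.s3_walker')


class S3Walker(object):
    def __init__(self, host, is_secure=True, port=443):
        self.host = host
        self.is_secure = is_secure
        self.port = port
        # %-style arguments defer string formatting until the
        # record is actually emitted.
        logger.info('host: %s', self.host)
        logger.info('is secure: %s', self.is_secure)
        logger.info('port: %s', self.port)
```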

Author

We removed these and made a couple of other modifications to suppress printing to stdout.

```python
s3_endpoint = utils.forms.StringField(
    u'Training Images',
```
Contributor

It might be clearer if you name this "S3 endpoint URL".

Author

Done!

```python
if len(keys) >= max_size:
    break

print('retrieved ' + str(len(keys)) + ' keys from ' + keys[0] + ' to ' + keys[-1])
```
Contributor

this raised an exception in my case: `IndexError: list index out of range`. I am unfamiliar with Boto/S3, so I am not sure what this means. The folder I pointed to was not empty.

Author

The S3Walker class filters the keys based on prefix. Most likely, the folder you pointed to did not have any keys with the relevant prefix, so the keys list was empty at that point, leading to the exception. We added logic to skip this line when the keys list is empty, and added instructions to examples/s3/README.md explaining how to set up the S3 endpoint so this doesn't occur in the first place!
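
A minimal sketch of the guard described above (names follow the diff; the `keys` value is stubbed so the snippet stands alone):

```python
keys = []  # e.g. the result of walker.listbucket(...) when nothing matches the prefix

if keys:
    # An empty list would make keys[0] / keys[-1] raise IndexError,
    # so only report the range when at least one key matched.
    print('retrieved ' + str(len(keys)) + ' keys from ' + keys[0] + ' to ' + keys[-1])
```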

```python
print('making list bucket with prefix...')
keys = walker.listbucket(bucket, path, with_prefix=True)

print('making list bucket without prefix...')
```
Contributor

can you explain why you're doing this with and without prefix?

Author

We reviewed this code and removed it, as it was no longer in use. From an S3 perspective, the `with_prefix` argument determines whether the `listbucket` method returns key names with the prefix or not (i.e. mnist/train/9/59942.png versus 59942.png).
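
To illustrate the distinction, here is a hypothetical helper (S3Walker's actual internals may differ) showing the effect of the flag:

```python
def format_key(key_name, prefix, with_prefix):
    """Show the effect of the with_prefix flag on a returned key name."""
    if with_prefix:
        # keep the full bucket-relative name
        return key_name
    # strip the prefix and any leading separator, leaving the leaf name
    return key_name[len(prefix):].lstrip('/')


print(format_key('mnist/train/9/59942.png', 'mnist/train/9', True))   # mnist/train/9/59942.png
print(format_key('mnist/train/9/59942.png', 'mnist/train/9', False))  # 59942.png
```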

```md
## Introduction ##
Boto is a Python library that DIGITS requires in order to interact with S3. It is not needed when DIGITS trains on local files, but it is required for retrieving files from any S3 endpoint.

## Installation ##
```
Contributor

Could you add an example walkthrough that shows how to populate an S3 bucket with the expected contents and then how to load the data into DIGITS? It would be useful for laymen like me who are not so familiar with all of this :-)

Author

We added some instructions to examples/s3/README.md and an accompanying link in the base README.md. Please let us know if any of it could use further detail.
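
For a sense of what such a walkthrough covers, here is a minimal sketch of populating a bucket with a LeNet-style folder layout using boto (the endpoint, credentials, and bucket name are placeholders; the scripts under examples/s3 are the canonical reference):

```python
import os

from boto.s3.connection import OrdinaryCallingFormat, S3Connection
from boto.s3.key import Key

# Placeholder endpoint and credentials -- substitute your own.
conn = S3Connection(aws_access_key_id='ACCESS_KEY',
                    aws_secret_access_key='SECRET_KEY',
                    host='s3.example.com',
                    port=9000,
                    is_secure=False,
                    calling_format=OrdinaryCallingFormat())
bucket = conn.create_bucket('digits-data')

# Mirror a local mnist/train/<label>/<image> layout into the bucket so
# DIGITS can list keys under the 'mnist/train' prefix.
root = os.path.expanduser('~/mnist/train')
for dirpath, _, filenames in os.walk(root):
    for filename in filenames:
        local_path = os.path.join(dirpath, filename)
        key_name = 'mnist/train/' + os.path.relpath(local_path, root).replace(os.sep, '/')
        Key(bucket, key_name).set_contents_from_filename(local_path)
```

The S3 endpoint URL, bucket name, and `mnist/train` prefix entered in the DIGITS dataset form then identify the uploaded images.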

@taltmans force-pushed the s3integration branch 6 times, most recently from 87ab35a to af3b46a on November 9, 2017 at 00:51
@taltmans (Author) commented Nov 9, 2017

I squashed the commits and replied to each of your comments above. Please let us know if you have any more feedback.

@gheinrich (Contributor) left a comment

Hello, thanks for the updates. I uploaded MNIST to AWS S3 and tried to create a dataset, but got this error:

```
2017-11-13 15:29:22 [20171113-152920-ad91] [WARNING] Parse Folder (train/val) unrecognized output: File "/usr/lib/python2.7/logging/__init__.py", line 861, in emit
2017-11-13 15:29:22 [20171113-152920-ad91] [WARNING] Parse Folder (train/val) unrecognized output: msg = self.format(record)
2017-11-13 15:29:22 [20171113-152920-ad91] [WARNING] Parse Folder (train/val) unrecognized output: File "/usr/lib/python2.7/logging/__init__.py", line 734, in format
2017-11-13 15:29:22 [20171113-152920-ad91] [WARNING] Parse Folder (train/val) unrecognized output: return fmt.format(record)
2017-11-13 15:29:22 [20171113-152920-ad91] [WARNING] Parse Folder (train/val) unrecognized output: File "/usr/lib/python2.7/logging/__init__.py", line 469, in format
2017-11-13 15:29:22 [20171113-152920-ad91] [WARNING] Parse Folder (train/val) unrecognized output: s = self._fmt % record.__dict__
2017-11-13 15:29:22 [20171113-152920-ad91] [WARNING] Parse Folder (train/val) unrecognized output: KeyError: 'job_id'
2017-11-13 15:29:22 [20171113-152920-ad91] [WARNING] Parse Folder (train/val) unrecognized output: Logged from file s3_walker.py, line 30
```

Any idea? Thanks!

Once that file has been configured appropriately, it may be run using:

```sh
python upload_mnist.py ~/mnist
```
Contributor

did you mean `upload_s3_data.py` here?

@taltmans (Author)

Good catch on the README item. The logging warnings you sent turned out to be an import issue, which we fixed. Despite the warnings, the dataset creation job should have completed properly; did it complete on your end?

@gheinrich (Contributor)

Yes, thanks, I was able to create the dataset and train a model. I think this is good to go. Can you squash your commits?

@taltmans (Author)

I just squashed the commits; we're good to go on our end.

@gheinrich merged commit ab2048d into NVIDIA:master on Nov 14, 2017
@gheinrich (Contributor)

Thanks for a great feature!
