This repository was archived by the owner on Sep 3, 2022. It is now read-only.

Conversation

chmeyers commented May 15, 2017

Based on the NVIDIA Ubuntu 16.04 container. This also switches the non-GPU container to an Ubuntu 16.04 base image for consistency. A few changes to the Dockerfile were required to make it build on Ubuntu.

The switch to Ubuntu adds ~80MB, which increases startup time by a few seconds, but it has the advantage that TensorFlow will no longer segfault, and it's a much newer OS in general.

More work is needed before the GPU images can be used seamlessly, because the Container OS VM image we currently use doesn't natively support GPU containers. In the meantime, it's possible to run them manually with nvidia-docker on a VM where GPU drivers are installed.

chmeyers requested review from ojarjur and craigcitro May 15, 2017 23:37

Contributor

craigcitro left a comment

LGTM, just a few little questions


trap 'rm -rf pydatalab' exit

BASE_IMAGE_SUBSTITUTION="s/_base_image_/nvidia\/cuda:8.0-cudnn5-devel-ubuntu16.04/"
Contributor

Small thing: you can use a delimiter other than / to avoid backslash escaping, e.g.

s,_base_image_,nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04,

Contributor Author

Done.
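
As a minimal sketch of the delimiter suggestion above, the following compares the two `sed` expressions on a sample line. The sample input line and the variable names are assumptions for illustration; only the two substitution expressions come from the review.

```shell
#!/bin/bash
# Upstream image named in the PR; TEMPLATE_LINE is a made-up sample input.
IMAGE="nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04"
TEMPLATE_LINE="FROM _base_image_"

# Original form: '/' is the delimiter, so every '/' in the image name is escaped.
escaped=$(echo "${TEMPLATE_LINE}" | sed 's/_base_image_/nvidia\/cuda:8.0-cudnn5-devel-ubuntu16.04/')

# Suggested form: ',' is the delimiter, so the image name needs no escaping.
plain=$(echo "${TEMPLATE_LINE}" | sed "s,_base_image_,${IMAGE},")

echo "${escaped}"
echo "${plain}"
```

Both forms should print the same line. The comma form only works as long as neither the pattern nor the replacement contains a comma.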

mkdir -p /srcs && \
cd /srcs && \
- apt-get source -d wget git python-zmq ca-certificates pkg-config libpng-dev && \
+ apt-get source --allow-unauthenticated -d wget git python-zmq ca-certificates pkg-config libpng-dev && \
Contributor

Maybe add a note about why we need --allow-unauthenticated? (Is it a temporary thing?)

Contributor Author

Done. It's because Ubuntu can't find the keys to the source git repos. Since we only download these for licensing reasons and don't actually use them, I think it's fine. The apt-get installs above are still authenticated.

MAINTAINER Google Cloud DataLab

# Download and Install GPU specific packages
RUN pip install -U --upgrade-strategy only-if-needed --no-cache-dir tensorflow-gpu==1.0.1 && \
Contributor

Just to confirm: we're sure that installing tensorflow-gpu over an existing tensorflow install will correctly replace things as needed?

Contributor Author

Yes.

craigcitro
Contributor

LGTM

Contributor

yelsayd commented May 30, 2017

Do we also want to change the rollback script?

Adding it at the end to cover the case where it doesn't exist.
@@ -0,0 +1,22 @@
# Copyright 2015 Google Inc. All rights reserved.
Contributor

Is that copyright year correct?

Contributor Author

Fixed

# limitations under the License.

- FROM debian:jessie
+ FROM _base_image_
Contributor

Instead of a template, let's make this a local Docker tag (e.g. datalab-base-image), and then whatever the base is will be based on that tag rather than having to munge template files.

Contributor Author

Done for the base image. We could do this for the top-level image as well, but that already has a template for the version numbers, so leaving it as is for now.
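
A rough sketch of the local-tag approach discussed above, under stated assumptions: the `DOCKER` variable and the `tag_base_image` function are hypothetical and not taken from the actual build script, and by default the commands are only printed (a dry run) rather than executed. The tag name `datalab-base-image` is the one suggested in the review.

```shell
#!/bin/bash
# Hypothetical dry-run switch: prints the docker commands unless DOCKER is
# overridden (for example, DOCKER=docker to really run them).
DOCKER="${DOCKER:-echo docker}"
BASE_TAG="datalab-base-image"   # local tag name suggested in the review

# Pull an upstream image and give it the shared local tag.
tag_base_image() {
  local upstream="$1"
  ${DOCKER} pull "${upstream}" && ${DOCKER} tag "${upstream}" "${BASE_TAG}"
}

tag_base_image "nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04"
```

With a tag like this in place, the base Dockerfile could presumably use `FROM datalab-base-image` directly, and each build script would only choose which upstream image receives the tag.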

gcloud docker -- push gcr.io/${PROJECT_ID}/datalab:local

echo "Pulling the rollback GPU images: ${DATALAB_GPU_IMAGE}"
gcloud docker -- pull ${DATALAB_GPU_IMAGE}
Contributor

Since this is new, we should gracefully handle the situation where the image we are trying to roll back to does not exist.

Contributor Author

The only graceful thing to do here is to exit. As this is the last step in the rollback and the other images have already rolled back successfully, failing here will leave things in the desired state.


trap 'rm -rf pydatalab' exit

docker pull nvidia/cuda:8.0-cudnn5-devel-ubuntu16.04
Contributor

I suspect that this may work just as well as a second step in the main build.sh file, but I don't feel strongly enough about it to block this change.

Contributor Author

I mainly kept these separate for development purposes. The GPU build takes significantly longer to complete, and if you are developing locally, you really only want one of them.

# This will fail and exit if the previous GPU image doesn't exist.
# This will happen if we try to rollback the first GPU release, and
# that is fine since there is nothing to rollback to.
gcloud docker -- pull ${DATALAB_GPU_IMAGE}
Contributor

I worry about the following scenario:

  1. We do a new release with the first GPU image.
  2. We have to roll that release back.
  3. The rollback gets to this step and fails.
  4. The Jenkins job performing the rollback shows up as a failure.
  5. The release engineer retries the rollback, causing us to roll back one more release than intended.

How about just adding a || exit 0 to the end of this line?

Contributor Author

Done.
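
A minimal sketch of the effect of `|| exit 0`, assuming a stand-in function in place of the real `gcloud docker -- pull` call. `pull_previous_gpu_image` and `rollback_last_step` are hypothetical names, not part of the rollback script.

```shell
#!/bin/bash
# Hypothetical stand-in for `gcloud docker -- pull ${DATALAB_GPU_IMAGE}`.
# It always fails, as the pull would when no previous GPU image exists.
pull_previous_gpu_image() {
  echo "pull failed: image not found" >&2
  return 1
}

# Run the final rollback step in a subshell so its exit status can be inspected.
rollback_last_step() {
  (
    pull_previous_gpu_image || exit 0
    echo "re-tagging previous GPU image"   # skipped when the pull fails
  )
}

rollback_last_step
echo "rollback status: $?"
```

Because the failed pull is turned into a zero exit status, a job running the script would report success, which avoids the accidental second rollback described above. The trade-off is that a pull failing for any other reason is also reported as success.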

_DATALAB_NETWORK = 'datalab-network'
_DATALAB_NETWORK_DESCRIPTION = 'Network for Google Cloud Datalab instances'

_DATALAB_FIREWALL_RULE = 'datalab-network-allow-ssh'
Contributor

This, and the following line, are unused and can be deleted.

Contributor Author

Done


_NVIDIA_PACKAGE = 'cuda-repo-ubuntu1604_8.0.61-1_amd64.deb'
_DATALAB_NETWORK = 'datalab-network'
_DATALAB_NETWORK_DESCRIPTION = 'Network for Google Cloud Datalab instances'
Contributor

This is unused and can be deleted.

Contributor Author

Done

'--no-connect' flag.""")

_NVIDIA_PACKAGE = 'cuda-repo-ubuntu1604_8.0.61-1_amd64.deb'
_DATALAB_NETWORK = 'datalab-network'
Contributor

This is used, but I'd rather delete it and switch the one use to create.DATALAB_NETWORK (i.e. expose the other constant to this package).

Contributor Author

Done

_DATALAB_FIREWALL_RULE = 'datalab-network-allow-ssh'
_DATALAB_FIREWALL_RULE_DESCRIPTION = 'Allow SSH access to Datalab instances'

_DATALAB_DEFAULT_DISK_SIZE_GB = 200
Contributor

This and the following line can be deleted.

_DATALAB_DISK_DESCRIPTION = (
'Persistent disk for a Google Cloud Datalab instance')

_DATALAB_NOTEBOOKS_REPOSITORY = 'datalab-notebooks'
Contributor

I'd also replace this with create.DATALAB_NOTEBOOKS_REPOSITORY

Contributor Author

Done


_DATALAB_NOTEBOOKS_REPOSITORY = 'datalab-notebooks'

_DATALAB_STARTUP_SCRIPT = """#!/bin/bash
Contributor

There's a lot of code duplicated from the create.py file here.

Can we move that off to something like a base_startup_script constant, and then have the two packages just define suffixes for the base startup script?

Contributor Author

Done.

ojarjur previously approved these changes Jun 1, 2017
chmeyers merged commit 4e58926 into master Jun 1, 2017
chmeyers deleted the chmeyers-gpu branch June 1, 2017 20:27