
Commit aa70ef2

Remove dataproc samples (GoogleCloudPlatform#4479)
1 parent 69b7160 · commit aa70ef2

16 files changed (+2, -1402 lines)

dataproc/README.md

Lines changed: 2 additions & 83 deletions
@@ -1,84 +1,3 @@
-# Cloud Dataproc API Examples
+These samples have been moved.
 
-[![Open in Cloud Shell][shell_img]][shell_link]
-
-[shell_img]: http://gstatic.com/cloudssh/images/open-btn.png
-[shell_link]: https://console.cloud.google.com/cloudshell/open?git_repo=https://github.com/GoogleCloudPlatform/python-docs-samples&page=editor&open_in_editor=dataproc/README.md
-
-Sample command-line programs for interacting with the Cloud Dataproc API.
-
-See [the tutorial on using the Dataproc API with the Python client
-library](https://cloud.google.com/dataproc/docs/tutorials/python-library-example)
-for a walkthrough you can run to try out the Cloud Dataproc API sample code.
-
-Note that while these samples demonstrate interacting with Dataproc via the API, the same functionality could also be accomplished using the Cloud Console or the gcloud CLI.
-
-`list_clusters.py` is a simple command-line program that demonstrates connecting to the Cloud Dataproc API and listing the clusters in a region.
-
-`submit_job_to_cluster.py` demonstrates how to create a cluster, submit the
-`pyspark_sort.py` job, download the output from Google Cloud Storage, and print the result.
-
-`single_job_workflow.py` uses the Cloud Dataproc InstantiateInlineWorkflowTemplate API to create an ephemeral cluster, run a job, and then delete the cluster with a single API request.
-
-`pyspark_sort.py_gcs` is the same as `pyspark_sort.py` but demonstrates
-reading from a GCS bucket.
-
-## Prerequisites to run locally:
-
-* [pip](https://pypi.python.org/pypi/pip)
-
-Go to the [Google Cloud Console](https://console.cloud.google.com).
-
-Under API Manager, search for the Google Cloud Dataproc API and enable it.
-
-## Set Up Your Local Dev Environment
-
-To install the dependencies, run the following command. If you want to use [virtualenv](https://virtualenv.readthedocs.org/en/latest/)
-(recommended), run it within a virtualenv.
-
-* pip install -r requirements.txt
-
-## Authentication
-
-Please see the [Google Cloud authentication guide](https://cloud.google.com/docs/authentication/).
-The recommended approach for running these samples is to use a Service Account with a JSON key.
-
-## Environment Variables
-
-Set the following environment variables:
-
-    GOOGLE_CLOUD_PROJECT=your-project-id
-    REGION=us-central1 # or your region
-    CLUSTER_NAME=your-cluster-name
-    ZONE=us-central1-b
-
-## Running the samples
-
-To run list_clusters.py:
-
-    python list_clusters.py $GOOGLE_CLOUD_PROJECT --region=$REGION
-
-`submit_job_to_cluster.py` can create the Dataproc cluster or use an existing cluster. To create a cluster before running the code, you can use the [Cloud Console](https://console.cloud.google.com) or run:
-
-    gcloud dataproc clusters create your-cluster-name
-
-To run submit_job_to_cluster.py, first create a GCS bucket (used by Cloud Dataproc to stage files) from the Cloud Console or with gsutil:
-
-    gsutil mb gs://<your-staging-bucket-name>
-
-Next, set the following environment variables:
-
-    BUCKET=your-staging-bucket
-    CLUSTER=your-cluster-name
-
-Then, if you want to use an existing cluster, run:
-
-    python submit_job_to_cluster.py --project_id=$GOOGLE_CLOUD_PROJECT --zone=us-central1-b --cluster_name=$CLUSTER --gcs_bucket=$BUCKET
-
-Alternatively, to create a new cluster that will be deleted at the end of the job, run:
-
-    python submit_job_to_cluster.py --project_id=$GOOGLE_CLOUD_PROJECT --zone=us-central1-b --cluster_name=$CLUSTER --gcs_bucket=$BUCKET --create_new_cluster
-
-The script will set up a cluster, upload the PySpark file, submit the job, print the result, and then, if it created the cluster, delete it.
-
-Optionally, you can add the `--pyspark_file` argument to replace the default `pyspark_sort.py` with your own script.
+https://github.com/googleapis/python-dataproc/tree/master/samples
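For reference, the simplest of the moved samples, `list_clusters.py`, connects to the regional Cluster Controller endpoint and lists the clusters there. A minimal sketch of that pattern, assuming the current `google-cloud-dataproc` (2.x) client and its request-dict call style:

```python
# Sketch only: mirrors the list-clusters pattern the removed README describes.
# Assumes google-cloud-dataproc >= 2.0.
from google.cloud import dataproc_v1


def list_clusters(project_id: str, region: str) -> None:
    # Regions other than "global" use a region-specific service endpoint.
    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    for cluster in client.list_clusters(
        request={"project_id": project_id, "region": region}
    ):
        print(f"{cluster.cluster_name}: {cluster.status.state.name}")


if __name__ == "__main__":
    list_clusters("your-project-id", "us-central1")  # placeholder values
```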

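The core of `submit_job_to_cluster.py` is submitting the staged `pyspark_sort.py` as a PySpark job and waiting for the result, which maps onto the Job Controller API. A hedged sketch, again assuming `google-cloud-dataproc` >= 2.0; the bucket, cluster, and file names are placeholders:

```python
# Sketch of submitting a PySpark job to an existing cluster, per the
# removed README's description of submit_job_to_cluster.py.
# Assumes google-cloud-dataproc >= 2.0; names below are placeholders.
from google.cloud import dataproc_v1


def submit_pyspark_job(
    project_id: str, region: str, cluster_name: str, bucket: str
) -> None:
    client = dataproc_v1.JobControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    job = {
        "placement": {"cluster_name": cluster_name},
        # The README's flow uploads pyspark_sort.py to a staging bucket first.
        "pyspark_job": {"main_python_file_uri": f"gs://{bucket}/pyspark_sort.py"},
    }
    operation = client.submit_job_as_operation(
        request={"project_id": project_id, "region": region, "job": job}
    )
    response = operation.result()  # blocks until the job finishes
    print(f"Job finished with state: {response.status.state.name}")
```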
dataproc/create_cluster.py

Lines changed: 0 additions & 77 deletions
This file was deleted.
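The deleted `create_cluster.py` now lives in the python-dataproc repository linked above. The shape of the call is a Cluster Controller long-running operation; a sketch, with machine types and instance counts as illustrative assumptions:

```python
# Sketch of programmatic cluster creation, in the spirit of the deleted
# create_cluster.py. Machine types and instance counts are illustrative.
from google.cloud import dataproc_v1


def create_cluster(project_id: str, region: str, cluster_name: str) -> None:
    client = dataproc_v1.ClusterControllerClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    cluster = {
        "project_id": project_id,
        "cluster_name": cluster_name,
        "config": {
            "master_config": {"num_instances": 1, "machine_type_uri": "n1-standard-2"},
            "worker_config": {"num_instances": 2, "machine_type_uri": "n1-standard-2"},
        },
    }
    operation = client.create_cluster(
        request={"project_id": project_id, "region": region, "cluster": cluster}
    )
    result = operation.result()  # create_cluster is a long-running operation
    print(f"Cluster created: {result.cluster_name}")
```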

dataproc/create_cluster_test.py

Lines changed: 0 additions & 47 deletions
This file was deleted.
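The `pyspark_sort.py` job that the README's commands stage to GCS is, by its description, a small sort-and-print script. A sketch consistent with that description (input values illustrative), with the GCS-reading variant the README mentions noted in a comment:

```python
# Sketch of a minimal PySpark sort job matching the README's description
# of pyspark_sort.py; the input values are illustrative.
import pyspark

sc = pyspark.SparkContext()
rdd = sc.parallelize(["Hello,", "world!", "dog", "elephant", "panther"])
print(sorted(rdd.collect()))

# The GCS variant the README mentions reads its input from a bucket instead:
# rdd = sc.textFile("gs://your-staging-bucket/input.txt")  # placeholder path
```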

dataproc/dataproc_e2e_donttest.py

Lines changed: 0 additions & 32 deletions
This file was deleted.

dataproc/instantiate_inline_workflow_template.py

Lines changed: 0 additions & 107 deletions
This file was deleted.
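`instantiate_inline_workflow_template.py`, also deleted here, exercised the one-request pattern the README attributes to `single_job_workflow.py`: create an ephemeral managed cluster, run a job, and tear the cluster down. A sketch under the same client assumption; the Hadoop example job and cluster name are illustrative:

```python
# Sketch of InstantiateInlineWorkflowTemplate: one request creates an
# ephemeral managed cluster, runs the job(s), then deletes the cluster.
# Assumes google-cloud-dataproc >= 2.0; job and cluster details illustrative.
from google.cloud import dataproc_v1


def run_inline_workflow(project_id: str, region: str) -> None:
    client = dataproc_v1.WorkflowTemplateServiceClient(
        client_options={"api_endpoint": f"{region}-dataproc.googleapis.com:443"}
    )
    template = {
        "jobs": [
            {
                "step_id": "teragen",
                "hadoop_job": {
                    "main_jar_file_uri": "file:///usr/lib/hadoop-mapreduce/hadoop-mapreduce-examples.jar",
                    "args": ["teragen", "1000", "hdfs:///gen/"],
                },
            }
        ],
        "placement": {
            "managed_cluster": {
                "cluster_name": "ephemeral-cluster",
                # An empty zone_uri lets Dataproc auto-select a zone.
                "config": {"gce_cluster_config": {"zone_uri": ""}},
            }
        },
    }
    operation = client.instantiate_inline_workflow_template(
        request={
            "parent": f"projects/{project_id}/regions/{region}",
            "template": template,
        }
    )
    operation.result()  # blocks until the workflow, including teardown, completes
    print("Workflow ran successfully.")
```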

dataproc/instantiate_inline_workflow_template_test.py

Lines changed: 0 additions & 31 deletions
This file was deleted.

0 commit comments
