This tutorial demonstrates how to write and run [Apache Spark](http://spark.apache.org) applications using Scala with some SQL. You can run the examples and exercises several ways:

1. Notebooks, like [Jupyter](http://jupyter.org/) - The easiest way, especially for data scientists accustomed to _notebooks_.
2. In an IDE, like [IntelliJ](https://www.jetbrains.com/idea/) - Familiar for developers.
3. At the terminal prompt using the build tool [SBT](https://www.scala-sbt.org/).

This tutorial is mostly about learning Spark, but I teach you a little Scala as we go. If you are more interested in learning just enough Scala for Spark programming, see my other tutorial, [Just Enough Scala for Spark](https://github.com/deanwampler/JustEnoughScalaForSpark).
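
To give a feel for the kind of code the tutorial covers, here is a minimal word-count sketch using Spark's Scala API. It is only a sketch, not one of the tutorial's exercises; the object name and input path are placeholders.

```scala
// A minimal sketch, not one of the tutorial's exercises.
// The object name and the input path "data/sample.txt" are placeholders.
import org.apache.spark.sql.SparkSession

object WordCountSketch {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("WordCountSketch")
      .master("local[*]")        // run Spark locally, using all available cores
      .getOrCreate()
    try {
      val counts = spark.read.textFile("data/sample.txt")    // Dataset[String] of lines
        .rdd                                                  // drop down to the RDD API
        .flatMap(line => line.toLowerCase.split("""\W+"""))   // split lines into words
        .filter(_.nonEmpty)
        .map(word => (word, 1))
        .reduceByKey(_ + _)                                   // count occurrences of each word
      counts.take(10).foreach(println)
    } finally {
      spark.stop()
    }
  }
}
```

In a notebook you would run the body of `main` directly in cells rather than defining an application object; with SBT or an IDE you would compile and run it as an application.
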
Begin by cloning or downloading the tutorial GitHub project [github.com/deanwampler/spark-scala-tutorial](https://github.com/deanwampler/spark-scala-tutorial).

Now pick the way you want to work through the tutorial:

1. Notebooks - Go [here](#use-notebooks)
2. In an IDE, like IntelliJ - Go [here](#use-ide)
3. At the terminal prompt using SBT - Go [here](#use-sbt)

<a name="use-notebooks"></a>
## Using Notebooks

The easiest way to work with this tutorial is to use a [Docker](https://docker.com) image that combines the popular [Jupyter](http://jupyter.org/) notebook environment with all the tools you need to run Spark, including the Scala language. It's called the [all-spark-notebook](https://hub.docker.com/r/jupyter/all-spark-notebook/). It bundles [Apache Toree](https://toree.apache.org/) to provide Spark and Scala access.
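
Inside such a notebook, the Toree Scala kernel typically pre-defines a `SparkSession` named `spark` (and a `SparkContext` named `sc`), so a cell can use Spark directly. A minimal sketch of what a cell might look like, assuming those pre-defined values:

```scala
// Sketch of a notebook cell, assuming the kernel pre-defines `spark` (a SparkSession).
val numbers = spark.range(1, 11)                       // a small Dataset with a single "id" column
numbers.selectExpr("id", "id * id AS square").show()   // run a Spark job and display a table of results
```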
The [web page](https://hub.docker.com/r/jupyter/all-spark-notebook/) for this Docker image covers other useful topics, such as using Python as well as Scala, user authentication, and running your Spark jobs on a cluster rather than in local mode.

There are other notebook options you might investigate for your needs:

**Open source:**

* [Polynote](https://polynote.org/) - A cross-language notebook environment with built-in Scala support, developed by Netflix.
* [Jupyter](https://ipython.org/) + [BeakerX](http://beakerx.com/) - a powerful set of extensions for Jupyter.
* [Zeppelin](http://zeppelin-project.org/) - a popular tool in big data environments.

**Commercial:**

* [Databricks](https://databricks.com/) - a feature-rich, commercial, cloud-based service.