Scala & Spark
Scala: the "Red Hot" Programming Language for Apache Spark.
According to a recent survey by Databricks, 71% of Spark users also
use the Scala programming language.
SCALA
01 Getting Started with Scala
Scala background, Scala vs Java, and basics
Interactive Scala: REPL, data types, variables, expressions, simple functions
Running the program with the Scala compiler
Explore the type lattice and use type inference
Define methods and pattern matching
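
As a taste of the topics in this module, here is a minimal sketch of type inference, a simple method, and pattern matching. The object and method names (`Basics`, `square`, `describe`) are made up for illustration and are not part of the course material.

```scala
object Basics {
  // Type inference: the compiler infers Int and String for these values.
  val answer = 42
  val greeting = "hello"

  // A method with an explicit parameter type and return type.
  def square(x: Int): Int = x * x

  // Pattern matching on the value and runtime type of the argument.
  def describe(value: Any): String = value match {
    case 0               => "zero"
    case n: Int if n < 0 => "negative int"
    case n: Int          => s"int $n"
    case s: String       => s"string of length ${s.length}"
    case _               => "something else"
  }

  def main(args: Array[String]): Unit = {
    println(square(answer))
    println(describe(greeting))
    println(describe(-3))
  }
}
```

The same definitions can be pasted into the REPL one at a time, or compiled with `scalac` and run as a program.
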
02 Scala Environment Setup
Scala setup on Windows and UNIX
Java setup
Scala editor
Interpreter
Compiler
03 Functional Programming
What is functional programming?
Differences between object-oriented programming (OOP) and functional programming (FP)
04 Collections
Iterating, mapping, filtering, and counting
Regular expressions and matching with them
Maps, Sets, groupBy, Options, flatten, flatMap
Word count, IO operations, file access, flatMap
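
The word count exercise listed above can be sketched with plain Scala collections, combining `flatMap`, `filter`, and `groupBy`. This is only one possible approach; the names `WordCount`, `words`, and `count` are illustrative assumptions.

```scala
object WordCount {
  // Split every line into lowercase words and drop empty tokens.
  def words(lines: Seq[String]): Seq[String] =
    lines.flatMap(_.toLowerCase.split("\\W+").toSeq).filter(_.nonEmpty)

  // Group identical words together, then count the size of each group.
  def count(lines: Seq[String]): Map[String, Int] =
    words(lines).groupBy(identity).map { case (word, group) => (word, group.size) }

  def main(args: Array[String]): Unit = {
    val lines = Seq("to be or not to be", "that is the question")
    // Print the words ordered by descending frequency.
    count(lines).toSeq.sortBy { case (_, n) => -n }.foreach {
      case (word, n) => println(s"$word: $n")
    }
  }
}
```

To count words in a file instead, the input lines could be read with `scala.io.Source.fromFile(path).getLines()`, where `path` is whatever file you choose.
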
05 Object-Oriented Programming
Classes and properties
Objects, packaging, and imports
Traits
Objects, classes, inheritance, lists with multiple related types, apply
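
The following sketch touches the topics of this module: a trait, two classes that extend it, a companion object with `apply`, and a list holding several related types. The shape hierarchy is a hypothetical example chosen for illustration.

```scala
// A trait declares behaviour shared by several classes.
trait Shape {
  def area: Double
}

class Circle(val radius: Double) extends Shape {
  def area: Double = math.Pi * radius * radius
}

class Rectangle(val width: Double, val height: Double) extends Shape {
  def area: Double = width * height
}

// Companion object: `apply` lets callers write Rectangle(2.0, 3.0) without `new`.
object Rectangle {
  def apply(width: Double, height: Double): Rectangle = new Rectangle(width, height)
}

object Shapes {
  // One list holding several related types through their common trait.
  val shapes: List[Shape] = List(new Circle(1.0), Rectangle(2.0, 3.0))

  def totalArea(xs: List[Shape]): Double = xs.map(_.area).sum

  def main(args: Array[String]): Unit =
    println(f"total area: ${totalArea(shapes)}%.2f")
}
```
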
06 Deep Dive into Scala - 1
Benefits of Scala
Language offerings
Type inference
Variables
Functions
Loops
Control structures
Vals
Arrays
Lists
07 Deep Dive into Scala - 2
Tuples
Maps
Sets
Traits and mixins
Classes and objects
First-class functions
Closures
Inheritance
Subclasses
Case classes
Modules
Pattern matching
Exception handling
File operations
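
Several of the topics above fit into one short sketch: case classes with pattern matching, a closure, and exception handling. The event types and helper names below are hypothetical and exist only for illustration.

```scala
object DeepDive {
  // Case classes get equality, toString, and pattern-matching support for free.
  sealed trait Event
  case class Login(user: String) extends Event
  case class Purchase(user: String, amount: Double) extends Event

  // Pattern matching destructures each case class; guards refine a case.
  def summarize(event: Event): String = event match {
    case Login(user)                            => s"$user logged in"
    case Purchase(user, amount) if amount > 100 => s"$user made a large purchase"
    case Purchase(user, _)                      => s"$user made a purchase"
  }

  // A closure: the returned function captures `rate` from the enclosing scope.
  def withTax(rate: Double): Double => Double = amount => amount * (1 + rate)

  // Exception handling: turn a parse failure into None instead of crashing.
  def parseAmount(text: String): Option[Double] =
    try Some(text.trim.toDouble)
    catch { case _: NumberFormatException => None }

  def main(args: Array[String]): Unit = {
    println(summarize(Purchase("ann", 150.0)))
    println(withTax(0.5)(100.0))
    println(parseAmount("not a number"))
  }
}
```
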
08 Integrations
What is SBT?
Integration of Scala in the Eclipse IDE
Integration of SBT with Eclipse
09 Git
Introduction to Git and installation
Comparisons, branching, and merging
Rebasing, stashing, and tagging
10 Spark and Hadoop
What is the Hadoop platform?
Why the Hadoop platform?
What is Spark?
Why Spark?
Evolution of Spark
Hadoop vs Spark (Spark benefits)
Architecture of Spark
Spark components
Lazy evaluation
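
Lazy evaluation, the last topic above, can be illustrated with a small RDD pipeline: transformations only record what to compute, and work starts when an action is called. This sketch assumes Spark 3.x (`spark-sql`) is on the classpath; the object and method names are illustrative.

```scala
import org.apache.spark.sql.SparkSession

object LazyEvaluationDemo {
  // Sum of the squares of the even numbers in 1..upTo.
  def evenSquareSum(spark: SparkSession, upTo: Int): Long = {
    val numbers = spark.sparkContext.parallelize(1 to upTo)
    // Transformations: these lines only build the lineage, nothing runs yet.
    val evens = numbers.filter(_ % 2 == 0)
    val squares = evens.map(n => n.toLong * n)
    // Action: fold triggers execution of the whole pipeline.
    squares.fold(0L)(_ + _)
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("lazy-evaluation-demo")
      .master("local[*]") // local mode, for experimentation only
      .getOrCreate()
    try println(evenSquareSum(spark, 10)) // 4 + 16 + 36 + 64 + 100 = 220
    finally spark.stop()
  }
}
```
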
SPARK
11 Environment
Configuring Apache Spark
spark-shell
spark-submit
Setting up memory (driver memory, executor memory)
Setting up cores (executor cores)
Running Spark in local mode
Spark UI explanation
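
The memory and core settings listed above can also be passed programmatically when a session is built. The sketch below assumes Spark 3.x on the classpath; the values (`2g`, `2`, `local[2]`) are arbitrary placeholders, not recommendations. Driver memory is left out on purpose: it generally has to be fixed before the driver JVM starts, for example with the `--driver-memory` option of spark-submit.

```scala
import org.apache.spark.sql.SparkSession

object LocalSessionDemo {
  def build(): SparkSession =
    SparkSession.builder()
      .appName("local-config-demo")
      .master("local[2]")                    // local mode with 2 worker threads
      .config("spark.executor.memory", "2g") // memory per executor (placeholder value)
      .config("spark.executor.cores", "2")   // cores per executor (placeholder value)
      .getOrCreate()

  def main(args: Array[String]): Unit = {
    val spark = build()
    try {
      println(spark.conf.get("spark.executor.memory"))
      // Address of the Spark UI for this application, if the UI is enabled.
      println(spark.sparkContext.uiWebUrl.getOrElse("UI disabled"))
    } finally spark.stop()
  }
}
```

In local mode everything runs inside a single JVM, so the executor settings matter mostly once the same job is submitted to a cluster.
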
12 YARN and Cluster Framework
Overview of YARN and the cluster framework
Setting up YARN and the cluster
Benefits of running Spark jobs in cluster mode instead of local mode
Fine-tuning memory while running a Spark job in cluster mode
13 Programming Magic with RDD
Hadoop MapReduce vs Spark RDD
Benefits of RDD over Hadoop MapReduce
RDD overview
Transformations and actions in the context of RDDs
Demonstration of each RDD API with real-time examples (cache, uncache, count, filter, map, etc.)
Checkpointing in RDDs
Minimizing data transfers
Concepts of broadcast variables
Concepts of accumulators
Repartition concepts
14 Magic with Data Frames
Overview of data frames
Reading CSV/Excel files and creating a data frame
Cache/uncache operations on data frames
Persist/unpersist operations on data frames
Partition and repartition concepts of data frames
foreachPartition on data frames
15 Programming Using Data Frames
How to use data frame APIs effectively
A magic Spark job using the data frame concept (small project)
Defining a schema on a data frame
How to perform SQL operations on a data frame
Checkpointing in data frames
StructType and ArrayType in data frames
Complex data structures on data frames
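
Schema definition, `StructType`/`ArrayType`, and SQL on a data frame can be combined in one small sketch. It assumes Spark 3.x on the classpath; the `people` table, its columns, and the sample rows are invented for illustration.

```scala
import org.apache.spark.sql.{DataFrame, Row, SparkSession}
import org.apache.spark.sql.types.{ArrayType, IntegerType, StringType, StructField, StructType}

object DataFrameSqlDemo {
  // An explicit schema, including an array column (a simple complex type).
  val schema: StructType = StructType(Seq(
    StructField("name", StringType, nullable = false),
    StructField("age", IntegerType, nullable = false),
    StructField("skills", ArrayType(StringType), nullable = true)
  ))

  def adults(spark: SparkSession): DataFrame = {
    val rows = Seq(
      Row("Asha", 34, Seq("scala", "spark")),
      Row("Ben", 17, Seq("sql")),
      Row("Chitra", 45, Seq.empty[String])
    )
    // Build the data frame from rows plus the explicit schema.
    val people = spark.createDataFrame(spark.sparkContext.parallelize(rows), schema)
    // Register a temporary view so the data frame can be queried with SQL.
    people.createOrReplaceTempView("people")
    spark.sql(
      "SELECT name, size(skills) AS skill_count FROM people WHERE age >= 18 ORDER BY name")
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("dataframe-sql-demo").master("local[*]").getOrCreate()
    try adults(spark).show()
    finally spark.stop()
  }
}
```
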
16 Various Data Sources
CSV files
Excel files
JSON files
Parquet files
Benefits of Parquet files
Text files
17 Various Levels of Persistence
MEMORY_ONLY
MEMORY_ONLY_SER
MEMORY_AND_DISK
MEMORY_AND_DISK_SER
DISK_ONLY
OFF_HEAP
18 User-Defined Functions
Benefits of UDFs over SQL
Writing UDFs and applying them to a data frame
Complex UDFs
Data cleaning using UDFs
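
Writing a UDF and applying it to a data frame for data cleaning might look like the sketch below. It assumes Spark 3.x on the classpath; the phone-number cleaning rule, column names, and sample values are hypothetical.

```scala
import org.apache.spark.sql.{DataFrame, SparkSession}
import org.apache.spark.sql.functions.{col, udf}

object UdfDemo {
  // Plain Scala function holding the cleaning rule: keep digits only.
  def normalizePhone(raw: String): String =
    if (raw == null) null else raw.filter(_.isDigit)

  def clean(spark: SparkSession): DataFrame = {
    import spark.implicits._
    // Wrap the Scala function as a UDF so it can be used on columns.
    val normalize = udf(normalizePhone _)
    Seq("(555) 010-1234", "555.010.9999").toDF("phone")
      .withColumn("phone_clean", normalize(col("phone")))
  }

  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder().appName("udf-demo").master("local[*]").getOrCreate()
    try clean(spark).show(truncate = false)
    finally spark.stop()
  }
}
```

Keeping the rule in an ordinary function makes it testable without a Spark session; only the `udf(...)` wrapper depends on Spark.
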
19 Connecting Spark with S3
Connecting Spark with S3
Reading a file from S3 and performing transformations
Writing a file to S3
Preparation and close while writing the file to S3
20 Cassandra Database
Overview of the Cassandra database and its benefits
Partition key and collection concepts in Cassandra
Connecting Cassandra with Spark
Reading a table from Cassandra and performing transformations
Writing millions of records to a Cassandra table
21 Redis
Overview of Redis
How to connect Spark with Redis
Collection concepts of Redis
Reading keys, hash keys, and sets from Redis and operating on them in Spark
Writing various keys to Redis using Spark
22 Spark SQL
Overview of Spark SQL
How to write SQL in Spark
Various types of clauses in Spark SQL
Using UDFs inside Spark SQL
SQL fine-tuning using Spark
23 Data Cleaning
What are the data column types?
How many fields match the data type?
How many fields are mismatches?
Which fields match?
Which fields are mismatches?
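
The questions above amount to checking each field of a record against an expected column type. Here is a minimal plain-Scala sketch of that idea; the column types, the schema, and the sample record are assumptions made for illustration, and in a Spark job the same check would run per row.

```scala
import scala.util.Try

object TypeCheck {
  // Hypothetical set of expected column types.
  sealed trait ColType
  case object IntCol extends ColType
  case object DoubleCol extends ColType
  case object StringCol extends ColType

  // Does the raw text parse as the expected type?
  def matches(value: String, expected: ColType): Boolean = expected match {
    case IntCol    => Try(value.trim.toInt).isSuccess
    case DoubleCol => Try(value.trim.toDouble).isSuccess
    case StringCol => true
  }

  // Returns (names of matching fields, names of mismatching fields).
  def check(schema: Seq[(String, ColType)], record: Seq[String]): (Seq[String], Seq[String]) = {
    val (ok, bad) = schema.zip(record).partition { case ((_, expected), value) =>
      matches(value, expected)
    }
    (ok.map { case ((name, _), _) => name }, bad.map { case ((name, _), _) => name })
  }

  def main(args: Array[String]): Unit = {
    val schema: Seq[(String, ColType)] =
      Seq("id" -> IntCol, "price" -> DoubleCol, "name" -> StringCol)
    val (matching, mismatching) = check(schema, Seq("7", "abc", "pen"))
    println(s"${matching.size} fields match: ${matching.mkString(", ")}")
    println(s"${mismatching.size} fields mismatch: ${mismatching.mkString(", ")}")
  }
}
```
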
24 Spark MLlib
Introduction to machine learning and its benefits
Introduction to the Spark MLlib library
Vectors, decision trees, and matrix concepts
Classification and regression
Correlations and stratified sampling concepts
Explanation of various algorithms such as K-means and Gaussian mixture models (GMMs)
Case studies
25 Spark Streaming and Live Project on Spark
Overview of Spark Streaming
Concepts of input DStreams and receivers
Transformations on DStreams
Window operations