DVS SPARK Course Content PDF

The DVS Spark course content covers 9 modules that introduce Apache Spark and related technologies. Module 1 provides an overview of Spark and its ecosystem. Module 2 covers Scala basics. Module 3 discusses downloading and installing Spark. Module 4 focuses on programming with RDDs. Module 5 covers advanced Spark programming techniques. Module 6 is about loading and saving data. Module 7 introduces Spark SQL. Module 8 covers Spark Streaming. Finally, Module 9 teaches machine learning with MLlib. The course also includes overview sessions on Cassandra and Kafka, culminating in a final project using Spark SQL, Streaming, Kafka and Cassandra.

Uploaded by JayaramReddy

DVS SPARK Course Content

MODULE 1 - INTRODUCTION AND EVOLUTION OF APACHE SPARK


• What is Apache Spark & Why Spark?
• Spark History
• Unification in Spark
• Spark ecosystem vs Hadoop
• Spark with Hadoop

MODULE 2 - SCALA BASICS (Object-Oriented and Functional Programming)


• Introduction to Functional Programming
• Interactive Shell (REPL), Data types, Variables, Expressions, Conditional statements, Loops, For comprehensions
• Pattern Matching in Scala with match expressions
• Simple Functions and their variants, Tail Recursion, Functions as Objects (aka Anonymous functions), Higher Order Functions
• Scala Collections and the usage of higher order methods on Collections
• Classes and Objects, Class Constructors in Scala, Case classes, Abstract and Generic Classes
• Exception Handling in Scala
• Traits in Scala, Properties of Traits
• Magic apply method
• Singleton and Companion objects
• Implicits in Scala – implicit parameters, defs, and classes

MODULE 3 - DOWNLOADING SPARK AND GETTING STARTED


• Installing Spark
• Introduction to Spark’s Python and Scala Shells
• Spark Standalone Cluster Architecture and its application flow
• Spark on YARN and its application flow

MODULE 4 - PROGRAMMING WITH RDDS


• RDD Basics and its characteristics, Creating RDDs
• RDD Operations
• Transformations
• Actions
• RDD Types
• Lazy Evaluation
• Persistence (Caching)

MODULE 5 - ADVANCED SPARK PROGRAMMING


• Accumulators and Fault Tolerance
• Broadcast Variables
• Custom Partitioning

MODULE 6 - LOADING AND SAVING YOUR DATA


• Dealing with different file formats (Text, CSV, JSON, etc.)
• Hadoop Input and Output Formats
• Connecting to diverse Data Sources (HDFS, Hive, S3, RDBMS, NoSQL, etc.)

MODULE 7 - SPARK SQL


• Linking with Spark SQL
• Initializing Spark SQL
• DataFrames & Caching
• Case Classes, Inferred Schema
• Loading and Saving Data
• Apache Hive
• Data Sources/Parquet
• JSON
• JDBC/ODBC Server
• Spark SQL User Defined Functions (UDFs)
• Hive UDFs

MODULE 8 - SPARK STREAMING
• Batch vs Streaming
• Architecture and Abstraction
• DStreams, DStreams vs RDD
• Transformations
• Input Streams (Socket, HDFS, Twitter, Kafka)
• Checkpointing, Persist and Caching
• Batch and Window Sizes
• Level of Parallelism

MODULE 9 - MACHINE LEARNING WITH MLLIB
• Machine Learning Basics and terminology
• Apache Spark MLlib Algorithms
• Examples implementing Machine Learning algorithms using Spark MLlib – Linear Regression

Overview sessions on Cassandra and Kafka

Project with Spark SQL and Spark Streaming using Kafka & Cassandra

Address: DVS Technologies, Opp Home Town, Beside Biryani Zone, Marathahalli, Bangalore
Mobile: 8892499499, 9632558585 Website: www.dvstechnologies.in
