BIG DATA ENGINEER
MASTER’S PROGRAM
In collaboration with IBM
www.simplilearn.com
Contents
About the Course
Key Features of Big Data Engineer Master’s Program
About IBM and Simplilearn collaboration
Learning Path Visualization
Program Outcomes
Who Should Enroll
Courses
Step 1: Big Data for Data Engineering
Step 2: Data Engineering with Hadoop
Step 3: Data Engineering with Scala
Step 4: Big Data Hadoop and Spark Developer
Step 5: Python for Data Science
Step 6: PySpark Training
Step 7: Apache Kafka
Step 8: MongoDB Developer and Administrator
Step 9: AWS Technical Essentials
Step 10: Big Data on AWS
Step 11: Big Data Capstone
Electives
Certificates
Advisory Board Members
About the Course
This Big Data Engineer Master’s Program, in collaboration with IBM,
provides training on the competitive skills required for a rewarding career
in data engineering. You’ll learn to master the Hadoop big data framework,
leverage the functionality of Apache Spark with Python, simplify data lines
with Apache Kafka, and use the open source database management tool
MongoDB to store data in big data environments.
Key Features
• Industry-recognized certificates from IBM and Simplilearn
• Real-life projects providing hands-on industry training
• 30+ in-demand skills
• Lifetime access to self-paced learning and class recordings
• $1,200 worth of IBM cloud credits
About IBM and Simplilearn collaboration
This joint partnership between Simplilearn and IBM introduces students to
our integrated, Blended Learning approach, making them experts in data
engineering. The program, in collaboration with IBM, will make students
industry-ready for data engineer job roles. IBM is a leading cognitive
solutions and cloud platform company, offering a plethora of technology and
consulting services. Each year, IBM invests $6 billion in research and
development and has achieved five Nobel Prizes, nine U.S. National Medals
of Technology, five U.S. National Medals of Science, six Turing Awards, and
10 inductions into the U.S. Inventors Hall of Fame.
About Simplilearn
Simplilearn is a leader in digital skills training, focused on the emerging
technologies that are transforming our world. Our Blended Learning approach
drives learner engagement and is backed by the industry’s highest completion
rates. Partnering with professionals and companies, we identify their unique
needs and provide outcome-centric solutions to help them achieve their
professional goals.
Learning Path - Big Data Engineer
Step 1: Big Data for Data Engineering
Step 2: Data Engineering with Hadoop
Step 3: Data Engineering with Scala
Step 4: Big Data Hadoop and Spark Developer
Step 5: Python for Data Science
Step 6: PySpark Training
Step 7: Apache Kafka
Step 8: MongoDB Developer and Administrator
Step 9: AWS Technical Essentials
Step 10: Big Data on AWS
Step 11: Big Data Capstone
Electives
• Scala for Data Science
• Spark for Scala Analytics
• Industry Master Class - Data Engineering
• Python for Data Science
• AWS Technical Essentials
• Core Java Certification Training
Big Data Engineer Master’s Program Outcomes
• Gain an in-depth understanding of the flexible and versatile frameworks
  in the Hadoop ecosystem, such as Pig, Hive, Impala, HBase, Sqoop, Flume,
  and YARN
• Master tools and skills such as data model creation, database interfaces,
  advanced architecture, Spark, Scala, RDD, SparkSQL, Spark Streaming,
  Spark ML, GraphX, Sqoop, Flume, Pig, Hive, Impala, and Kafka architecture
• Understand how to model data, perform ingestion, replicate data, and
  shard data using the NoSQL database management system MongoDB
• Gain expertise in creating and maintaining analytics infrastructure and
  own the development, deployment, maintenance, and monitoring of
  architecture components
• Achieve insights on how to improve business productivity by processing
  big data on platforms that can handle its volume, velocity, variety, and
  veracity
• Learn how Kafka is used in the real world, including its architecture and
  components, get hands-on experience connecting Kafka to Spark, and work
  with Kafka Connect
• Understand how to use Amazon EMR for processing data using Hadoop
  ecosystem tools
• Become proficient with the fundamentals of the Scala language, its
  tooling, and the development process
Who Should Enroll in this Program?
A big data engineer builds and maintains data structures and architectures
for data ingestion, processing, and deployment for large-scale,
data-intensive applications. It’s a promising career for both new and
experienced professionals with a passion for data, including:
• IT professionals
• Banking and finance professionals
• Database administrators
• Beginners in the data engineering domain
• Students in UG/PG programs
Step 1: Big Data for Data Engineering
This introductory course from IBM will teach you the basic concepts and
terminology of big data and its real-life applications across multiple
industries. You will gain insights on how to improve business productivity
by processing large volumes of data and extracting valuable information
from them.
Key Learning Objectives
• Understand what big data is, sources of big data, and real-life examples
• Learn the key difference between big data and data science
• Master the usage of big data for operational analysis and better
  customer service
• Gain knowledge of the ecosystem of big data and the Hadoop framework
Course curriculum
Lesson 1 - What is Big Data?
Lesson 2 - Big Data: Beyond the Hype
Lesson 3 - Big Data and Data Science
Lesson 4 - Use Cases
Lesson 5 - Processing Big Data
Step 2: Data Engineering with Hadoop
Apache Hadoop is one of the most in-demand technologies for analyzing
big data. This introductory Hadoop course by IBM will give you an
overview of what Hadoop is and its components, such as MapReduce
and Hadoop Distributed File System (HDFS). Additionally, this course will
teach you to work with large data sets and use Hadoop’s method of
distributed processing.
Key Learning Objectives
• Understand Hadoop’s architecture and primary components, such as
  MapReduce and HDFS
• Add and remove nodes from Hadoop clusters, check the available disk
  space on each node, and modify configuration parameters
• Learn about Apache projects that are part of the Hadoop ecosystem,
  including Pig, Hive, HBase, ZooKeeper, Oozie, Sqoop, and Flume
Course curriculum
Lesson 1 - Introduction to Hadoop
Lesson 2 - Hadoop Architecture
Lesson 3 - Hadoop Administration
Lesson 4 - Hadoop Components
Step 3: Data Engineering with Scala
Kickstart your learning of Scala with this introductory course and
familiarize yourself with Scala programming. Upon completion of this
course, carefully crafted by IBM, you will be able to write Scala code,
perform big data analysis using Scala, and create your own Scala projects.
Key Learning Objectives
• Create your own Scala project
• Understand basic object-oriented programming methodologies in Scala
• Work with data in Scala, including pattern matching, applying
  synthetic methods, and handling options, failures, and futures
Course curriculum
Lesson 1 - Introduction
Lesson 2 - Basic Object Oriented Programming
Lesson 3 - Case Objects and Classes
Lesson 4 - Collections
Lesson 5 - Idiomatic Scala
Step 4: Big Data Hadoop and Spark Developer
Simplilearn’s Big Data Hadoop Training Course helps you master big
data and Hadoop ecosystem tools such as HDFS, YARN, MapReduce,
Hive, Impala, Pig, HBase, Spark, Flume, Sqoop, and Hadoop Frameworks,
including additional concepts of the big data processing life cycle.
Throughout this online, instructor-led Hadoop training, you will work
on real-time projects in retail, tourism, finance, and other domains.
This big data course also prepares you for Cloudera’s CCA175 Big Data
certification.
Key Learning Objectives
• Learn how to navigate the Hadoop ecosystem and understand how to
  optimize its use
• Ingest data using Sqoop, Flume, and Kafka
• Implement partitioning, bucketing, and indexing in Hive
• Work with RDD in Apache Spark
• Process real-time streaming data
• Perform DataFrame operations in Spark using SQL queries
• Implement User-Defined Functions (UDF) and User-Defined Aggregate
  Functions (UDAF) in Spark
Course curriculum
Lesson 1 - Introduction to Big Data and Hadoop
Lesson 2 - Hadoop Architecture Distributed Storage (HDFS) and YARN
Lesson 3 - Data Ingestion into Big Data Systems and ETL
Lesson 4 - Distributed Processing MapReduce Framework and Pig
Lesson 5 - Apache Hive
Lesson 6 - NoSQL Databases HBase
Lesson 7 - Basics of Functional Programming and Scala
Lesson 8 - Apache Spark Next-Generation Big Data Framework
Lesson 9 - Spark Core Processing RDD
Lesson 10 - Spark SQL Processing DataFrames
Lesson 11 - Spark MLlib Modelling Big Data with Spark
Lesson 12 - Stream Processing Frameworks and Spark Streaming
Lesson 13 - Spark GraphX
Step 5: Python for Data Science
Kickstart your learning of Python for data science with this introductory
course and familiarize yourself with programming. Upon completion of
this course, carefully crafted by IBM, you will be able to write your own
Python scripts, perform fundamental hands-on data analysis using the
Jupyter-based lab environment, and create your own data science projects
using IBM Watson.
Key Learning Objectives
• Write your first Python program by implementing concepts of
  variables, strings, functions, loops, and conditions
• Understand the nuances of lists, sets, dictionaries, conditions and
  branching, objects, and classes
• Work with data in Python, including reading and writing files, and
  loading, working with, and saving data with Pandas
Course curriculum
Lesson 1 - Python Basics
Lesson 2 - Python Data Structures
Lesson 3 - Python Programming Fundamentals
Lesson 4 - Working with Data in Python
Lesson 5 - Working with NumPy Arrays
Step 6: PySpark Training
PySpark Training will provide an in-depth overview of Apache Spark, the
open-source query engine for processing large datasets, and how to
integrate it with Python using the PySpark interface. This course will show
you how to build and implement data-intensive applications as you dive
into the world of high-performance machine learning. You’ll learn how to
leverage Spark RDD, Spark SQL, Spark MLlib, Spark Streaming, HDFS,
Sqoop, Flume, Spark GraphX, and Kafka.
Key Learning Objectives
• Understand how to leverage the functionality of Python as you deploy
  it in the Spark ecosystem
• Master Apache Spark architecture and how to set up a Python
  environment for Spark
• Learn about various techniques for collecting data, understand RDDs
  and how to contrast them with DataFrames, how to read data from
  files and HDFS, and how to work with schemas
• Obtain a comprehensive knowledge of various tools that fall under
  the Spark ecosystem such as Spark SQL, Spark MLlib, Sqoop, Kafka,
  Flume, and Spark Streaming
• Create and explore various APIs to work with Spark DataFrames
  and learn how to aggregate, transform, filter, and sort data with
  DataFrames
Course curriculum
Lesson 01 - A Brief Primer on PySpark
Lesson 02 - Resilient Distributed Datasets
Lesson 03 - Resilient Distributed Datasets and Actions
Lesson 04 - DataFrames and Transformations
Lesson 05 - Data Processing with Spark DataFrames
Step 7: Apache Kafka
In this Apache Kafka certification course, you will master the architecture,
installation, configuration, and interfaces of Kafka open-source
messaging. With this Kafka training, you will learn the basics of Apache
ZooKeeper as a centralized service and develop the skills to deploy Kafka
for real-time messaging. This course is part of the Big Data Hadoop
Architect Master’s Program and is recommended for developers and
analytics professionals who wish to advance their expertise.
Key Learning Objectives
• Describe the importance of big data
• Describe the fundamental concepts of Kafka
• Describe the architecture of Kafka
• Explain how to install and configure Kafka
• Explain how to use Kafka for real-time messaging
Course Curriculum
Lesson 1 - Getting Started with Big Data and Apache Kafka
Lesson 2 - Kafka Producer
Lesson 3 - Kafka Consumer
Lesson 4 - Kafka Operations and Performance Tuning
Lesson 5 - Kafka Cluster Architecture and Administering Kafka
Lesson 6 - Kafka Monitoring and Schema Registry
Lesson 7 - Kafka Streams and Kafka Connectors
Lesson 8 - Integration of Kafka with Storm
Lesson 9 - Kafka Integration with Spark and Flume
Lesson 10 - Admin Client and Securing Kafka
Step 8: MongoDB Developer and Administrator
Become an expert MongoDB developer and administrator by gaining an
in-depth knowledge of NoSQL and mastering the skills of data modeling,
ingestion, query, sharding, and data replication. This course includes
industry-based projects in the e-learning and telecom domains. It is
best suited for database administrators, software developers, system
administrators, and analytics professionals.
Key Learning Objectives
• Develop expertise in writing Java and NodeJS applications using
  MongoDB
• Master the skills of replication and sharding of data in MongoDB to
  optimize read/write performance
• Perform installation, configuration, and maintenance of the MongoDB
  environment
• Get hands-on experience creating and managing different types of
  indexes in MongoDB for query execution
• Proficiently store unstructured data in MongoDB
• Develop skill sets for processing huge amounts of data using
  MongoDB tools
• Gain proficiency in MongoDB configuration, backup methods, and
  monitoring and operational strategies
• Acquire an in-depth understanding of how to manage DB nodes,
  replica sets, and master-slave concepts
Course curriculum
Lesson 1 - Introduction to NoSQL Databases
Lesson 2 - MongoDB: A Database for the Modern Web
Lesson 3 - CRUD Operations in MongoDB
Lesson 4 - Indexing and Aggregation
Lesson 5 - Replication and Sharding
Lesson 6 - Developing Java and NodeJS Applications with MongoDB
Lesson 7 - Administration of MongoDB Cluster Operations
Step 9: AWS Technical Essentials
This AWS Technical Essentials course teaches you how to navigate the
AWS management console; understand AWS security measures, storage,
and database options; and gain expertise in web services like RDS and
EBS. This course, prepared in line with the latest AWS syllabus, will help
you become proficient in identifying and efficiently using AWS services.
Key Learning Objectives
• Understand the fundamental concepts of the AWS platform and cloud
  computing
• Identify AWS concepts, terminologies, benefits, and deployment
  options to meet business requirements
• Identify deployment and network options in AWS
Course curriculum
Lesson 01 - Introduction to Cloud Computing
Lesson 02 - Introduction to AWS
Lesson 03 - Storage and Content Delivery
Lesson 04 - Compute Services and Networking
Lesson 05 - AWS Managed Services and Databases
Lesson 06 - Deployment and Management
Step 10: Big Data on AWS
In this AWS Big Data certification course, you will become familiar with
the concepts of cloud computing and its deployment models; the Amazon
Web Services cloud platform; Kinesis Analytics; AWS big data storage,
processing, analysis, visualization, and security services; EMR; AWS
Lambda and Glue; machine learning algorithms; and much more.
Key Learning Objectives
• Understand how to use Amazon EMR for processing data using
  Hadoop ecosystem tools
• Understand how to use Amazon Kinesis for big data processing in
  real time
• Analyze and transform big data using Kinesis Streams
• Visualize data and perform queries using Amazon QuickSight
Course curriculum
Lesson 1 - AWS in Big Data Introduction
Lesson 2 - Collection
Lesson 3 - Storage
Lesson 4 - Processing I
Lesson 5 - Processing II
Lesson 6 - Analysis I
Lesson 7 - Analysis II
Lesson 8 - Visualisation
Lesson 9 - Security
Step 11: Big Data Capstone
This Big Data Capstone project will give you an opportunity to implement
the skills you learned throughout this program. Through dedicated
mentoring sessions, you’ll learn how to solve a real-world, industry-aligned
big data problem. This project is the final step in the learning path and will
enable you to showcase your expertise in big data to future employers.
Elective Course
Scala for Data Science
This course will let you flex your Scala skills for data
preparation, feature engineering, creating data
pipelines, and solving big data analytics problems. You
will learn how to leverage the integration of Apache
Spark and Scala and how to use Spark’s machine
learning pipelines to fit models and search for optimal
hyperparameters using Scala in a Spark cluster.
Spark for Scala Analytics
Through this course you will get an overview of the
history of Apache Spark, how it evolved, how to build
applications with Spark, RDDs and DataFrames, the
Spark ecosystem, and its associated ecosystems. You
will learn how to leverage the core RDD and DataFrame
APIs to perform analytics on datasets with Scala.
Industry Master Class – Data Engineering
Attend an online interactive Masterclass and get insights
into the world of data engineering.
Python for Data Science
Kickstart your learning of Python for data science with
this introductory course and familiarize yourself with
programming. Upon completion of this course, carefully
crafted by IBM, you will be able to write your own Python
scripts, perform fundamental hands-on data analysis
using the Jupyter-based lab environment, and create
your own data science projects using IBM Watson.
AWS Technical Essentials
This AWS Technical Essentials course teaches you how
to navigate the AWS management console; understand
AWS security measures, storage, and database options;
and gain expertise in web services like RDS and EBS. This
course, prepared in line with the latest AWS syllabus, will
help you become proficient in identifying and efficiently
using AWS services.
Core Java Certification Training
Gain expertise in basic concepts of Core Java such as
scope of variables, operators, arrays, loops, methods,
and constructors, and acquire a complete understanding
of JDBC architecture and the JUnit framework. You’ll be able
to write Java code using operators, constructors, loops,
functions, and conditions; work with methods and
encapsulation; and implement multi-threading, string
handling, and exception handling techniques.
Certificates
Upon completion of this Master’s Program, you will receive certificates from
IBM and Simplilearn in the Big Data Engineer courses in the learning path.
These certificates will testify to your skills as an expert in data engineering.
Upon program completion, you will also receive an industry-recognized Master’s
Certificate from Simplilearn.
Advisory board member
Ronald Van Loon
Top 10 Big Data & Data Science Influencer,
Director of Adversitement
Named by Onalytica as one of the three most
influential people in big data, Ronald is also an
author for a number of leading big data and
data science websites, including Datafloq, Data
Science Central, and The Guardian. He also
regularly speaks at renowned events.
USA
Simplilearn Americas, Inc.
201 Spear Street, Suite 1100, San Francisco, CA 94105
United States
Phone No: +1-844-532-7688
INDIA
Simplilearn Solutions Pvt Ltd.
# 53/1 C, Manoj Arcade, 24th Main, Harlkunte
2nd Sector, HSR Layout
Bangalore - 560102
Call us at: 1800-212-7688
www.simplilearn.com