AZURE DATA ENGINEERING TRAINING PROGRAM
Highlights of the program:
• The course covers every topic from the basics through to an advanced level.
• All prerequisites for the course are taught as part of the course.
• A comprehensive curriculum spanning SQL, Python, Delta Lake, Azure data services, Databricks,
Spark, PySpark, Spark SQL and Power BI.
• The instructor has over 10 years of industry experience and currently works at an MNC as a
Sr. Data Engineer.
• The course takes around 3 to 4 months (115 hours in total).
• This is a serious, intensive program. Please join only if you can commit 3 hours daily
(1.5 hours for the daily session and 1.5 hours for practice) to get the full benefit.
• The training curriculum has been specially curated by experts from the IT consulting and Data
Engineering fields to make sure the best is offered and the candidate is fully ready for the job
market.
• Assignments, real-time scenarios and real-time projects are given as part of the training.
• Classes run Sunday to Thursday, 9:30 pm to 11:00 pm EST.
WE ONLY CHOOSE TRAINERS IF THEY ARE
EXCELLENT, STUDENT-FRIENDLY AND HAVE AN
ATTITUDE TO HELP STUDENTS IN THE BEST WAY
POSSIBLE. OUR TRAINING QUALITY IS SUPERB!!
Intuites Pvt Ltd || SM Plaza, 4th Fl, A S Rao Nagar, Hyderabad || +91 93988 78906 || +1 678-720-3166
AZURE DATA ENGINEERING TRAINING CURRICULUM
Table of Contents
1. SQL (14 Hours)
2. Python Basics (10 Hours)
3. DELTA LAKE (6 Hours)
1) Delta Lake usage in Databricks.
4. Azure Data Engineer (30 Hours)
2) Overview of the Microsoft Azure Platform
3) Azure Data Architecture
4) Azure Storage options
5) Blob Storage
6) Azure Data Factory
7) Azure SQL Database Service
8) Azure Data Lake Gen1 & Gen2
9) Azure Synapse SQL DW (Dedicated SQL Pool)
10) Authentication & Access
5. Azure Databricks Concepts (20 Hours)
1) Azure Databricks Introduction
2) Azure Databricks concepts
3) Data Management
4) Computation Management
5) Databricks Advanced topics.
6. SPARK Concepts (10 Hours)
1) Introduction to Spark - Getting started
2) Resilient Distributed Dataset and DataFrames
3) Spark application programming
4) Introduction to Spark libraries
5) Spark configuration, monitoring and tuning
7. PySpark (10 Hours)
1) Introduction To PySpark
2) Introduction To RDDs
3) Introduction to DataFrame.
4) Different types of Big Data file formats.
5) Reading and Writing Different Types of Files using DataFrames.
6) Need for Spark SQL
7) User-Defined Functions
8) Performance Tuning
9) Spark-Hive Integration
10) PySpark Project with execution.
8. SPARK SQL (9 Hours)
9. Power BI Basics (6 Hours)
1. SQL (14 Hours)
A. Introduction
B. Select Clause
C. Sub Query
D. Group By and Having
E. Case Statement
F. Set Operations
G. CTE
H. Update/Delete/Create Statements
2. Python Basics (10 Hours)
I. Introduction
J. Data Types
K. Control Statements
L. Loop Statements
M. Lists/Tuples/Dictionaries/Sets
N. Import Modules
3. DELTA LAKE (6 Hours)
1) Delta Lake usage in Databricks.
P. Delta Lake Architecture
Q. Delta Lake Storage Understanding
R. Delta Lake table creation and API options
S. Delta Lake DML operations
T. Delta Lake partitions
U. Delta Lake Schema Enforcement
V. Delta Lake Schema Evolution
W. Delta Lake Versions
X. Delta Lake Time Travel
Y. Delta Lake Vacuum
Z. Delta Lake Merge (SCD Type 1 and SCD Type 2)
4. Azure Data Engineer (30 Hours)
2) Overview of the Microsoft Azure Platform
A. Introduction to Azure
B. Basics of Cloud computing
C. Azure Infrastructure
D. Walkthrough of Azure Portal
E. Overview of Azure Services
3) Azure Data Architecture
A. Traditional RDBMS workloads.
B. Data Warehousing Approach
C. Big data architectures.
D. Transferring data to and from Azure
4) Azure Storage options
A. Blob Storage
B. ADLS Gen1 & Gen2
C. RDBMS
D. Hadoop
E. NoSQL
F. Disk
5) Blob Storage
A. Azure Blob Resources
B. Types of Blobs in Azure
C. Azure storage account data objects
D. Azure storage account types and Options
E. Data replication (redundancy) options
F. Secure access to an application's data
G. Azure Import/Export service
H. Storage Explorer
I. Practical section on Blob Storage
6) Azure Data Factory
A. Azure Data Factory Architecture
B. Creating an ADF Resource and Using It in the Azure Cloud
C. Pipeline Creation and Usage Options
D. Using the Copy Data Tool in the ADF Portal
E. Linked Service Creation in ADF
F. Dataset Creation, Connection Reuse
G. Staging Dataset with Azure Storage
H. ADF Pipeline Deployments
I. Pipeline Orchestration using Triggers
J. ADF Transformations and integration with other tools.
K. Processing different file types using ADF.
L. Integration Runtime
M. Monitoring ADF Jobs
N. Managing IRs and Linked Services.
7) Azure SQL Database Service
A. Introduction to Azure SQL Database
B. Relational Data Services in the Cloud
C. Azure SQL Database Service Tiers
D. Database Throughput Units (DTU)
E. Scalable performance and pools
F. Creating and Managing SQL Databases
G. Azure SQL Database Tools
H. Migrating data to Azure SQL Database
8) Azure Data Lake Gen1 & Gen2
A. Explore the Azure Data Lake enterprise-class security features.
B. Understand storage account keys.
C. Understand shared access signatures.
D. Understand transport-level encryption with HTTPS.
E. Understand Advanced Threat Protection.
F. Control network access.
G. Differences between Gen1 & Gen2
9) Azure Synapse SQL DW (Dedicated SQL Pool)
A. What is Azure Synapse DW (Dedicated SQL Pool)?
B. Synapse DW Architecture.
C. Creating an internal table with default distribution
D. Creating an external table in Synapse DW
E. Loading data from Databricks to Azure Synapse DW
F. Loading data from ADLS Gen2 to Azure Synapse DW
G. What is a dedicated SQL pool?
H. Data Warehouse Units (DWU) overview
I. Distributed table with example
J. Hash distribution with example
K. Round robin distribution with example
L. Replicate distribution with example
M. What are the types of indexes, with examples
N. Clustered Index with example
O. Non-Clustered Index with example
P. Clustered Columnstore Index with example
Q. Heap tables with example
10) Authentication & Access
A. Overview
B. Azure Key Vault
C. System-Assigned Managed Identities
D. User-Assigned Managed Identities
E. Service Principals
F. Fine-Grained Access Control for Storage Accounts
5. Azure Databricks Concepts (20 Hours)
1) Azure Databricks Introduction
A. Databricks Architecture
B. Databricks Components overview
C. Benefits for data engineers and data scientists
2) Azure Databricks concepts
A. Workspace - creating and managing workspaces.
B. Notebook - creating, calling and managing notebooks.
C. Library - installing and managing libraries.
3) Data Management
A. Databricks File System (DBFS) - using DBFS commands to copy and manage files.
B. Database - creating and managing databases and tables.
C. Table - creating tables, dropping tables, loading data.
D. Metastore - managing metadata; creating and managing Delta tables.
4) Computation Management
A. Cluster - creating and managing clusters.
B. Pool - creating pools and using pools for autoscaling.
C. Databricks Runtime - understanding and choosing Databricks runtimes based on requirements.
D. Jobs - creating jobs from notebooks and assigning cluster types to jobs.
E. Workload - monitoring jobs and managing loads.
F. Execution Context - understanding the execution context.
5) Databricks Advanced topics.
A. Databricks Workflows
B. Calling one notebook from another notebook.
C. Creating global variables (widgets) and using them in ADF pipelines.
D. How to implement parallelism in notebook execution.
E. Mounting Azure Blob Storage and Data Lake Storage accounts.
F. Integrating source code (notebooks) with GitHub
G. Calling Databricks notebooks from Azure Data Factory.
H. Monitoring Databricks cluster logs.
6. SPARK Concepts (10 Hours)
1) Introduction to Spark - Getting started
A. What is Spark and what is its purpose?
B. Components of the Spark unified stack
C. Resilient Distributed Dataset (RDD)
D. Downloading and installing Spark standalone
E. Scala and Python overview
F. Launching and using Spark's Scala and Python shells
2) Resilient Distributed Dataset and DataFrames
A. Understand how to create parallelized collections and external datasets
B. Work with Resilient Distributed Dataset (RDD) operations
C. Utilize shared variables and key-value pairs
3) Spark application programming
A. Understand the purpose and usage of the Spark Context
B. Initialize Spark with the various programming languages
C. Describe and run some Spark examples
D. Pass functions to Spark
E. Create and run a Spark standalone application
F. Submit applications to the cluster
4) Introduction to Spark libraries
A. Understand and use the various Spark libraries
5) Spark configuration, monitoring and tuning
A. Understand components of the Spark cluster
B. Configure Spark to modify the Spark properties, environment variables, or logging
properties
C. Monitor Spark using the web UIs, metrics, and external instrumentation
D. Understand performance tuning considerations
7. PySpark (10 Hours)
1) Introduction To PySpark
1) What is SparkSession
2) How to create a SparkSession
3) What is SparkContext
4) How to create a SparkContext
5) What is SQLContext
➢ How to use Jupyter Notebooks & Databricks notebooks for Python development.
➢ Install and configure PySpark on a local system for development.
➢ Introduction to Big Data and Apache Spark
➢ Apache Spark Framework & Execution Process.
2) Introduction To RDDs
1) Different Ways to Create RDDs in PySpark.
2) RDD Transformations
3) RDD Actions
4) RDD Cache & Persist
3) Introduction to DataFrame.
1) Different Ways to Create DataFrames in PySpark.
2) DataFrame Transformations
3) DataFrame Actions
4) DataFrame Cache & Persist
4) Different types of Big Data file formats.
1) Difference between Row store format and column store format.
2) Avro File
3) Parquet file
4) ORC File
5) Reading and Writing Different Types of Files using DataFrames.
1) CSV files
2) JSON files
3) XML files
4) Excel files
5) Complex JSON files
6) Avro files
7) Parquet files
8) ORC files
6) Need for Spark SQL
➢ What is Spark SQL
1) SQL Table Creation
2) SQL Join Types
3) SQL Nested Queries
4) SQL DML Operations
5) SQL Merge Scripts
6) SQL SCD Type 2 implementation
7) User-Defined Functions
8) Performance Tuning
9) Spark-Hive Integration
10) PySpark Project with execution.
1) End-to-end PySpark project implementation
2) Executing a PySpark project in Databricks
3) Executing a PySpark project in Azure Data Factory (ADF).
8. SPARK SQL (9 Hours)
7) Introduction to Spark SQL.
8) Spark SQL Create database
9) Drop databases
10) Create internal table
11) Create external table
12) Create partitioned table
13) Create partitioned and bucketed table
14) Spark SQL DML: insert, update, delete and merge operations
15) Spark SQL DRL: SELECT queries with different clauses
16) Spark SQL MERGE with SCD Type 1 and SCD Type 2
17) Spark SQL WHERE, GROUP BY and HAVING clauses
18) Spark SQL ORDER BY and SORT BY clauses
19) Spark SQL join types, Window, Pivot, Limit and Like
20) Spark SQL Grouping Sets, Rollup and Cube
21) Spark SQL CLUSTER BY and DISTRIBUTE BY
22) Spark SQL CASE, WITH and TABLESAMPLE
9. Power BI Basics (6 Hours)