Azure End-to-End Data Engineering Project
What is Data Engineering?
Data engineering refers to designing and building systems that collect, process, store, and
analyze data at scale. This data usually arrives from different sources at different times, so it
must be properly cleaned and transformed before it is made available to data scientists and
analysts. A common task for data engineers is building data pipelines that keep data flowing
reliably for different use cases. The standard data pipeline phases include:
1. Ingestion: gathering data from various sources.
2. Processing: transforming the raw data into the datasets required for different use cases.
3. Storage: persisting the processed data so it can be retrieved easily from the database.
4. Access: making the stored data available to users for different use cases.
Azure End-to-End Data Engineering
I will use Microsoft Azure to demonstrate a real-world application of data engineering, together
with an open dataset from the Kaggle competition “Data Science for Good: Kiva Crowdfunding.”
The data is provided as CSV files, so I first loaded it into a SQL Server database, which acts as
the data source for the project. I assumed new data would be received daily and appended to the
different tables in the database.
Kiva is an online crowdfunding organization that offers loan services to underserved and
unbanked populations worldwide. Some of the use cases of these loans include starting a
business, paying school fees, and investing in farming. The datasets available at Kiva show the
loans issued to borrowers between January 2014 and July 2017. This information will be
valuable in defining some Key Performance Indicators (KPIs) to evaluate the organization’s
performance over time.
The Data Architecture
I used data stored in an on-premises SQL Server database. The aim of the project was to build a
data pipeline that migrates this data from the SQL Server into the cloud, triggered to run daily
as new data is added to the database.
1. On-prem SQL Server Database: the local SQL Server database holding the source data for the project.
2. Azure Data Factory: ADF is the ETL tool used for data ingestion. I connected ADF to the SQL
Server to copy the required tables to the cloud.
3. Azure Data Lake Gen 2: the storage layer where all raw and transformed data is kept.
4. Azure Databricks: transforms the data into the required format.
5. Azure Synapse Analytics: exposes the gold-layer data through a serverless SQL database for reporting.
6. Power BI: loads data from Azure Synapse Analytics and creates reports for business use
cases.
7. Azure Active Directory and Azure Key Vault: manage secure access to the data and credentials
in the cloud.
Environment Set-up
A Microsoft Azure account is required for this project; Microsoft offers beginners a free 30-day
trial of the Azure portal. Inside the portal, you need to create a resource group with a unique
name, and then create the resources listed above inside that group so the different data
pipeline runs can execute.
Data Source
In this project, I created a SQL username and password to allow Azure Data Factory to access the
source SQL Server database, and I stored both credentials as secrets in Azure Key Vault for
secure access to the data.
I also granted the user read access to all tables in the ‘dbo’ schema, since those are the tables
I read into Azure. To check the user permissions in SQL Server Management Studio, expand the
Security folder under the database, expand Users, and refresh to see the newly created user.
Right-click the user name, open Properties, and under database role membership grant
‘db_datareader’.
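For reference, the equivalent T-SQL is roughly as follows; the login name, password, and database name here are placeholders rather than the ones used in the project:
-- Create a SQL login on the on-prem SQL Server (placeholder credentials)
CREATE LOGIN kiva_reader WITH PASSWORD = '<strong-password>';
GO

-- Create a database user for the login and grant read-only access
USE KivaLoansDB;  -- placeholder database name
GO
CREATE USER kiva_reader FOR LOGIN kiva_reader;
ALTER ROLE db_datareader ADD MEMBER kiva_reader;
GO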
Data Ingestion Using Azure Data Factory
Creating a Connection Between SQL Server and Azure Data Factory
Since Azure Data Factory cannot reach an on-premises SQL Server database directly, I needed to
install a Self-Hosted Integration Runtime (SHIR) to create the link. Under the Manage tab in ADF,
select Integration runtimes. An AutoResolveIntegrationRuntime will already be running; it is
used to connect to cloud-based resources. However, I needed a self-hosted integration runtime to
connect to the on-prem SQL database.
To create the SHIR, click New, choose the Azure, Self-Hosted setup, give it a descriptive name,
and create the runtime. For this project, I used the express setup option since I was installing
it on my machine only. The manual setup option can be used instead when installing the SHIR on
multiple machines for the same project.
On successful installation, run the newly installed Microsoft Integration Runtime Configuration
Manager to confirm that the connection is established; it should report that the node is
connected to the cloud service.
Data Ingestion
For this project, I wanted a pipeline that reads all the tables in the SQL Server database and
loads them into Azure Data Lake Gen 2. I created three storage containers (bronze, silver, and
gold): the bronze container stores the raw data exactly as it comes from the source, the silver
container holds the output of the first level of transformation applied to the bronze data, and
the gold container holds the cleaned and fully transformed data derived from the silver container.
To read all the tables in the database, the pipeline starts with a Lookup activity that runs a
query against the SQL database. The query lists the tables with the given schema name so they
can be ingested into Azure Data Lake Gen 2.
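The exact query is not reproduced here, but a minimal version, assuming the tables to ingest all live in the ‘dbo’ schema, looks like this:
-- List every table in the dbo schema so the pipeline can iterate over them
SELECT
    s.name AS SchemaName,
    t.name AS TableName
FROM sys.tables AS t
JOIN sys.schemas AS s ON t.schema_id = s.schema_id
WHERE s.name = 'dbo';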
Thereafter, I added a ForEach activity to the pipeline that iterates over the output of the
Lookup activity, so that every table returned by the query is copied.
Inside the ForEach activity, I added a Copy data activity. I configured the SQL Server source and
a sink that saves each table as a CSV file, and I parameterized the dataset path so that every
copied table lands in its own folder with its own file name. This data ingestion pipeline
therefore reads the data from the SQL Server and stores it, still in raw form, in the bronze
container, as sketched below.
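As a rough illustration only (the SchemaName and TableName fields come from the lookup query above, and the exact expressions depend on how the source and sink datasets are parameterized), the dynamic content in the Copy data activity looks something like this:
Source query:      @{concat('SELECT * FROM ', item().SchemaName, '.', item().TableName)}
Sink folder path:  @{concat('dbo/', item().TableName)}
Sink file name:    @{concat(item().TableName, '.csv')}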
After configuring the pipeline, I ran it with the Debug option to check that it completed without
errors, and I also used the Add trigger option to run it immediately as a further test. Since the
pipeline fetches the tables from the on-prem SQL Server on every run, each successful run
overwrites the existing folders in the bronze container. At the end of the data ingestion, the
raw data was stored in the bronze container, ready for transformation.
Data Transformation Using Azure Databricks
Storage Mount
After loading the data from the SQL Server into the bronze container, I applied two levels of
transformation to produce progressively cleaner versions of the datasets. First, though, I needed
a compute cluster to run the notebook jobs. To create the compute cluster, launch the Azure
Databricks workspace, open the Compute tab, and configure the cluster according to your needs. I
selected single-user access since I was the only person using the cluster and kept the other
default options. Importantly, I enabled ‘credential passthrough for user-level data access’ so
that Azure Databricks can connect to Azure Data Lake Gen 2.
In the Databricks workspace, I created a notebook called ‘storagemount’ to mount the storage
account in Azure Databricks, which makes the files in the storage account accessible from the
notebooks. To mount the storage using credential passthrough, I took the sample code from this
Microsoft page (https://learn.microsoft.com/en-us/azure/databricks/archive/credential-passthrough/adls-passthrough)
and adjusted it to my configuration.
configs = {
  "fs.azure.account.auth.type": "CustomAccessToken",
  "fs.azure.account.custom.token.provider.class":
    spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
  source = "abfss://bronze@kivaloansstg.dfs.core.windows.net/",
  mount_point = "/mnt/bronze",
  extra_configs = configs)
For this code, update the container and storage account names to match those in your resource
group, and set the mount point name to match the container. You can verify the mount as shown
below.
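A quick check that the mount worked (not part of the original notebook, just a minimal verification step) is to list its contents:
# List the mounted bronze container to confirm the mount succeeded
display(dbutils.fs.ls("/mnt/bronze"))

# Alternatively, list every mount point configured in the workspace
display(dbutils.fs.mounts())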
Similarly, I mounted the silver and gold containers so that data in all three containers can be
accessed through their mount points, or by the full abfss:// path, since credential passthrough
is enabled.
configs = {
  "fs.azure.account.auth.type": "CustomAccessToken",
  "fs.azure.account.custom.token.provider.class":
    spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
  source = "abfss://silver@kivaloansstg.dfs.core.windows.net/",
  mount_point = "/mnt/silver",
  extra_configs = configs)

configs = {
  "fs.azure.account.auth.type": "CustomAccessToken",
  "fs.azure.account.custom.token.provider.class":
    spark.conf.get("spark.databricks.passthrough.adls.gen2.tokenProviderClassName")
}

# Optionally, you can add <directory-name> to the source URI of your mount point.
dbutils.fs.mount(
  source = "abfss://gold@kivaloansstg.dfs.core.windows.net/",
  mount_point = "/mnt/gold",
  extra_configs = configs)
Level 1 Data Transformation
Since the data already looked relatively clean in the database, I focused the first level of
transformation on converting date-time columns to a plain date type. I wanted this transformation
to apply to every table and to every column holding date or time values.
First, I created a notebook named ‘bronze_to_silver’ for this transformation. In it, I built a
list of the table names in the bronze container: the code below iterates through the bronze
container’s directories and appends each directory name to the table name list.
table_name = []

for i in dbutils.fs.ls('mnt/bronze/dbo/'):
    table_name.append(i.name.split('/')[0])
The code below builds the input path for each table and loads it as a data frame, reading the
files as CSV with the header option set to true so the column names are preserved. For every
column whose name contains ‘date’ or ‘time’, it converts the values to a date-only format.
Finally, it builds the output path in the silver container from the table name and writes the
transformed data back to the data lake as CSV, using overwrite mode so each run replaces the
previous version in the silver container.
from pyspark.sql.functions import from_utc_timestamp, date_format
from pyspark.sql.types import TimestampType

for i in table_name:
    input_path = '/mnt/bronze/dbo/' + i + '/' + i + '.csv'
    df = spark.read.format('csv').option('header', 'true').load(input_path)

    for col in df.columns:
        if "date" in col or "time" in col:
            df = df.withColumn(col, date_format(from_utc_timestamp(df[col].cast(TimestampType()), "UTC"), "yyyy-MM-dd"))

    output_path = '/mnt/silver/dbo/' + i + '/'
    df.write.format('csv').mode("overwrite").option('header', 'true').save(output_path)
Level 2 Data Transformation
After the first level of transformation, the data in the silver container was cleaner but still
needed further work before it was ready for different use cases. I therefore applied a second
transformation to the silver data, converting all column names to lowercase to enforce a uniform
naming convention across the tables. The transformed data was then stored in the gold container,
in separate folders named after the respective tables.
I created another notebook named ‘silver_to_gold.’ As before, I built a list of the table names
in the silver container: the code below iterates through the silver container’s directories and
appends each directory name to the table name list.
table_name = []

for i in dbutils.fs.ls('mnt/silver/dbo/'):
    table_name.append(i.name.split('/')[0])
The code below builds the input path for each table and loads it as a data frame, again reading
the files as CSV with the header option set to true. It stores the data frame’s column names in
the variable column_names and renames each column to its lowercase equivalent. Finally, it builds
the output path in the gold container from the table name and writes the transformed data back to
the data lake as CSV, using overwrite mode so each run replaces the previous version in the gold
container.
for name in table_name:
    input_path = '/mnt/silver/dbo/' + name
    print(input_path)
    df = spark.read.format('csv').option('header', 'true').load(input_path)

    # Get the list of column names
    column_names = df.columns

    # Rename each column to its lowercase equivalent
    for old_col_name in column_names:
        new_col_name = old_col_name.lower()
        df = df.withColumnRenamed(old_col_name, new_col_name)

    output_path = '/mnt/gold/dbo/' + name + '/'
    df.write.format('csv').mode("overwrite").option('header', 'true').save(output_path)
Data Transformation Pipeline
Since the two notebooks perform the required transformations, I added them to the Azure Data
Factory pipeline so the data is transformed automatically whenever the pipeline is triggered with
the daily additions. To connect Azure Data Factory to Azure Databricks, I created a new linked
service and authenticated it with a Databricks access token.
After creating the linked service, I opened the Author tab and returned to the pipeline created
earlier to add the two notebooks. I added an Azure Databricks Notebook activity for the
bronze-to-silver notebook, specified the notebook path, and connected the output of the ForEach
activity to it, so the notebook only runs if the ForEach activity succeeds. Similarly, I added
another Notebook activity for the silver-to-gold notebook, set to run after a successful run of
the bronze-to-silver notebook.
I published all the changes and ran the pipeline with the Debug option, then added a new trigger
run to confirm that it worked properly, checking that every activity completed successfully from
data ingestion through processing.
Data Loading Using Azure Synapse
With the transformed data in the gold container, I needed to make it queryable for access and
reporting. I therefore created a serverless SQL database in Azure Synapse Analytics and named it
‘gold_db.’ Because the Synapse workspace is linked to Azure Data Lake Gen 2, the data in the
different containers is visible under the linked datasets.
Since the gold container holds several tables, I needed to create a view over each of them in the
serverless database. I therefore wrote a stored procedure, in a new SQL script, that creates a
view for a given table:
USE gold_db
GO
CREATE OR ALTER PROC CreateSQLServerlessView_gold @Viewname nvarchar(100)
AS
BEGIN
DECLARE @statement VARCHAR(MAX)
SET @statement = N'CREATE OR ALTER VIEW ' + @Viewname + ' AS
SELECT *
FROM
OPENROWSET(
BULK ''https://kivaloansstg.dfs.core.windows.net/gold/dbo/' + @Viewname + '/'',
FORMAT = ''CSV''
) AS [result]
'
EXEC (@statement)
END
GO
I also needed a pipeline that calls this stored procedure to create the views in the serverless
SQL database. Under the Manage tab, I created a new ‘Azure SQL Database’ linked service pointing
at the serverless database. The integration runtime is automatically set to
‘AutoResolveIntegrationRuntime’ since this is a cloud-based resource; I entered the connection
details manually and used the ‘System Assigned Managed Identity’ authentication type.
After creating the linked service, I made a new pipeline with a Get Metadata activity that lists
the table folders in the gold container, and a ForEach activity that iterates over those items
and calls the stored procedure (selected from the existing list) once per table, as sketched
below.
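Assuming the Get Metadata activity returns the folder names as child items, the Stored procedure activity inside the ForEach would be configured roughly like this, passing each folder name as the view name:
Stored procedure name:  CreateSQLServerlessView_gold
Parameter Viewname:     @item().name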
After that, I published all the changes and ran the pipeline using the Debug option. This run
created a view in the serverless database for each table in the gold container.
Data Reporting using Power BI
In this last section of the project, I explain how I used Power BI to connect to the Azure Synapse
serverless SQL database, fetch the views, and build an interactive dashboard reporting some of
the KPIs for Kiva.
I used the ‘Get data’ option on the Power BI ribbon and selected Azure. Under the Azure data
sources, I selected ‘Azure Synapse Analytics SQL’ and hit the Connect button, which prompts for
the server name and database.
The server name is the ‘Serverless SQL endpoint’ shown under the Properties tab of the Azure
Synapse Analytics workspace. I kept the Import connectivity mode to load the tables into Power BI.
I used my Microsoft account to sign in and connect to the database.
This connected to the database and fetched the views as tables. I initially hit an import error
caused by a conflict between the date format in the tables and my Power BI settings, so I
disabled the ‘Time Intelligence’ option under the ‘Options’ menu for the current file.
I performed some transformations on the tables using Power Query to ensure each column was
of the right data type. I also converted the date column of the main table to the appropriate data
type.
Since I planned to use dates to filter the data for insights, I created a separate date table
using Bravo, an external tool for Power BI that generates the dates automatically and builds the
date table.
Refer to this video for using Bravo: (https://www.youtube.com/watch?v=WybnTHDl-
AM&t=326s).
The result is a dedicated date table; the other tables remained unchanged.
Before building the visuals, I checked the Model view to ensure the tables had the proper
relationships. The relationship between the new date table and the date column of the
‘kiva_loans’ table must be created manually.
After confirming the tables had the appropriate relationships, I created the reporting dashboard.
On the dashboard, I used three cards to display the headline metrics. Counting the unique IDs in
the ‘kiva_loans’ table gives the total number of loans disbursed by Kiva, while the total loan
and total funded amounts are $565.4 million and $527.6 million, respectively. A bar chart shows
the percentage of loans borrowed by sector, that is, the sectors in which borrowers intended to
use the loans. A doughnut chart shows the gender distribution of the borrowers: to build it, I
applied conditional formatting to the borrowers’ gender column, categorizing ‘male’ as male,
‘female’ as female, and any other combination as a group. The chart shows that female borrowers
were the majority, while group borrowers were the fewest.
I also included a clustered column chart showing the borrowers’ repayment habits by gender. It
indicates that most borrowers repaid monthly, with irregular repayments the second most common
pattern, while bullet and weekly repayment intervals were the least used. A table shows the loan
distribution by country in descending order, with the Philippines and Kenya as the top two
countries. Finally, a line chart compares the total loan amount with the funded amount from
January 2014 to July 2017, showing how the gap between the two metrics varies across the years
and months.
From the date table, I included ‘Year’ and ‘Month’ slicers to filter the dashboard to specific
periods, allowing users to check the different metrics and compare trends across the various
months.
Pipeline Testing
I tested the pipeline by adding 10 random rows to the main table and triggering a run. I then
created a scheduled trigger that ran once daily while I added ten rows to the source table each
day for five days. After each successful trigger run, I refreshed the Power BI report, which
updated to reflect the newly added loans. This showed that the data engineering pipeline worked
as scheduled.
Conclusion
In this project, I have showcased how I used Microsoft Azure to create an end-to-end data
engineering pipeline. The pipeline took data from an on-premises SQL Server database and loaded
it into Azure Data Lake Gen 2 storage, where two levels of transformation in Azure Databricks
ensured the data was clean and ready for different use cases. The transformed data was then
exposed through a serverless SQL database in Azure Synapse Analytics, from which I loaded it into
Power BI and created a report for presentation to relevant stakeholders. The project represents a
practical use of Azure for data engineering: gathering data and making it reliable and ready for
use.