# Fabric Notes

The document outlines a multi-phase process for ingesting and transforming data within a Fabric Workspace, including steps for loading CSV and SQL data into a Lakehouse, performing data transformations using a notebook, and creating Power BI reports. It also details the creation of an Eventstream for real-time data ingestion from Azure IoT Hubs, setting up alerts with Data Activator, and building a low-code data ingestion pipeline for customer data from a CSV file and a REST API. Each phase is clearly defined with step-by-step instructions for implementation.


**Assume:**

- You are in a Fabric Workspace.
- You have a Lakehouse named `my_lakehouse`.
- The raw orders data is in a Delta table `my_lakehouse.tables.orders`.
- The CSV file with product categories is in the `Files` section of `my_lakehouse` at `/files/product_categories.csv`.
- The SQL database with prices is accessible and you have connection details.

### Phase 1: Ingesting the Product Categories CSV


1. **Navigate to Dataflows Gen2:**
- In your Fabric workspace, click on `New` -> `Dataflow Gen2`.
2. **Add the CSV as a Source:**
- Click `Get Data` -> `Text/CSV`.
- Browse to your Lakehouse, navigate to the `Files` section, and select
`product_categories.csv`.
- Review the data in the preview window. Ensure the headers are correct.
3. **Define the Destination:**
- In the Dataflow Gen2 editor, on the bottom right, find `Destination` and
select `Lakehouse`.
- Choose your `my_lakehouse`.
- Specify a new table name, like `product_categories`.
- Click `Publish`.
- **Result:** A new Delta table named `product_categories` will be created in
your Lakehouse.
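
If you prefer code over the Dataflow UI, the same load can be done from a notebook attached to the Lakehouse. A minimal PySpark sketch, assuming the file location from the assumptions above (the relative `Files/...` path applies when the notebook's default Lakehouse is `my_lakehouse`):

```python
# Read the product categories CSV from the Lakehouse Files section.
# Adjust the path if your file lives somewhere other than the location
# described in the assumptions above.
categories_raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/product_categories.csv")
)

# Persist it as a managed Delta table in the Lakehouse
categories_raw.write.format("delta").mode("overwrite").saveAsTable("product_categories")
```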

### Phase 2: Connecting to the SQL Database with a Shortcut


1. **Navigate to your Lakehouse:**
- Go back to your `my_lakehouse`.
2. **Create a New Shortcut:**
- Click the `...` next to the `Tables` folder and select `New shortcut`.
- Choose `Microsoft OneLake` (to create a shortcut to the SQL DB).
- Provide the connection details for your SQL database. You'll need the
server name, database name, and credentials.
- Give the shortcut a name, for example, `prices_db`.
- **Result:** A new folder/object named `prices_db` will appear in your
Lakehouse. You can now access the tables within that SQL database as if they were
part of the Lakehouse.
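
Once the shortcut exists, the data behind it can be read like any other Lakehouse table. A quick sanity check from a notebook, using the same illustrative table name that the Phase 3 code assumes:

```python
# List the tables now visible in the Lakehouse, including those exposed by the shortcut
spark.sql("SHOW TABLES").show(truncate=False)

# Read a table exposed through the prices_db shortcut; the exact table name
# depends on your SQL database, so treat this one as a placeholder.
prices_df = spark.read.table("my_lakehouse.tables.prices_db_tablename")
prices_df.printSchema()
```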

### Phase 3: Data Transformation in a Notebook


1. **Create a New Notebook:**
- In your Fabric workspace, click `New` -> `Notebook`.
- Attach the notebook to `my_lakehouse`.
2. **Load the DataFrames:**
- In the first cell, you'll write Python/PySpark code to load the three
sources into DataFrames.
```python
# Load the orders data
orders_df = spark.read.format("delta").table("my_lakehouse.tables.orders")
# Load the product categories data
categories_df = spark.read.format("delta").table("my_lakehouse.tables.product_categories")
# Load the prices data from the SQL shortcut (adjust the table name as needed)
prices_df = spark.read.format("delta").table("my_lakehouse.tables.prices_db_tablename")
```
3. **Perform the Joins and Calculations:**
- In a new cell, write the transformation logic.
```python
from pyspark.sql.functions import col, sum as spark_sum, current_date, date_sub

# Join orders with categories, then with prices, on product_id
joined_df = orders_df.join(categories_df, "product_id")
final_df = joined_df.join(prices_df, "product_id")

# Keep roughly the last three months of orders and compute line-item sales
three_months_ago = date_sub(current_date(), 90)
sales_df = (
    final_df.filter(col("order_date") >= three_months_ago)
    .withColumn("total_sales", col("quantity") * col("price"))
)

# Aggregate total sales by product category
summary_df = (
    sales_df.groupBy("product_category")
    .agg(spark_sum("total_sales").alias("total_sales_per_category"))
)
```
4. **Save the Result:**
- In the final cell, write the code to save the aggregated DataFrame back to
the Lakehouse.
```python
# Save the aggregated summary as a Delta table in the Lakehouse
summary_df.write.format("delta").mode("overwrite").saveAsTable(
    "my_lakehouse.tables.quarterly_sales_summary"
)
```
- **Result:** A new Delta table `quarterly_sales_summary` is now available in
your Lakehouse, containing the final aggregated data.
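
To confirm the write, you can read the table back in a final cell and inspect the aggregates:

```python
# Read the aggregated table back and show categories ordered by total sales
summary_check = spark.read.table("my_lakehouse.tables.quarterly_sales_summary")
summary_check.orderBy("total_sales_per_category", ascending=False).show()
```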

### Phase 4: Building the Power BI Report


1. **Navigate to the Lakehouse Data:**
- Go back to your `my_lakehouse`.
- In the `Tables` section, you'll see your `quarterly_sales_summary` table.
2. **Create a New Power BI Report:**
- Hover over the `quarterly_sales_summary` table and click the `...` icon.
- Select `New Power BI report`.
3. **Design the Report:**
- The Power BI editor will open with your `quarterly_sales_summary` table
already loaded as the data source.
- In the `Visualizations` pane, select a `Bar chart`.
- Drag `product_category` to the X-axis.
- Drag `total_sales_per_category` to the Y-axis.
4. **Save and Publish:**
- Click `File` -> `Save` and give your report a name (e.g., "Quarterly Sales
Report").
- The report is automatically saved to your Fabric workspace and is ready to
be viewed and shared with others.
---
### Step 1: Create an Eventstream for Ingestion
1. In your Fabric workspace, click **New**.
2. Select **Eventstream**.
3. Give it a name (e.g., `TruckTelemetryStream`) and click **Create**.
4. In the Eventstream editor, find the **New Source** button on the top left and
click it.
5. Select **Azure IoT Hubs**. A wizard will appear.
6. Fill in your connection details for the IoT Hub and click **Add**.
### Step 2: Ingest Data into a KQL Database
1. While still in your Eventstream editor, find the **New Destination** button on
the right and click it.
2. Select **KQL Database**.
3. A pane will open. Select a **KQL Database** you want to use, or create a new
one.
4. Provide a **Table name** (e.g., `truck_telemetry`) and click **Add and
configure**. The data is now flowing into this KQL database.
### Step 3: Set up Real-Time Alerting
1. Go back to your Fabric workspace homepage.
2. Open the KQL Database you created in the previous step.
3. Click on the **Explore your data** button. This will open a new query editor.
4. In the editor, write a query to find trucks that exceed the temperature
threshold. A simple example would be:
```kql
truck_telemetry
| where temperature > 50
| summarize count() by truckId, bin(ingestion_time(), 30s)
| where count_ > 1
```
5. Run the query to test it.
6. In the menu at the top of the query editor, click on **Build Power BI report**.
7. This will create a Power BI report with your query results. In Power BI, pin a suitable visual (such as a card showing the result count) to a dashboard and set up a data alert on that tile to trigger an email or other action when the count is greater than zero.
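
If you also want to run the same check programmatically (for example, from a scheduled script outside the Fabric UI), one option is the `azure-kusto-data` Python package, which is not part of these notes. A hedged sketch, with the cluster URI and database name as placeholders you would copy from the KQL database's details page:

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Placeholders: copy the query URI and database name from your KQL database in Fabric
cluster_uri = "https://<your-eventhouse>.kusto.fabric.microsoft.com"
database = "<your-kql-database>"

# Authenticate via the Azure CLI login on the machine running the script
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster_uri)
client = KustoClient(kcsb)

# Same query as in Step 3 above
query = """
truck_telemetry
| where temperature > 50
| summarize count() by truckId, bin(ingestion_time(), 30s)
| where count_ > 1
"""

response = client.execute(database, query)
for row in response.primary_results[0]:
    print(row)
```
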
### Step 4: Configure Historical Data Storage
1. Navigate back to your KQL Database's main page.
2. On the left-hand menu, click on the **Data (Preview)** tab.
3. In the top menu, click on **New connection**.
4. Select **OneLake**.
5. Choose your target **Lakehouse** and a **Table name** (e.g.,
`historical_telemetry`).
6. Click **Create**. Fabric will now automatically export the data from the KQL
database to the Lakehouse as a Delta table.
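
Once the OneLake connection is active, the exported table can also be inspected directly from a notebook; a minimal sketch, assuming the `historical_telemetry` table name above and the telemetry columns mentioned in Step 5:

```python
# Read the telemetry that Fabric exports from the KQL database into the Lakehouse
historical_df = spark.read.table("historical_telemetry")

# Quick sanity check: row count and the most recent records.
# Column names such as timestamp and temperature are assumptions based on the
# fields used elsewhere in these notes; adjust to your actual schema.
print(historical_df.count())
historical_df.orderBy("timestamp", ascending=False).show(5)
```
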
### Step 5: Create a Power BI Report for Historical Analysis
1. Go to your Fabric workspace and open your **Lakehouse**.
2. In the **Tables** section, find the `historical_telemetry` table you just
created.
3. Hover over the table name, click the three dots **...**, and select **New Power
BI report**.
4. This will open the Power BI editor with your historical data already connected.
5. In the **Visualizations** pane, drag and drop the fields you want to analyze
(e.g., `temperature`, `speed`, `timestamp`) to build your report.
6. Click **Save** and give your report a name.
---
Here are the step-by-step instructions to set up the alert:
### Step 1: Create a Data Activator and Connect to Your KQL Database
1. In your Fabric workspace, click **New** -> **Data Activator**.
2. Give it a name and click **Create**.
3. In the Data Activator editor, click **Connect to your data**.
4. Select your **KQL Database** from the list of available items.
5. Choose the table containing your clickstream data (e.g., `clickstream_data`).
6. The Data Activator will automatically display a preview of your data stream.
### Step 2: Define the Object and Condition
1. On the left-hand side, find the **Objects** pane. Click **Add an object**.
2. This is where you define what you're tracking. In this case, you're tracking a
specific web page. Select the `page_url` field from your data. Data Activator will
now create a visual representation of each unique `page_url` as an object.
3. On the right-hand side, find the **Triggers** pane. Click **Create a new
trigger**.
4. In the trigger configuration, you will define the condition. For the "Value to
check," select the `page_url` field.
5. Set the condition to be `is equal to` the specific URL:
`/products/bestseller_item_1`.
### Step 3: Configure the Alert Action
1. While in the same trigger configuration pane, scroll down to the "When this
happens" section.
2. Here, you will set the aggregation. Select **Count** to count the number of
views.
3. Set the time window to **1 minute**.
4. Set the condition to `is greater than` the value **100**.
5. In the "Then do this" section, click **Add an action**.
6. Choose the action you want to take, such as **Send a Teams notification** or
**Send an email**.
7. Fill in the details for the recipient (e.g., the marketing team's email
address).
### Step 4: Start the Alert
1. After configuring the trigger and action, click the **Start** button at the top
of the Data Activator ribbon.
2. Data Activator will now actively monitor your KQL Database in real time.
Whenever the count of views for that specific product page exceeds 100 within any
1-minute window, it will automatically send the configured notification to the
marketing team.
---
Your data engineering team needs to ingest customer data from two different
sources: a **CSV file** and a **public REST API**. The REST API requires
**pagination** to retrieve all records (it returns a maximum of 100 records per
page).
You need to combine this data, perform some basic transformations (like cleaning up
columns), and load the final, combined dataset into a Delta table in your Fabric
**Lakehouse** for downstream reporting.
**Question:** What is the best-suited Fabric component for this task, and what are
the main steps you would follow to build this low-code data ingestion and
transformation pipeline?

The best-suited component is **Dataflow Gen2**: it provides a low-code Power Query experience with connectors for both files and REST APIs, plus a built-in Lakehouse destination. Here are the step-by-step instructions to build the pipeline:


### Step 1: Create the Dataflow Gen2
1. In your Fabric workspace, click **New**.
2. Select **More options** and then choose **Dataflow Gen2**.
3. Give the dataflow a name (e.g., `CustomerDataFlow`) and click **Create**. The
Power Query Online editor will open.

### Step 2: Get Data from the REST API


1. In the Power Query editor, click **Get data**.
2. Select **More** and search for **Web API**. Click **Connect**.
3. Enter the URL for your REST API. This is also where you handle **pagination**: in the editor, you can write a Power Query M custom function that loops through pages of data by updating a page-number parameter or following a "next page" URL, so all records are ingested without manual intervention (a conceptual sketch of the paging pattern follows this list).
4. After the data loads, the Power Query editor will show you a preview of the API
data.
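
The Dataflow itself expresses this paging logic in Power Query M, but the pattern is easier to see in plain code. A conceptual sketch in Python using the `requests` library, assuming a hypothetical endpoint that takes `page` and `page_size` query parameters and returns at most 100 records per page as a JSON array:

```python
import requests

# Hypothetical endpoint and parameter names, for illustration only;
# the real Dataflow would implement the same loop as an M custom function.
BASE_URL = "https://api.example.com/customers"
PAGE_SIZE = 100  # the API returns at most 100 records per page

def fetch_all_records():
    records = []
    page = 1
    while True:
        response = requests.get(BASE_URL, params={"page": page, "page_size": PAGE_SIZE})
        response.raise_for_status()
        batch = response.json()
        if not batch:                # an empty page means nothing is left to fetch
            break
        records.extend(batch)
        if len(batch) < PAGE_SIZE:   # a short page is the last page
            break
        page += 1
    return records

all_customers = fetch_all_records()
print(f"Fetched {len(all_customers)} customer records")
```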

### Step 3: Get Data from the CSV File


1. While in the same Dataflow editor, click **Get data** again.
2. Select **Text/CSV**.
3. Enter the path to your CSV file (e.g., from a OneDrive or SharePoint location),
configure the connection, and click **Create**.
4. The CSV data will load as a separate query in the editor. You now have two
separate queries: one for your API data and one for your CSV data.

### Step 4: Clean, Transform, and Combine the Data


1. In the Power Query editor, you can now apply transformations visually.
* **Clean:** Select a query, and then use the ribbon to remove unnecessary
columns, handle null values, or change data types.
* **Transform:** For instance, you could click a column header, go to the **Add
Column** tab, and select **Custom Column** to create a new calculated field.
2. To combine the two sources, select one of the queries (e.g., the API data).
3. From the **Home** tab, click **Append queries**.
4. Select the CSV query from the dropdown to combine its rows with the current
query. A new query will be created with the combined data.
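
Appending in Power Query simply stacks the rows of both queries into one table. For comparison, the equivalent operation in a notebook would be a union of two DataFrames; a minimal PySpark sketch with hypothetical stand-in data:

```python
# Stand-ins for the API and CSV queries (hypothetical columns, for illustration only)
api_customers_df = spark.createDataFrame(
    [(1, "Alice", "alice@example.com")], ["customer_id", "name", "email"]
)
csv_customers_df = spark.createDataFrame(
    [(2, "Bob", "US")], ["customer_id", "name", "country"]
)

# unionByName matches columns by name; allowMissingColumns fills gaps with nulls,
# which mirrors how Append behaves when the sources do not share every column.
combined_df = api_customers_df.unionByName(csv_customers_df, allowMissingColumns=True)
combined_df.show()
```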

### Step 5: Set the Destination and Publish


1. With your final, combined query selected, click on **Add data destination** on
the right side.
2. Choose **Lakehouse** as your destination.
3. Select your Lakehouse and provide a **Table name** for the final combined data
(e.g., `customer_data`).
4. Choose the desired update method (e.g., "Replace" or "Append").
5. Click **Save settings**.
6. Finally, click the **Publish** button at the bottom right. Dataflow Gen2 will
then automatically ingest, transform, and load the data into your Lakehouse as a
Delta table.
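
After the dataflow publishes and its refresh completes, the resulting Delta table can be verified from a notebook attached to the same Lakehouse, using the table name chosen in Step 5:

```python
# Read the combined customer table that the dataflow wrote to the Lakehouse
customer_df = spark.read.table("customer_data")
print(customer_df.count())
customer_df.show(5)
```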
