# Fabric Notes

The document outlines a multi-phase process for ingesting and transforming data within a Fabric Workspace, including steps for loading CSV and SQL data into a Lakehouse, performing data transformations using a notebook, and creating Power BI reports. It also details the creation of an Eventstream for real-time data ingestion from Azure IoT Hubs, setting up alerts with Data Activator, and building a low-code data ingestion pipeline for customer data from a CSV file and a REST API. Each phase is clearly defined with step-by-step instructions for implementation.


**Assume:**

- You are in a Fabric Workspace.
- You have a Lakehouse named `my_lakehouse`.
- The raw orders data is in a Delta table `my_lakehouse.tables.orders`.
- The CSV file with product categories is in the `Files` section of `my_lakehouse` at `/files/product_categories.csv`.
- The SQL database with prices is accessible and you have connection details.

### Phase 1: Ingesting the Product Categories CSV


1. **Navigate to Dataflows Gen2:**
- In your Fabric workspace, click on `New` -> `Dataflow Gen2`.
2. **Add the CSV as a Source:**
- Click `Get Data` -> `Text/CSV`.
- Browse to your Lakehouse, navigate to the `Files` section, and select
`product_categories.csv`.
- Review the data in the preview window. Ensure the headers are correct.
3. **Define the Destination:**
- In the Dataflow Gen2 editor, on the bottom right, find `Destination` and
select `Lakehouse`.
- Choose your `my_lakehouse`.
- Specify a new table name, like `product_categories`.
- Click `Publish`.
- **Result:** A new Delta table named `product_categories` will be created in
your Lakehouse.
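
If you prefer code over the Dataflow UI, the same load can be done from a notebook attached to the Lakehouse. A minimal PySpark sketch, assuming the file location from the assumptions above (the relative `Files/...` path applies when the notebook's default Lakehouse is `my_lakehouse`):

```python
# Read the product categories CSV from the Lakehouse Files section.
# Adjust the path if your file lives somewhere other than the location
# described in the assumptions above.
categories_raw = (
    spark.read
    .option("header", "true")
    .option("inferSchema", "true")
    .csv("Files/product_categories.csv")
)

# Persist it as a managed Delta table in the Lakehouse
categories_raw.write.format("delta").mode("overwrite").saveAsTable("product_categories")
```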

### Phase 2: Connecting to the SQL Database with a Shortcut


1. **Navigate to your Lakehouse:**
- Go back to your `my_lakehouse`.
2. **Create a New Shortcut:**
- Click the `...` next to the `Tables` folder and select `New shortcut`.
- Choose `Microsoft OneLake` (to create a shortcut to the SQL DB).
- Provide the connection details for your SQL database. You'll need the
server name, database name, and credentials.
- Give the shortcut a name, for example, `prices_db`.
- **Result:** A new folder/object named `prices_db` will appear in your
Lakehouse. You can now access the tables within that SQL database as if they were
part of the Lakehouse.
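
Once the shortcut exists, the data behind it can be read like any other Lakehouse table. A quick sanity check from a notebook, using the same illustrative table name that the Phase 3 code assumes:

```python
# List the tables now visible in the Lakehouse, including those exposed by the shortcut
spark.sql("SHOW TABLES").show(truncate=False)

# Read a table exposed through the prices_db shortcut; the exact table name
# depends on your SQL database, so treat this one as a placeholder.
prices_df = spark.read.table("my_lakehouse.tables.prices_db_tablename")
prices_df.printSchema()
```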

### Phase 3: Data Transformation in a Notebook


1. **Create a New Notebook:**
- In your Fabric workspace, click `New` -> `Notebook`.
- Attach the notebook to `my_lakehouse`.
2. **Load the DataFrames:**
- In the first cell, you'll write Python/PySpark code to load the three
sources into DataFrames.
```python
# Load the orders data
orders_df = spark.read.format("delta").table("my_lakehouse.tables.orders")
# Load the product categories data
categories_df = spark.read.format("delta").table("my_lakehouse.tables.product_categories")
# Load the prices data from the SQL shortcut (adjust the table name as needed)
prices_df = spark.read.format("delta").table("my_lakehouse.tables.prices_db_tablename")
```
3. **Perform the Joins and Calculations:**
- In a new cell, write the transformation logic.
```python
from pyspark.sql.functions import col, sum as spark_sum, current_date, date_sub

# Join orders with categories, then with prices, on product_id
joined_df = orders_df.join(categories_df, "product_id")
final_df = joined_df.join(prices_df, "product_id")

# Keep roughly the last three months of orders and compute line-item sales
three_months_ago = date_sub(current_date(), 90)
sales_df = (
    final_df.filter(col("order_date") >= three_months_ago)
    .withColumn("total_sales", col("quantity") * col("price"))
)

# Aggregate total sales by product category
summary_df = (
    sales_df.groupBy("product_category")
    .agg(spark_sum("total_sales").alias("total_sales_per_category"))
)
```
4. **Save the Result:**
- In the final cell, write the code to save the aggregated DataFrame back to
the Lakehouse.
```python
# Save the aggregated summary as a Delta table in the Lakehouse
summary_df.write.format("delta").mode("overwrite").saveAsTable(
    "my_lakehouse.tables.quarterly_sales_summary"
)
```
- **Result:** A new Delta table `quarterly_sales_summary` is now available in
your Lakehouse, containing the final aggregated data.
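
To confirm the write, you can read the table back in a final cell and inspect the aggregates:

```python
# Read the aggregated table back and show categories ordered by total sales
summary_check = spark.read.table("my_lakehouse.tables.quarterly_sales_summary")
summary_check.orderBy("total_sales_per_category", ascending=False).show()
```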

### Phase 4: Building the Power BI Report


1. **Navigate to the Lakehouse Data:**
- Go back to your `my_lakehouse`.
- In the `Tables` section, you'll see your `quarterly_sales_summary` table.
2. **Create a New Power BI Report:**
- Hover over the `quarterly_sales_summary` table and click the `...` icon.
- Select `New Power BI report`.
3. **Design the Report:**
- The Power BI editor will open with your `quarterly_sales_summary` table
already loaded as the data source.
- In the `Visualizations` pane, select a `Bar chart`.
- Drag `product_category` to the X-axis.
- Drag `total_sales_per_category` to the Y-axis.
4. **Save and Publish:**
- Click `File` -> `Save` and give your report a name (e.g., "Quarterly Sales
Report").
- The report is automatically saved to your Fabric workspace and is ready to
be viewed and shared with others.
---
### Step 1: Create an Eventstream for Ingestion
1. In your Fabric workspace, click **New**.
2. Select **Eventstream**.
3. Give it a name (e.g., `TruckTelemetryStream`) and click **Create**.
4. In the Eventstream editor, find the **New Source** button on the top left and
click it.
5. Select **Azure IoT Hubs**. A wizard will appear.
6. Fill in your connection details for the IoT Hub and click **Add**.
### Step 2: Ingest Data into a KQL Database
1. While still in your Eventstream editor, find the **New Destination** button on
the right and click it.
2. Select **KQL Database**.
3. A pane will open. Select a **KQL Database** you want to use, or create a new
one.
4. Provide a **Table name** (e.g., `truck_telemetry`) and click **Add and
configure**. The data is now flowing into this KQL database.
### Step 3: Set up Real-Time Alerting
1. Go back to your Fabric workspace homepage.
2. Open the KQL Database you created in the previous step.
3. Click on the **Explore your data** button. This will open a new query editor.
4. In the editor, write a query to find trucks that exceed the temperature
threshold. A simple example would be:
```kql
truck_telemetry
| where temperature > 50
| summarize count() by truckId, bin(ingestion_time(), 30s)
| where count_ > 1
```
5. Run the query to test it.
6. In the menu at the top of the query editor, click on **Build Power BI report**.
7. This will create a Power BI report with your query results. In Power BI, pin a suitable visual (such as a card showing the result count) to a dashboard and set up a data alert on that tile to trigger an email or other action when the count is greater than zero.
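
If you also want to run the same check programmatically (for example, from a scheduled script outside the Fabric UI), one option is the `azure-kusto-data` Python package, which is not part of these notes. A hedged sketch, with the cluster URI and database name as placeholders you would copy from the KQL database's details page:

```python
from azure.kusto.data import KustoClient, KustoConnectionStringBuilder

# Placeholders: copy the query URI and database name from your KQL database in Fabric
cluster_uri = "https://<your-eventhouse>.kusto.fabric.microsoft.com"
database = "<your-kql-database>"

# Authenticate via the Azure CLI login on the machine running the script
kcsb = KustoConnectionStringBuilder.with_az_cli_authentication(cluster_uri)
client = KustoClient(kcsb)

# Same query as in Step 3 above
query = """
truck_telemetry
| where temperature > 50
| summarize count() by truckId, bin(ingestion_time(), 30s)
| where count_ > 1
"""

response = client.execute(database, query)
for row in response.primary_results[0]:
    print(row)
```
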
### Step 4: Configure Historical Data Storage
1. Navigate back to your KQL Database's main page.
2. On the left-hand menu, click on the **Data (Preview)** tab.
3. In the top menu, click on **New connection**.
4. Select **OneLake**.
5. Choose your target **Lakehouse** and a **Table name** (e.g.,
`historical_telemetry`).
6. Click **Create**. Fabric will now automatically export the data from the KQL
database to the Lakehouse as a Delta table.
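
Once the OneLake connection is active, the exported table can also be inspected directly from a notebook; a minimal sketch, assuming the `historical_telemetry` table name above and the telemetry columns mentioned in Step 5:

```python
# Read the telemetry that Fabric exports from the KQL database into the Lakehouse
historical_df = spark.read.table("historical_telemetry")

# Quick sanity check: row count and the most recent records.
# Column names such as timestamp and temperature are assumptions based on the
# fields used elsewhere in these notes; adjust to your actual schema.
print(historical_df.count())
historical_df.orderBy("timestamp", ascending=False).show(5)
```
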
### Step 5: Create a Power BI Report for Historical Analysis
1. Go to your Fabric workspace and open your **Lakehouse**.
2. In the **Tables** section, find the `historical_telemetry` table you just
created.
3. Hover over the table name, click the three dots **...**, and select **New Power
BI report**.
4. This will open the Power BI editor with your historical data already connected.
5. In the **Visualizations** pane, drag and drop the fields you want to analyze
(e.g., `temperature`, `speed`, `timestamp`) to build your report.
6. Click **Save** and give your report a name.
---
Here are the step-by-step instructions to set up the alert:
### Step 1: Create a Data Activator and Connect to Your KQL Database
1. In your Fabric workspace, click **New** -> **Data Activator**.
2. Give it a name and click **Create**.
3. In the Data Activator editor, click **Connect to your data**.
4. Select your **KQL Database** from the list of available items.
5. Choose the table containing your clickstream data (e.g., `clickstream_data`).
6. The Data Activator will automatically display a preview of your data stream.
### Step 2: Define the Object and Condition
1. On the left-hand side, find the **Objects** pane. Click **Add an object**.
2. This is where you define what you're tracking. In this case, you're tracking a
specific web page. Select the `page_url` field from your data. Data Activator will
now create a visual representation of each unique `page_url` as an object.
3. On the right-hand side, find the **Triggers** pane. Click **Create a new
trigger**.
4. In the trigger configuration, you will define the condition. For the "Value to
check," select the `page_url` field.
5. Set the condition to be `is equal to` the specific URL:
`/products/bestseller_item_1`.
### Step 3: Configure the Alert Action
1. While in the same trigger configuration pane, scroll down to the "When this
happens" section.
2. Here, you will set the aggregation. Select **Count** to count the number of
views.
3. Set the time window to **1 minute**.
4. Set the condition to `is greater than` the value **100**.
5. In the "Then do this" section, click **Add an action**.
6. Choose the action you want to take, such as **Send a Teams notification** or
**Send an email**.
7. Fill in the details for the recipient (e.g., the marketing team's email
address).
### Step 4: Start the Alert
1. After configuring the trigger and action, click the **Start** button at the top
of the Data Activator ribbon.
2. Data Activator will now actively monitor your KQL Database in real time.
Whenever the count of views for that specific product page exceeds 100 within any
1-minute window, it will automatically send the configured notification to the
marketing team.
---
Your data engineering team needs to ingest customer data from two different
sources: a **CSV file** and a **public REST API**. The REST API requires
**pagination** to retrieve all records (it returns a maximum of 100 records per
page).
You need to combine this data, perform some basic transformations (like cleaning up
columns), and load the final, combined dataset into a Delta table in your Fabric
**Lakehouse** for downstream reporting.
**Question:** What is the best-suited Fabric component for this task, and what are
the main steps you would follow to build this low-code data ingestion and
transformation pipeline?

The best-suited component is **Dataflow Gen2**: it provides a low-code Power Query experience with connectors for both files and REST APIs, plus a built-in Lakehouse destination. Here are the step-by-step instructions to build the pipeline:


### Step 1: Create the Dataflow Gen2
1. In your Fabric workspace, click **New**.
2. Select **More options** and then choose **Dataflow Gen2**.
3. Give the dataflow a name (e.g., `CustomerDataFlow`) and click **Create**. The
Power Query Online editor will open.

### Step 2: Get Data from the REST API


1. In the Power Query editor, click **Get data**.
2. Select **More** and search for **Web API**. Click **Connect**.
3. Enter the URL for your REST API. This is also where you handle **pagination**: in the editor, you can write a Power Query M custom function that loops through pages of data by updating a page-number parameter or following a "next page" URL, so all records are ingested without manual intervention (a conceptual sketch of the paging pattern follows this list).
4. After the data loads, the Power Query editor will show you a preview of the API
data.
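
The Dataflow itself expresses this paging logic in Power Query M, but the pattern is easier to see in plain code. A conceptual sketch in Python using the `requests` library, assuming a hypothetical endpoint that takes `page` and `page_size` query parameters and returns at most 100 records per page as a JSON array:

```python
import requests

# Hypothetical endpoint and parameter names, for illustration only;
# the real Dataflow would implement the same loop as an M custom function.
BASE_URL = "https://api.example.com/customers"
PAGE_SIZE = 100  # the API returns at most 100 records per page

def fetch_all_records():
    records = []
    page = 1
    while True:
        response = requests.get(BASE_URL, params={"page": page, "page_size": PAGE_SIZE})
        response.raise_for_status()
        batch = response.json()
        if not batch:                # an empty page means nothing is left to fetch
            break
        records.extend(batch)
        if len(batch) < PAGE_SIZE:   # a short page is the last page
            break
        page += 1
    return records

all_customers = fetch_all_records()
print(f"Fetched {len(all_customers)} customer records")
```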

### Step 3: Get Data from the CSV File


1. While in the same Dataflow editor, click **Get data** again.
2. Select **Text/CSV**.
3. Enter the path to your CSV file (e.g., from a OneDrive or SharePoint location),
configure the connection, and click **Create**.
4. The CSV data will load as a separate query in the editor. You now have two
separate queries: one for your API data and one for your CSV data.

### Step 4: Clean, Transform, and Combine the Data


1. In the Power Query editor, you can now apply transformations visually.
* **Clean:** Select a query, and then use the ribbon to remove unnecessary
columns, handle null values, or change data types.
* **Transform:** For instance, you could click a column header, go to the **Add
Column** tab, and select **Custom Column** to create a new calculated field.
2. To combine the two sources, select one of the queries (e.g., the API data).
3. From the **Home** tab, click **Append queries**.
4. Select the CSV query from the dropdown to combine its rows with the current
query. A new query will be created with the combined data.
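
Appending in Power Query simply stacks the rows of both queries into one table. For comparison, the equivalent operation in a notebook would be a union of two DataFrames; a minimal PySpark sketch with hypothetical stand-in data:

```python
# Stand-ins for the API and CSV queries (hypothetical columns, for illustration only)
api_customers_df = spark.createDataFrame(
    [(1, "Alice", "alice@example.com")], ["customer_id", "name", "email"]
)
csv_customers_df = spark.createDataFrame(
    [(2, "Bob", "US")], ["customer_id", "name", "country"]
)

# unionByName matches columns by name; allowMissingColumns fills gaps with nulls,
# which mirrors how Append behaves when the sources do not share every column.
combined_df = api_customers_df.unionByName(csv_customers_df, allowMissingColumns=True)
combined_df.show()
```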

### Step 5: Set the Destination and Publish


1. With your final, combined query selected, click on **Add data destination** on
the right side.
2. Choose **Lakehouse** as your destination.
3. Select your Lakehouse and provide a **Table name** for the final combined data
(e.g., `customer_data`).
4. Choose the desired update method (e.g., "Replace" or "Append").
5. Click **Save settings**.
6. Finally, click the **Publish** button at the bottom right. Dataflow Gen2 will
then automatically ingest, transform, and load the data into your Lakehouse as a
Delta table.
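
After the dataflow publishes and its refresh completes, the resulting Delta table can be verified from a notebook attached to the same Lakehouse, using the table name chosen in Step 5:

```python
# Read the combined customer table that the dataflow wrote to the Lakehouse
customer_df = spark.read.table("customer_data")
print(customer_df.count())
customer_df.show(5)
```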
