NOIDA INSTITUTE OF ENGINEERING AND
TECHNOLOGY
GREATER NOIDA-201306
(An Autonomous Institute)
School of Computer Sciences & Information Technology
Department of IT
Session (2024 – 2025)
LAB FILE
OF
DATA ANALYTICS
(ACSDS0653)
(6th Semester)
Submitted to: Submitted by:
Ms. Nidhi Chauhan Name:
Roll. No:
Affiliated to Dr. A.P.J Abdul Kalam Technical University, Uttar Pradesh, Lucknow.
INDEX
S.No  Name of Experiment  Date  Faculty Signature
1   Installation of MySQL, Anaconda, and Tableau
2   To perform data import/export (.CSV, .XLS, .TXT) operations using data frames in R/Python.
3   To perform data pre-processing operations i) Handling Missing data ii) Min-Max normalization
4   To perform dimensionality reduction operation using PCA Houses Data Set
5   To perform statistical operations (Mean, Median, Mode and Standard deviation) using Python
6   Tableau – getting started
    User interface
    Methodology for working with the interface
    Connecting to different types of data sources (Excel, csv, Access, MySQL, Tableau Server)
    Editing Data Connections and Data Sources; Live mode vs. Extract mode
    Date interpreter / Pivot
7   Joining multiple datasets
    Union / Join
    Cross database joins
    Data Blending – integrating different data source
8   Basic functionalities
    Filtering
    Sorting
    Grouping
    Hierarchies
    Creating sets
    Types of dates – Continuous vs. Discrete
    Pivot tables
9   Dashboards and stories
    Building dashboards
    Dashboard objects
    Dashboard formatting
    Dashboard extensions
    Story points
10  Calculations
    Syntax
    Table calculations
    LOD expressions
    Aggregate Date, Logic, String, Number, Type calculation
11  Built-in chart types/visualisations:
    Line chart
    Dot chart
    Bar chart
    Other types of visualisation (bullet graph, Heat map, Tree map, etc.).
    Combo charts – dual axis
12  Custom chart types:
    KPI matrix
    Waterfall
    Gantt
    Dot plot
    Pareto
    Analytics’ options: trend lines, forecasting, clustering
13  CREATE AND FORMAT REPORTS USING THE TABLEAU DESKTOP
    Describe the use of Page Backgrounds and Templates
    Create visualizations to display the data
    Apply drill through and drill down
    Create and manage slicers with the use of filters.
    Explore visual interactions
    Review Bookmarks
    Publish the report to the Tableau online
Experiment 1
Objective: Installation of MySQL, Anaconda, Tableau
MySQL
MySQL is a popular open-source relational database management system (RDBMS) that stores and manages
data using SQL (Structured Query Language). Because it is open source, anyone can use and modify it.
It is available free of charge and works with a wide range of programming languages.
Why Use MySQL
MySQL is a popular choice for managing relational databases for several reasons:
1. Open Source: MySQL is open-source software, which means it’s free to use and has a large
community of developers contributing to its improvement.
2. Relational: MySQL follows the relational database model, allowing users to organize data
into tables with rows and columns, facilitating efficient data storage and retrieval.
3. Reliability: MySQL has been around for a long time and is known for its stability and reliability.
4. Performance: MySQL is optimized for performance, making it capable of handling high-volume
transactions and large datasets efficiently.
5. Scalability: MySQL can scale both vertically and horizontally to accommodate growing data and
user loads. You can add more resources to a single server or distribute the workload across multiple
servers using techniques like sharding or replication.
6. Compatibility: MySQL is widely supported by many programming languages, frameworks, and
tools. It offers connectors and APIs for popular languages like PHP, Python, Java, and more, making it
easy to integrate with your existing software stack.
7. Security: MySQL provides robust security features to protect your data, including access controls,
encryption, and auditing capabilities. With proper configuration, you can ensure that only authorized
users have access to sensitive information.
Hardware and Software Requirements to Install MySQL
Before installing MySQL on your PC, ensure your system has a capable processor (such as an Intel Core), at
least 4 GB of RAM (6 GB is preferable), a compatible graphics card, and a display with a resolution of at
least 1024×768.
Download and Install MySQL for Windows Steps
Now, let's break the MySQL download and installation into steps and see how to install MySQL on
Windows 10.
Step 1: Visit the Official MySQL Website
Open your preferred web browser and navigate to the official MySQL website. Then simply click the first
download button.
Step 2: Go to the Downloads Section
On the page that opens, click the “No thanks, just start my download” link to start downloading
MySQL.
Step 3: Run the Installer
After the MySQL installer (.exe file) has downloaded, go to your Downloads folder, find the file, and
double-click it to run the installer.
Step 4: Choose Setup Type
The installer will instruct you to choose the setup type. For most users, the “ Developer Default” is
suitable. Click “Next” to proceed.
Step 5: Check Requirements
You might be prompted to install prerequisite software that MySQL needs, typically Microsoft Visual C++
components. The installer can resolve some requirements automatically, but not in this case.
Step 6: MySQL Downloading
Now that you’re in the download section, click “Execute” to start downloading the components you
selected. Wait a few minutes until all items show tick marks, indicating completion, before moving
forward.
Step 7: MySQL Installation
Now the downloaded components will be installed. Click “Execute” to start the installation process.
MySQL will be installed on your Windows system. Then click “Next” to proceed.
Step 8: Navigate to Few Configuration Pages
Proceed to “Product Configuration” > “Type and Networking” > “Authentication Method” Pages by
clicking the “Next” button.
Step 9: Create MySQL Accounts
Create a password for the MySQL root user. Ensure it’s strong and memorable. Click “ Next” to proceed.
Step 10: Connect To Server
Enter the root password and click “Check”. If the status shows that the connection succeeded, you have
successfully connected to the server.
Step 11: Complete Installation
Once the installation is complete, click “Finish.” Congratulations! MySQL is now installed on your
Windows system.
Step 12: Verify Installation
To ensure a successful installation of MySQL, open the MySQL Command Line Client or MySQL
Workbench, both available in your Start Menu. Log in using the root user credentials you set during
installation.
MySQL Workbench Is Ready To Use
MySQL is an open-source relational database management system that is based on SQL queries. MySQL
is used for data operations like querying, filtering, sorting, grouping, modifying, and joining the tables
present in the database.
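The operations listed above can be tried out without a running server. The sketch below is only an illustration: it uses Python's built-in sqlite3 module as a stand-in for MySQL (an assumption made so that the example runs anywhere; the SELECT, WHERE, ORDER BY, GROUP BY, JOIN and UPDATE statements shown are written the same way in MySQL). The tables students and marks and their contents are hypothetical.

```python
import sqlite3

# In-memory SQLite database used as a stand-in for a MySQL server (assumption).
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Hypothetical tables, for illustration only.
cur.execute("CREATE TABLE students (id INTEGER PRIMARY KEY, name TEXT, branch TEXT)")
cur.execute("CREATE TABLE marks (student_id INTEGER, subject TEXT, score INTEGER)")
cur.executemany("INSERT INTO students VALUES (?, ?, ?)",
                [(1, "Asha", "IT"), (2, "Ravi", "CSE"), (3, "Meena", "IT")])
cur.executemany("INSERT INTO marks VALUES (?, ?, ?)",
                [(1, "DA", 82), (2, "DA", 74), (3, "DA", 91), (1, "DBMS", 68)])

# Filtering and sorting
it_students = cur.execute(
    "SELECT name FROM students WHERE branch = 'IT' ORDER BY name").fetchall()

# Joining and grouping: average score per student
avg_scores = cur.execute(
    """SELECT s.name, AVG(m.score)
       FROM students AS s JOIN marks AS m ON m.student_id = s.id
       GROUP BY s.name ORDER BY s.name""").fetchall()

# Modifying a row
cur.execute("UPDATE marks SET score = 70 WHERE student_id = 1 AND subject = 'DBMS'")
conn.commit()
updated = cur.execute(
    "SELECT score FROM marks WHERE student_id = 1 AND subject = 'DBMS'").fetchone()[0]

print(it_students)
print(avg_scores)
print(updated)
conn.close()
```

To run the same statements against the installed MySQL server, they can be typed into the MySQL Command Line Client or MySQL Workbench after logging in as root.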
Anaconda
Anaconda is an open-source distribution of the Python and R programming languages. It's used for data
science, machine learning, and artificial intelligence (AI).
Download the Anaconda Installer – 3 Steps
We will start the process by downloading the Anaconda Installer by following these three simple steps as
mentioned below:
Step 1: Visit the Official Website
Head over to anaconda.com and download the latest version of Anaconda. Make sure to download the “Python
3.13.1 Version” for the appropriate architecture.
Step 2: Select the Windows Installer
Choose the appropriate installer (based on your system’s architecture) i.e. 64-bit (for modern systems) or
32-bit (for older systems).
Step 3: Start the Downloading Process
Select the location where you want to save the file and click “Save” to start the downloading process.
Run the Anaconda Installer – 8 Steps
Once the download is complete, we will see how to run the Anaconda installer on
your Windows PC. Let’s check it out:
Step 1: Begin with the installation process
Navigate to the downloaded file and double-click the .exe file to start the installation process.
Step 2: Getting through the License Agreement
Follow the on-screen instructions, read the license terms & agreement and proceed ahead.
Terms & License
Step 3: Select Installation Type
Select “Just Me” if the software will be used by a single user; otherwise select “All Users” (for all
users on the system).
Step 4: Choose Installation Location
Select the path where you wish to install Anaconda and click “Next” to proceed.
Anaconda Setup .exe
Step 5: Advanced Installation Option
Choose whether to add Anaconda to your system PATH environment variable, keeping in mind that adding
Anaconda to the PATH can interfere with other software. Also decide whether to register Anaconda as the
default Python.
Advanced Installation Options – Anaconda
Step 6: Getting through the Installation Process
Click Install to start the Anaconda Installation process.
Step 7: Recommendation to Install Pycharm
During the installation, the installer will suggest visiting the official site to get PyCharm for your
Windows PC. You can choose to visit it or skip this step.
PyCharm recommended
Step 8: Finishing up the Installation
Once the installation is complete, click “Finish” to close the installer.
Verify the Installation
Once the installation is complete, you can verify it by following these steps.
Step 1: Access to Anaconda Prompt
Click on the Start Menu and search for “Anaconda Prompt” and click to open it.
Anaconda Prompt
Step 2: Run the Program to check for the Anaconda Version
Type the following command to check for the installed version of conda:
conda --version
Working with Anaconda
Once the download, installation and verification of Anaconda on your Windows PC are complete, you
can start exploring it and getting familiar with its features.
Step 1: Access Anaconda Navigator
Once the installation process is done, Anaconda can be used to perform multiple operations. To begin
using Anaconda, search for Anaconda Navigator from the Start Menu in Windows PC.
Step 2: Explore Navigator & Features
You can use navigator to create new environments , install packages, and launch applications without
using the command line.
Tableau
Tableau is a visual analytics platform that helps people and organizations use data to solve problems. It's a
tool that allows users to access, analyze, and visualize data.
Download and Installation of Tableau
Tableau is available in two ways:-
o Tableau Public (Free)
o Tableau Desktop (Commercial)
Here is a comparison between the Tableau Public and Tableau Desktop
Tableau Public
o Tableau Public is free to use.
o Tableau Public can connect to Excel and text files.
o Tableau Public can be installed on Windows and Mac operating systems.
o Data and visualizations are not secured in Tableau Public because they are published publicly.
o In Tableau Public, data cannot be obtained from other kinds of data sources, as it is limited to
Excel and text files.
o Tableau Public is intended for personal-level use.
Tableau Desktop
o Tableau Desktop is a paid product: personal edition $35 per month and professional edition $70 per
month.
o Tableau Desktop can connect to any data source, including files, databases, web
applications, and more.
o Tableau Desktop can also be installed on Windows and Mac operating systems.
o Data and visualizations are secured in Tableau Desktop.
o In Tableau Desktop, data can be extracted from various data sources and stored as a Tableau extract file.
o Tableau Desktop is intended for professional and enterprise-level use.
Let's install Tableau Desktop on a Windows machine, step by step:-
Step1:- Go to https://www.tableau.com/products/desktop on your Web browser.
Step2:- Click on the 'Try Now' button.
Step3:- Now, enter your Email id and click on the 'Download Free Trial' button.
Step4:- This will start downloading the .exe file for Windows machines by default.
Step5:- Open the downloaded file, and click on the 'Run' button.
Step6:- Accept the terms and conditions and click on the 'Install' button.
Step7:- A pop-up message will be shown on the screen asking for administrator approval to install the
Tableau software. Click on 'Yes' to approve it; the installation will then start.
Step8:- Once the installation is completed, open the Tableau Desktop software.
Step9:- In the registration window
1. Click on Activate Tableau and fill your complete details.
2. Click on start trial now.
Step10:- Wait for complete registration.
Step11:- The start screen of Tableau Desktop appears.
Experiment 2
Objective: To perform data import/export (.CSV, .XLS, .TXT) operations using data frames in
R/Python.
1. Importing Data
import pandas as pd
# CSV File
df_csv = pd.read_csv("data.csv")
# Excel File (.xls or .xlsx)
df_excel = pd.read_excel("data.xlsx")
# TXT File (tab or custom-delimited)
df_txt = pd.read_csv("data.txt", delimiter="\t") # or use sep=',' for comma
2. Exporting Data
# To CSV
df_csv.to_csv("output.csv", index=False)
# To Excel
df_csv.to_excel("output.xlsx", index=False)
# To TXT
df_csv.to_csv("output.txt", sep='\t', index=False)
OUTPUT:-
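The import and export calls above can be checked with a small round trip. The following is a minimal sketch, assuming a hypothetical three-row data frame and a temporary folder instead of the data.csv and data.txt files used above. The Excel step is left out here because to_excel and read_excel need an extra engine package (for example openpyxl) to be installed.

```python
import tempfile
from pathlib import Path

import pandas as pd

# Hypothetical sample data, for illustration only.
df = pd.DataFrame({"name": ["A", "B", "C"], "score": [82, 74, 91]})

with tempfile.TemporaryDirectory() as tmp:
    csv_path = Path(tmp) / "output.csv"
    txt_path = Path(tmp) / "output.txt"

    # Export: comma-separated and tab-separated, without the index column.
    df.to_csv(csv_path, index=False)
    df.to_csv(txt_path, sep="\t", index=False)

    # Import the two files back into new data frames.
    df_csv = pd.read_csv(csv_path)
    df_txt = pd.read_csv(txt_path, sep="\t")

# Both imported frames should hold the same values as the original.
print(df_csv)
print(df_txt)
```

Writing with index=False matters for the round trip: otherwise the row index is saved as an extra unnamed column and shows up again on import.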
Experiment 3
Objective: To perform data pre-processing operations i) Handling Missing data ii) Min-Max
normalization
i) Handling Missing Data
Identify Missing Values: Detect columns with missing values.
Choose Imputation Strategy:
Numerical Data: Replace missing values with the mean (for normal distributions) or median (for skewed
distributions).
Categorical Data: Replace missing values with the mode (most frequent category).
Impute Missing Values:
Use pandas for basic imputation or scikit-learn for pipeline integration.
import pandas as pd
from sklearn.impute import SimpleImputer
# Load dataset
df = pd.read_csv('data.csv')
# Separate numerical and categorical columns (example)
num_cols = df.select_dtypes(include=['int64', 'float64']).columns
cat_cols = df.select_dtypes(include=['object']).columns
# Impute numerical columns with mean
num_imputer = SimpleImputer(strategy='mean')
df[num_cols] = num_imputer.fit_transform(df[num_cols])
# Impute categorical columns with mode
cat_imputer = SimpleImputer(strategy='most_frequent')
df[cat_cols] = cat_imputer.fit_transform(df[cat_cols])
# Alternatively, using pandas:
# df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
# df[cat_cols] = df[cat_cols].fillna(df[cat_cols].mode().iloc[0])
Original Dataset (Before Preprocessing)
Age Income Gender
25 50000 Male
NaN 60000 Female
30 NaN NaN
35 70000 Male
NaN 80000 Female
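After Imputation:
Age Income Gender
25.0 50000.0 Male
30.0 60000.0 Female
30.0 65000.0 Female
35.0 70000.0 Male
30.0 80000.0 Female
Age: Missing values replaced with the mean (30).
Income: Missing value replaced with the mean (65,000).
Gender: Missing value replaced with the mode. Male and Female each appear twice, so the mode is tied; both
SimpleImputer(strategy='most_frequent') and mode().iloc[0] resolve the tie by taking the first value in
sorted order (Female).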
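The imputation can be reproduced on the five-row sample with pandas alone. This is a minimal sketch that builds the sample table in code instead of reading data.csv. Note that Male and Female each occur twice in this sample, so the mode is a tie; Series.mode() returns tied values in sorted order, and .iloc[0] therefore picks Female.

```python
import numpy as np
import pandas as pd

# The five-row sample dataset, typed in by hand.
df = pd.DataFrame({
    "Age": [25, np.nan, 30, 35, np.nan],
    "Income": [50000, 60000, np.nan, 70000, 80000],
    "Gender": ["Male", "Female", np.nan, "Male", "Female"],
})

# Identify missing values: count them per column before imputing.
missing_before = df.isna().sum()

# Numerical columns: fill with the column mean.
num_cols = ["Age", "Income"]
df[num_cols] = df[num_cols].fillna(df[num_cols].mean())

# Categorical column: fill with the mode. mode() returns all tied values
# in sorted order, so .iloc[0] takes the first of them.
df["Gender"] = df["Gender"].fillna(df["Gender"].mode().iloc[0])

print(missing_before)
print(df)
```

For a skewed numerical column, df[num_cols].median() can be used in place of df[num_cols].mean().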
ii) Min-Max Normalization
Purpose: Scale numerical features to a range of [0, 1].
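Formula: $X' = \frac{X - X_{\min}}{X_{\max} - X_{\min}}$, where $X_{\min}$ and $X_{\max}$ are the minimum and maximum of the column.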
Implementation:
Use scikit-learn for efficient scaling or compute manually with pandas.
from sklearn.preprocessing import MinMaxScaler
# Initialize scaler
scaler = MinMaxScaler()
# Apply normalization to numerical columns
df[num_cols] = scaler.fit_transform(df[num_cols])
# Manual implementation with pandas:
# df[num_cols] = (df[num_cols] - df[num_cols].min()) / (df[num_cols].max() - df[num_cols].min())
Step 2: Min-Max Normalization
After Scaling:
Age (Normalized) Income (Normalized) Gender
0.00 0.00 Male
0.50 0.33 Female
0.50 0.50 Female
1.00 0.67 Male
0.50 1.00 Female
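The scaled values can be verified by applying the min-max formula by hand. The sketch below assumes the imputed Age and Income columns (missing entries already replaced by the means 30 and 65,000); the helper name min_max is only illustrative.

```python
import pandas as pd

# Imputed numerical columns of the sample dataset.
df = pd.DataFrame({
    "Age": [25.0, 30.0, 30.0, 35.0, 30.0],
    "Income": [50000.0, 60000.0, 65000.0, 70000.0, 80000.0],
})


def min_max(col: pd.Series) -> pd.Series:
    """Scale a numeric column to [0, 1]; assumes max > min."""
    return (col - col.min()) / (col.max() - col.min())


# Apply the formula column by column and round for display.
normalized = df.apply(min_max).round(2)
print(normalized)
```

A column whose values are all equal would give a division by zero here, which is why the helper states max > min as an assumption.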
Key Observations:
Missing Data Handling:
Numerical columns (Age, Income) used mean imputation.
Categorical column (Gender) used mode imputation.
Normalization:
Values scaled between [0, 1] (e.g., Age 25 → 0.0, 35 → 1.0).
Categorical Data:
Gender remains unchanged (normalization only applies to numerical features).
Experiment 4:
Objective: To perform dimensionality reduction using PCA on the Houses Data Set.
The aim is to apply Principal Component Analysis (PCA) to the Houses dataset in order to
reduce the number of features while preserving as much variance as possible.
Procedure:
1. Import required libraries: pandas, numpy, sklearn.decomposition, and matplotlib for visualization.
2. Load the Houses dataset using pandas.read_csv().
3. Perform exploratory data analysis:
   o Check for null values.
   o Drop or fill missing values as needed.
4. Normalize the data using StandardScaler.
5. Apply PCA:
   o Use sklearn.decomposition.PCA to reduce dimensionality.
   o Fit PCA on the normalized data and transform it.
6. Analyze explained variance ratio.
7. Visualize the results using a 2D or 3D scatter plot of the transformed dataset.
8. Compare original features vs reduced features to analyze performance.
Sample Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
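# Load dataset
data = pd.read_csv(r"C:\Users\lenovo\Downloads\houses.csv")
# Display the first few rows
print(data.head())
# Keep the numeric features and drop rows with missing values
features = data.select_dtypes(include=[np.number]).dropna()
# Standardize the features (zero mean, unit variance) before PCA
scaled_data = StandardScaler().fit_transform(features)
# Apply PCA to reduce to 2 principal components
pca = PCA(n_components=2)
principal_components = pca.fit_transform(scaled_data)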
# Convert to DataFrame
pca_df = pd.DataFrame(data=principal_components, columns=['PC1', 'PC2'])
# Display the transformed data (print is needed when run as a script)
print(pca_df.head())
# Scatter plot for PCA visualization
plt.figure(figsize=(8, 6))
plt.scatter(pca_df['PC1'], pca_df['PC2'], alpha=0.6, color='blue')
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA Projection of Housing Data")
plt.grid(True)
plt.show()
explained_variance = pca.explained_variance_ratio_
print("Explained Variance Ratio:", explained_variance)
# Plot variance explained by each principal component
plt.figure(figsize=(6, 4))
plt.bar(range(1, 3), explained_variance, alpha=0.7, color="red")
plt.xlabel("Principal Components")
plt.ylabel("Variance Explained")
plt.title("PCA Explained Variance")
plt.show()
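To see what PCA computes internally, the same steps can be written with NumPy only. This is a sketch under stated assumptions, not the code used for the experiment: the data is synthetic (four made-up numeric features standing in for the Houses dataset), and the singular value decomposition is used in place of sklearn's PCA class.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic stand-in for a housing table: 200 rows, 4 numeric features,
# two of which (area and rooms) are strongly correlated. Hypothetical data.
area = rng.normal(1500, 300, 200)
rooms = area / 400 + rng.normal(0, 0.3, 200)
age = rng.normal(20, 8, 200)
distance = rng.normal(10, 3, 200)
X = np.column_stack([area, rooms, age, distance])

# Standardize: zero mean and unit variance for every feature.
X_std = (X - X.mean(axis=0)) / X.std(axis=0)

# PCA via SVD of the standardized data: the rows of Vt are the principal
# directions, and the squared singular values are proportional to the
# variance captured along each direction.
U, S, Vt = np.linalg.svd(X_std, full_matrices=False)
explained_variance_ratio = S**2 / np.sum(S**2)

# Project the data onto the first two principal components.
projected = X_std @ Vt[:2].T

print("Explained variance ratio:", np.round(explained_variance_ratio, 3))
print("Projected shape:", projected.shape)
```

Standardizing first matters because PCA is driven by variance: without it, a feature measured in large units (such as area) would dominate the first component.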
Experiment 5:
Objective: To perform statistical operations (Mean, Median, Mode and Standard deviation) using
Python.
The aim is to calculate basic statistical measures such as Mean, Median, Mode, and Standard Deviation on a
given dataset using Python.
Procedure:
import pandas as pd
# Load the dataset
data = pd.read_csv(r"C:\Users\lenovo\Downloads\TaillePoids.csv")
print(data.head()) # Display first few rows
# numeric_only=True skips text columns, which recent pandas versions do not drop automatically
mean_values = data.mean(numeric_only=True)
print("Mean:\n", mean_values)
Output:
median_values = data.median(numeric_only=True)
print("Median:\n", median_values)
Output:
mode_values = data.mode().iloc[0] # mode() can return several rows on ties; take the first row
print("Mode:\n", mode_values)
Output:
std_values = data.std(numeric_only=True) # Sample standard deviation (ddof=1)
print("Standard Deviation:\n", std_values)
Output:
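The four measures can also be checked by hand on a small list using Python's built-in statistics module. The sketch below uses a hypothetical list of five heights, not the TaillePoids.csv file, so the arithmetic is easy to follow.

```python
import statistics

# Hypothetical sample of five heights in cm.
heights = [160, 165, 165, 170, 180]

mean_value = statistics.mean(heights)        # sum / count = 840 / 5
median_value = statistics.median(heights)    # middle value of the sorted list
mode_value = statistics.mode(heights)        # most frequent value
std_sample = statistics.stdev(heights)       # sample std, divides by n - 1
std_population = statistics.pstdev(heights)  # population std, divides by n

print("Mean:", mean_value)
print("Median:", median_value)
print("Mode:", mode_value)
print("Sample standard deviation:", round(std_sample, 3))
print("Population standard deviation:", round(std_population, 3))
```

pandas' std() uses the sample formula (n - 1) by default, so it corresponds to statistics.stdev here, while numpy's np.std() defaults to the population formula.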
Experiment 6
Tableau – getting started
User interface
Methodology for working with the interface
Connecting to different types of data sources (Excel, csv, Access, MySQL, Tableau Server)
Editing Data Connections and Data Sources; Live mode vs. Extract mode
Date interpreter / Pivot
Objective: To get introduced to Tableau software by understanding the user interface, connecting to
different types of data sources, and exploring data connections using Live and Extract modes.
Procedure:
Tableau – Getting Started Guide
1. User Interface Overview
Tableau's interface is designed for intuitive data exploration. Key components include:
Data Pane: Lists data sources, fields (dimensions and measures), and calculations.
Shelves (Rows, Columns, Marks): Drag fields here to build visualizations. The Marks Card controls color,
size, labels, etc.
Toolbar: Tools for sorting, grouping, filtering, and creating calculations.
Sheets/Dashboards/Stories Tabs: Switch between worksheets, dashboards, and story presentations.
Show Me: Recommends chart types based on selected fields.
Worksheet Workspace: Canvas for building visualizations.
2. Methodology for Working with the Interface
Workflow:
Connect to Data: Start by linking to your data source.
Build Data Source: Clean/organize data (e.g., pivot, split fields).
Create Visualizations: Drag dimensions/measures to shelves and use Show Me for chart suggestions.
Refine: Apply filters, sort, or group data as needed.
Build Dashboards/Stories: Combine multiple sheets into interactive dashboards.
Best Practices:
Use folders in the Data Pane to organize fields.
Right-click fields to change data types (e.g., string to date).
Use aliases to rename categorical values for clarity.
3. Connecting to Data Sources
Tableau supports numerous data sources. Here’s how to connect:
Excel/CSV:
Click “Microsoft Excel” or “Text File” > Browse to the file.
Drag sheets/tables to the canvas or use the “New Union” option.
Access:
Requires ODBC driver (pre-installed on Windows). Select “Microsoft Access” > Browse to .accdb.
MySQL:
Choose “MySQL” > Enter server name, port, database, and credentials.
Tableau Server:
Select “Tableau Server” > Enter URL > Log in to access published data sources.
4. Editing Data Connections & Data Sources
Live vs. Extract Mode:
Live: Direct connection to the source (real-time data). Use for small datasets or frequent updates.
Extract (Hyper): Snapshot of data stored in Tableau’s .hyper format. Improves performance for large
datasets; refresh manually or on a schedule.
Editing Connections:
Right-click fields to rename, hide, or create calculated fields.
Use Data Source Page to join tables, pivot columns, or split fields (e.g., splitting "Year-Month" into two
columns).
Adjust joins/relationships in the physical/logical layer for complex data models.
5. Date Interpreter & Pivot
Date Interpreter:
Automatically detects date formats in ambiguous fields (e.g., "2023-Oct-15" vs. "10/15/23").
Right-click a field > Date Properties to adjust interpretation.
Pivot:
Reshape data from wide to long format (e.g., converting columns like "Jan Sales," "Feb Sales" into "Month"
and "Sales" rows).
Select columns > Right-click > Pivot. Rename pivoted fields as needed.
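The wide-to-long reshape performed by Pivot has a direct counterpart in pandas, which can help when checking the result. This is only an analogy for illustration; Tableau performs the pivot in its own interface, and the product names and figures below are hypothetical.

```python
import pandas as pd

# Wide format: one column per month (hypothetical figures).
wide = pd.DataFrame({
    "Product": ["Pen", "Book"],
    "Jan Sales": [100, 250],
    "Feb Sales": [120, 230],
})

# Long format: one row per product and month, as Pivot would produce.
long = wide.melt(id_vars="Product", var_name="Month", value_name="Sales")

# Rename the pivoted values, similar to renaming pivoted fields in Tableau.
long["Month"] = long["Month"].str.replace(" Sales", "", regex=False)

print(long)
```

The long table has one row for each product and month combination (four rows here), which is the shape Tableau needs in order to put Month on one shelf and Sales on another.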
Experiment 7
Joining multiple datasets
Union / Join
Cross database joins
Data Blending – integrating different data source
Objective: To understand and apply different methods of combining multiple datasets in Tableau using
Union, Join, Cross-database Join, and Data Blending.
Procedure:
1. Union vs. Join
Union
Purpose: Combine datasets vertically (stack rows) when they have the same structure (e.g., monthly sales
data in separate sheets).
Use Case: Appending data from similar sources (e.g., Sales_Jan, Sales_Feb).
Steps in Tableau:
Drag the first table to the canvas.
Click the New Union button (double-table icon).
Add additional tables/sheets to the union.
Note: Columns must align by name/position. Use wildcard unions to automate merging similar files (e.g.,
all .csv files in a folder).
Join
Purpose: Combine datasets horizontally (merge columns) using a common key (e.g., Order_ID).
Types:
Inner Join: Returns matching rows from both tables.
Left/Right Join: Returns all rows from one table and matches from the other.
Full Outer Join: Returns all rows from both tables.
Steps in Tableau:
Drag the primary table to the canvas.
Drag a secondary table and drop it on the primary table.
Define the join clause (e.g., Orders.Order_ID = Returns.Order_ID).
Best Practice: Use joins for structured relational data (e.g., SQL tables).
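The difference between a union and the join types can be seen on a few rows. The sketch below uses pandas as an analogy (concat for Union, merge for Join); the tables sales_jan, sales_feb and returns and their values are hypothetical.

```python
import pandas as pd

# Two monthly tables with the same columns, plus a returns table.
sales_jan = pd.DataFrame({"Order_ID": [1, 2], "Amount": [100, 200]})
sales_feb = pd.DataFrame({"Order_ID": [3, 4], "Amount": [150, 50]})
returns = pd.DataFrame({"Order_ID": [2, 4, 5],
                        "Reason": ["Damaged", "Late", "Wrong item"]})

# Union: stack the rows of tables that share the same structure.
orders = pd.concat([sales_jan, sales_feb], ignore_index=True)

# Joins: merge columns on the common key Order_ID.
inner = orders.merge(returns, on="Order_ID", how="inner")  # matching rows only
left = orders.merge(returns, on="Order_ID", how="left")    # all orders
outer = orders.merge(returns, on="Order_ID", how="outer")  # all rows from both

print(len(orders), len(inner), len(left), len(outer))
```

Order 5 exists only in the returns table, so it appears in the full outer join but not in the inner or left join; orders 1 and 3 have no return, so their Reason is empty in the left and outer results.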
2. Cross-Database Joins
Purpose: Join tables from different databases (e.g., Excel + SQL Server).
Use Case: Combining CRM data (Salesforce) with transactional data (MySQL).
Steps:
Connect to the first data source (e.g., Excel).
Add a second connection (e.g., MySQL) via the New Data Source button.
Drag both tables to the canvas and define the join relationship.
Limitations:
Requires compatible data types.
Performance may degrade with large datasets (use Extracts to optimize).
3. Data Blending
Purpose: Integrate data from different sources (e.g., Excel + Google Sheets) without physically joining
them.
How It Works:
Blending aggregates data at the visualization level using a common dimension (e.g., Region).
Primary Data Source: Defines the view.
Secondary Data Source: Linked via a shared field (displayed as an orange link icon).
Steps:
Connect to the primary data source (e.g., Sales data in Excel).
Connect to the secondary source (e.g., Marketing budget in Google Sheets).
Drag a field from the secondary source to the view (Tableau will auto-blend using common dimensions).
Use Case: Combining sales data (SQL) with budget data (Excel) by Region.
Limitations:
Blending works at the aggregate level (not row-level joins).
Secondary data sources are filtered based on the primary source.
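The idea of blending at the aggregate level can be imitated in pandas: each source is first aggregated to the common dimension, and only then are the results combined. This is a rough analogy, not Tableau's internal mechanism, and the sales and budget tables are hypothetical.

```python
import pandas as pd

# Primary source: row-level sales (hypothetical).
sales = pd.DataFrame({
    "Region": ["East", "East", "West", "West", "North"],
    "Sales": [100, 150, 200, 50, 80],
})

# Secondary source: budget lines kept in a different system (hypothetical).
budget = pd.DataFrame({
    "Region": ["East", "West", "West", "South"],
    "Budget": [300, 100, 150, 500],
})

# Aggregate each source to the linking dimension first...
sales_by_region = sales.groupby("Region", as_index=False)["Sales"].sum()
budget_by_region = budget.groupby("Region", as_index=False)["Budget"].sum()

# ...then attach the secondary aggregates to the primary ones (left join),
# so regions that exist only in the secondary source do not appear.
blended = sales_by_region.merge(budget_by_region, on="Region", how="left")

print(blended)
```

South is present only in the secondary source and is therefore dropped, while North has sales but no budget and shows a missing Budget value. This mirrors the limitation that the primary source defines what appears in the view.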
4. Key Differences
Method | Use Case | Data Structure | Performance
Union | Append rows with identical columns | Vertical stacking | Fast
Join | Merge columns via a shared key | Horizontal merge | Depends on data size
Cross-Database Join | Combine data from different DBs/Sources | Horizontal merge | Slower (optimize with extracts)
Data Blending | Aggregate data from unrelated sources | Linked by a common dimension | Slower for large datasets
Experiment 8
Basic functionalities
Filtering
Sorting
Grouping
Hierarchies
Creating sets
Types of dates – Continuous vs. Discrete
Pivot tables
Objective: To explore and apply basic functionalities in Tableau such as filtering, sorting, grouping,
hierarchies, creating sets, working with date types, and pivot tables for effective data analysis.
Procedure:
i) Handling Missing Data in Tableau
Tableau does not have built-in imputation tools like Python, but you can handle missing values using the
following methods:
1. Filter Out Missing Values:
Steps:
1. Drag the field with missing values (e.g., Age) to the Filters shelf.
2. In the filter dialog, uncheck Null or Null (missing).
3. Click OK to exclude rows with missing values.
2. Replace Nulls with a Default Value:
Create a Calculated Field:
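IF ISNULL([Age]) THEN { FIXED : AVG([Age]) } ELSE [Age] END
o The FIXED expression is needed because a row-level field such as [Age] cannot be mixed with a plain
aggregate such as AVG([Age]) in one calculation. Replace AVG([Age]) with MEDIAN([Age]), or use a fixed
value (e.g., 0) instead.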
3. Use Tableau Prep (Advanced):
Use Tableau Prep Builder to clean data before importing it into Tableau Desktop:
1. Add a Clean Step to replace nulls in numerical columns with Mean or Median.
2. For categorical columns, replace nulls with a placeholder (e.g., Unknown).
ii) Min-Max Normalization in Tableau
Tableau does not have a built-in normalization function, but you can create normalized values using
calculations:
1. Manual Calculation:
Formula:
([Value] - {FIXED : MIN([Value])})
/
({FIXED : MAX([Value])} - {FIXED : MIN([Value])})
Example (for Income):
([Income] - {FIXED : MIN([Income])})
/
({FIXED : MAX([Income])} - {FIXED : MIN([Income])})
This scales values between [0, 1].
2. Steps:
1. Create a Calculated Field named Income (Normalized) using the formula above.
2. Drag the normalized field to your visualization (e.g., axes, tooltips).
3. Alternative:
Use Tableau Prep to normalize data before analysis:
1. Add a Clean Step.
2. Use the formula above or a Range scaling option in Tableau Prep.
Example Output in Tableau
Before Preprocessing:
Customer Age Income Gender
A 25 50,000 Male
B Null 60,000 Female
C 30 Null Null
After Handling Missing Data:
Customer Age (Imputed) Income (Imputed) Gender (Imputed)
A 25 50,000 Male
B 27.5 (Mean) 60,000 Female
C 30 55,000 (Mean) Unknown
After Min-Max Normalization:
Customer Age (Normalized) Income (Normalized)
A 0.0 0.0
B 0.5 1.0
C 1.0 0.5
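The three-customer example can be cross-checked outside Tableau. The sketch below repeats the same two steps in pandas (mean imputation with an Unknown placeholder, then min-max scaling over the whole table, which is what the FIXED expressions compute). With this sample it yields normalized Age values 0.0, 0.5, 1.0 and normalized Income values 0.0, 1.0, 0.5 for customers A, B and C.

```python
import numpy as np
import pandas as pd

# The three-customer sample with its missing entries.
df = pd.DataFrame({
    "Customer": ["A", "B", "C"],
    "Age": [25, np.nan, 30],
    "Income": [50000, 60000, np.nan],
    "Gender": ["Male", "Female", None],
})

# Impute: column mean for numbers, a placeholder for the category.
df["Age"] = df["Age"].fillna(df["Age"].mean())
df["Income"] = df["Income"].fillna(df["Income"].mean())
df["Gender"] = df["Gender"].fillna("Unknown")

# Min-max normalization using the table-wide minimum and maximum,
# mirroring {FIXED : MIN(...)} and {FIXED : MAX(...)}.
for col in ["Age", "Income"]:
    lo, hi = df[col].min(), df[col].max()
    df[col + " (Normalized)"] = (df[col] - lo) / (hi - lo)

print(df)
```

Customer B has the largest income (60,000), so it maps to 1.0, while the imputed income of customer C (55,000) lies exactly halfway and maps to 0.5.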
Experiment 9
Dashboards and stories
Building dashboards
Dashboard objects
Dashboard formatting
Dashboard extensions Story points
Objective: To build interactive dashboards and stories using Tableau, incorporating visual elements,
dashboard objects, formatting, extensions, and story points for better data storytelling.
Procedure:
1. Building Dashboards
A dashboard is a collection of worksheets, filters, legends, and other objects arranged in a single view to
provide a holistic data perspective.
Steps to Build a Dashboard:
Create Worksheets: Build individual charts/sheets (e.g., bar charts, maps, tables) for your analysis.
Open the Dashboard Tab: Click the New Dashboard icon at the bottom of the workbook.
Add Sheets:
Drag and drop worksheets from the Dashboard Pane onto the canvas.
Arrange them using layout containers (horizontal/vertical) for alignment.
Add Objects:
Use text boxes, images, web pages, or blank spaces for annotations or branding.
Add Filters/Parameters:
Drag filters from the worksheet or create parameters for interactivity.
2. Dashboard Objects
Objects are building blocks of a dashboard. Key objects include:
Worksheets: Charts or tables from your analysis.
Filters: Interactive controls to slice data (e.g., date ranges, categories).
Parameters: Dynamic inputs for calculations (e.g., switching metrics).
Text Boxes: Add titles, descriptions, or annotations.
Images/Logos: Insert company logos or visual aids.
Web Pages: Embed live web content (e.g., a live Twitter feed).
Extensions: Add third-party tools (e.g., statistical models, maps).
3. Dashboard Formatting
Formatting ensures your dashboard is visually appealing and user-friendly.
Key Formatting Tools:
Layout Pane: Adjust padding, borders, and spacing between objects.
Themes: Apply predefined color/font themes (Format > Workbook Theme).
Fonts/Colors: Customize text, headers, and background colors.
Tooltip Formatting: Edit tooltips to show/hide details on hover.
Sheet Order: Control which sheets appear on top (Dashboard > Arrange).
4. Dashboard Extensions
Extensions enhance dashboards with external tools or custom code.
Examples:
Analytics Pane: Add trend lines, forecasts, or clustering.
Third-Party Extensions:
Maps+: Enhanced mapping tools.
Stats Tools: Statistical analysis (e.g., regression).
Google Sheets Sync: Live data integration.
5. Story Points
A Story is a sequence of dashboards or sheets that guide users through a data-driven narrative.
Steps to Build a Story:
Click the New Story tab at the bottom.
Add Sheets/Dashboards: Drag a worksheet or dashboard to the story pane.
Add Captions: Use text boxes to explain each story point (e.g., insights, context).
Format the Story:
Adjust layout and fonts to match dashboards.
Add navigation buttons (Back/Next) for guided flow.
Publish: Share stories via Tableau Server/Public or export as PDF.
Example OUTPUT: Sales Performance Story
Story Point | Dashboard/Sheet | Caption
1. Overview | Sales by Region | "Q3 Sales grew 15% YoY, led by the West Region."
2. Drill-Down | Product Category Analysis | "Electronics drove 60% of revenue, but apparel sales declined."
3. Forecast | Trend Projection | "Holiday sales are projected to exceed targets by 20%."
Experiment 10
Calculations
Syntax
Table calculations
LOD expressions
Aggregate Date, Logic, String, Number, Type calculations
Objective: To perform various types of calculations in Tableau including basic syntax, table
calculations, Level of Detail (LOD) expressions, and aggregate operations for different data types.
Procedure:
1. Calculation Syntax
Tableau uses a formula language similar to Excel or SQL. Key components include:
Basic Syntax Rules:
Fields: Use square brackets for field names: [Sales], [Region].
// Calculate profit margin
[Profit] / [Sales]
// Conditional logic for performance categories
IF [Sales] > 10000 THEN "High" ELSE "Low" END
// String concatenation
[Region] + " - " + STR([Sales])
2. Table Calculations
Table calculations are performed on the aggregated results in your view (e.g., running totals, percent of
total).
Common Table Calculation Functions:
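Rank: RANK(SUM([Sales]))
Running Total: RUNNING_SUM(SUM([Sales]))
Percent of Total: SUM([Sales]) / TOTAL(SUM([Sales]))
Window Average: WINDOW_AVG(SUM([Sales]))
Difference from previous value: ZN(SUM([Sales])) - LOOKUP(ZN(SUM([Sales])), -1)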
// Running total of sales across months
RUNNING_SUM(SUM([Sales]))
// Percent of total sales per region
SUM([Sales]) / TOTAL(SUM([Sales]))
Example Output:
Month Sales Running Total
Jan 1000 1000
Feb 1500 2500
Mar 2000 4500
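The running total in the table can be reproduced outside Tableau to confirm the arithmetic. The sketch below uses pandas as an analogy: cumsum() plays the role of RUNNING_SUM, and dividing by the column sum plays the role of SUM([Sales]) / TOTAL(SUM([Sales])).

```python
import pandas as pd

# The monthly sales from the example table.
df = pd.DataFrame({"Month": ["Jan", "Feb", "Mar"], "Sales": [1000, 1500, 2000]})

# Running total: each row adds its sales to the total of the rows before it.
df["Running Total"] = df["Sales"].cumsum()

# Percent of total: each row's share of the overall sum (4500).
df["Percent of Total"] = df["Sales"] / df["Sales"].sum()

print(df)
```

Like a table calculation, the running total depends on row order: sorting the months differently changes the intermediate values but not the final total.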
3. LOD (Level of Detail) Expressions
LOD calculations let you define the granularity (detail level) of aggregations independently of the
visualization.
Types of LODs:
FIXED:
Compute at a specified dimension level, ignoring the view’s granularity.
{ FIXED [Region] : SUM([Sales]) } // Total sales per region
INCLUDE:
Compute at a finer granularity than the view.
{ INCLUDE [Customer] : AVG([Sales]) } // Avg sales per customer
EXCLUDE:
Compute at a coarser granularity than the view.
{ EXCLUDE [Product] : SUM([Sales]) } // Total sales ignoring product
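To see what a FIXED expression returns, it can help to mimic it on a small table. The sketch below is plain Python, not Tableau code, and the rows and the helper fixed_sum are hypothetical: it computes the total per region and attaches that total to every row, regardless of the other columns present.

```python
from collections import defaultdict

# Hypothetical row-level data: (region, product, sales)
rows = [
    ("West", "Chair", 100),
    ("West", "Desk", 300),
    ("East", "Chair", 200),
    ("East", "Desk", 50),
]

def fixed_sum(rows, key_index, value_index=2):
    """Mimic { FIXED <dimension> : SUM(<measure>) }: one total per dimension member."""
    totals = defaultdict(int)
    for row in rows:
        totals[row[key_index]] += row[value_index]
    return dict(totals)

# Analogue of { FIXED [Region] : SUM([Sales]) }
region_totals = fixed_sum(rows, key_index=0)

# The region total is repeated on every row of that region,
# even though the rows are at Region x Product granularity.
rows_with_lod = [
    (region, product, sales, region_totals[region])
    for region, product, sales in rows
]

for row in rows_with_lod:
    print(row)
```

In a view that contains only Region and Product, { EXCLUDE [Product] : SUM([Sales]) } would give the same per-region totals, because removing Product from the grouping leaves Region.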
1. Date Calculations
Perform operations on date fields, such as extracting parts of dates, aggregating, or calculating intervals.
Common Functions:
DATEPART(): Extract parts of a date (e.g., year, month, day).
DATEPART('month', [Order Date]) // Returns 1 for January, 2 for February, etc.
DATEDIFF(): Calculate the difference between two dates.
DATEDIFF('day', [Order Date], [Ship Date]) // Days between order and shipment.
DATEADD(): Add/subtract time intervals.
DATEADD('month', 3, [Order Date]) // Adds 3 months to the order date.
DATETRUNC(): Truncate dates to a specific granularity.
DATETRUNC('quarter', [Order Date]) // Rounds dates to the start of the quarter.
Aggregation Example:
// Total sales by month:
// 1. Place this calculation on Columns to group rows by month
DATETRUNC('month', [Order Date])
// 2. Place this aggregate on Rows
SUM([Sales])
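The behaviour of the four date functions can be illustrated with Python's datetime module. This is a rough analogue under stated assumptions, not Tableau code: the helper names and dates are hypothetical, and the month-end clamping in dateadd_months is an assumption about how adding months to a date such as the 30th should behave.

```python
from datetime import date
import calendar

def datepart_month(d):
    """Analogue of DATEPART('month', d): month number, 1 for January."""
    return d.month

def datediff_days(start, end):
    """Analogue of DATEDIFF('day', start, end): whole days from start to end."""
    return (end - start).days

def dateadd_months(d, months):
    """Analogue of DATEADD('month', months, d); clamps to the last day of the target month."""
    index = d.year * 12 + (d.month - 1) + months
    year, month_zero_based = divmod(index, 12)
    month = month_zero_based + 1
    last_day = calendar.monthrange(year, month)[1]
    return date(year, month, min(d.day, last_day))

def datetrunc_quarter(d):
    """Analogue of DATETRUNC('quarter', d): first day of the quarter containing d."""
    first_month = 3 * ((d.month - 1) // 3) + 1
    return date(d.year, first_month, 1)

# Hypothetical order and ship dates
order_date = date(2023, 11, 30)
ship_date = date(2023, 12, 5)

print(datepart_month(order_date))            # 11
print(datediff_days(order_date, ship_date))  # 5
print(dateadd_months(order_date, 3))         # 2024-02-29
print(datetrunc_quarter(order_date))         # 2023-10-01
```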
2. Logic (Conditional) Calculations
Use IF, CASE, or Boolean logic to create conditional outputs.
Examples:
Simple IF-THEN-ELSE:
IF [Sales] > 10000 THEN "High" ELSE "Low" END
CASE Statement (for multiple conditions):
CASE [Region]
WHEN "West" THEN 1
WHEN "East" THEN 2
ELSE 3
END
3. String Calculations
Manipulate text fields using string functions.
Common Functions:
Concatenation:
[First Name] + " " + [Last Name] // Combines names into "John Doe"
Substrings:
LEFT([Product Code], 3) // First 3 characters of the code.
MID([Address], 5, 10) // Extracts 10 characters starting at position 5.
4. Number Calculations
Perform arithmetic operations or aggregations on numeric fields.
Examples:
Basic Arithmetic:
[Profit] / [Sales] // Profit margin.
Aggregation:
SUM([Sales]) - SUM([Cost]) // Total profit.
Rounding:
ROUND([Profit], 2) // Rounds to 2 decimal places.
Percentage of Total:
SUM([Sales]) / TOTAL(SUM([Sales])) // Percent contribution to total sales.
5. Type Conversion Calculations
Convert data types (e.g., string to number, date to string).
Common Functions:
Convert to String:
STR([Sales]) // Converts 1000 to "1000".
Convert to Number:
FLOAT(REPLACE([Price String], "$", "")) // Converts "$10.5" to 10.5 (strips the "$" first, since FLOAT cannot parse it).
INT([Quantity]) // Converts "5" to 5.
Convert to Date:
DATE([Order Date String]) // Converts "2023-01-01" to a date type.
Example Workflow
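Goal: Calculate the average sales per customer, excluding canceled orders. [Valid Sales] and [Sales per Customer] are illustrative names for the calculated fields created in Steps 1 and 2.
// Step 1: [Valid Sales] - return NULL for canceled orders so aggregations skip them
IF [Order Status] <> "Canceled" THEN [Sales] END
// Step 2: [Sales per Customer] - total valid sales for each customer
{ FIXED [Customer ID] : SUM([Valid Sales]) }
// Step 3: Average across customers, converted to string for labeling
// (ATTR keeps every term aggregated; assumes [Customer ID] is a string field)
"Customer: " + ATTR([Customer ID]) + " | Avg Sales: " + STR(AVG([Sales per Customer]))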
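The same workflow can be traced by hand on a few rows. The sketch below is plain Python on hypothetical orders, not Tableau code, and it assumes canceled orders are dropped entirely rather than counted as zero sales.

```python
from collections import defaultdict

# Hypothetical orders: (customer_id, order_status, sales)
orders = [
    ("C1", "Completed", 100.0),
    ("C1", "Canceled", 500.0),
    ("C1", "Completed", 50.0),
    ("C2", "Completed", 300.0),
    ("C3", "Canceled", 80.0),
]

# Step 1: keep only non-canceled orders
valid = [(cust, sales) for cust, status, sales in orders if status != "Canceled"]

# Step 2: total sales per customer
sales_per_customer = defaultdict(float)
for cust, sales in valid:
    sales_per_customer[cust] += sales

# Step 3: average across customers that have at least one valid order
average = sum(sales_per_customer.values()) / len(sales_per_customer)

for cust, total in sales_per_customer.items():
    print(f"Customer: {cust} | Sales: {total:.2f}")
print(f"Average sales per customer: {average:.2f}")
```

Customer C3 has only a canceled order, so it does not appear in the totals and does not pull the average down.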
Experiment 11
Built-in chart types/visualisations:
Line chart
Dot chart
Bar chart
Other types of visualisation (bullet graph, Heat map, Tree map, etc.).
Combo charts – dual axis
Objective: To explore various built-in chart types and visualizations in Tableau such as line, dot, bar,
bullet graphs, heat maps, tree maps, and dual-axis combo charts.
Procedure:
1. Line Chart
Purpose: Show trends over time or continuous categories.
How to Build:
Drag a date or continuous dimension to Columns.
Drag a measure (e.g., Sales) to Rows.
Change the mark type to Line.
2. Dot Chart (Scatter Plot)
Purpose: Compare relationships between two measures.
How to Build:
Drag a measure to Columns (e.g., Sales).
Drag another measure to Rows (e.g., Profit).
Add a dimension to Detail to draw one mark per member, or to Color/Size to encode it visually.
Change the mark type to Circle.
3. Bar Chart
Purpose: Compare values across discrete categories.
Types:
Horizontal: Drag dimension to Rows, measure to Columns.
Stacked: Add a secondary dimension to the Color mark.
How to Build:
Drag a dimension (e.g., Category) to Columns.
Drag a measure (e.g., Sales) to Rows.
Change the mark type to Bar.
4. Bullet Graph
Purpose: Compare a measure to a target or benchmark.
How to Build:
Create a bar chart for the primary measure (e.g., Sales).
Add a reference line (e.g., Target Sales) via Analytics Pane.
Add color bands (e.g., Poor/Good/Great) using reference distributions.
5. Heat Map
Purpose: Visualize density or relationships between two dimensions using color.
How to Build:
Drag two dimensions to Rows and Columns (e.g., Region, Category).
Drag a measure to the Color mark (e.g., SUM(Sales)).
Change the mark type to Square.
6. Tree Map
Purpose: Show hierarchical data as nested rectangles (size/color-encoded).
How to Build:
Drag a hierarchy (e.g., Category > Sub-Category) to Label.
Drag a measure to Size (e.g., Sales).
Drag another measure to Color (e.g., Profit).
Select Treemap in the Show Me panel (Tableau draws it with Square marks).
7. Combo Chart (Dual Axis)
Purpose: Layer two measures with different scales (e.g., Sales vs. Profit Margin).
How to Build:
Create a bar chart (e.g., SUM(Sales) by Month).
Drag a second measure (e.g., AVG(Profit Margin)) to Rows, right-click its pill > Dual Axis, and set its mark type to Line.
Synchronize axes if needed (right-click axis > Synchronize Axis).
8. Other Key Visualizations
Chart Type | Purpose | Steps
Area Chart | Show cumulative trends over time. | Use a line chart > Change mark type to Area.
Box Plot | Display data distribution. | Drag a measure to Rows > Analytics Pane > Add Box Plot.
Gantt Chart | Track timelines/project schedules. | Use a date field and duration > Change mark type to Gantt.
Pareto Chart | Highlight the 80/20 rule. | Combine a bar chart (descending order) + cumulative percentage line.
Waterfall Chart | Show incremental changes. | Use a bar chart with Gantt bars and running total calculations.
Experiment 12
Custom chart types:
KPI matrix
Waterfall
Gantt
Dot plot
Pareto
Analytics’ options: trend lines, forecasting, clustering
Objective: To create custom chart types in Tableau such as KPI Matrix, Waterfall, Gantt, Dot Plot,
Pareto chart, and apply advanced analytics like trend lines, forecasting, and clustering.
1. KPI Matrix
Purpose: Display key performance indicators (KPIs) in a grid format.
Example: Sales vs. Target by Region.
Steps:
Drag a dimension (e.g., Region) to Rows.
Drag Measure Names to Columns and filter it to the KPI measures (e.g., Sales, Profit, Target).
Use Text Marks:
Drag Measure Values to the Text shelf.
Format numbers (e.g., currency, percentages).
Add Conditional Formatting:
Drag a measure or a status calculation to Color and edit the palette (e.g., red for missed targets, green for achieved).
Tip: Use Worksheet Titles to label KPIs dynamically (e.g., SUM(Sales) vs. AVG(Target)).
2. Waterfall Chart
Purpose: Visualize cumulative changes (e.g., profit over time).
Steps:
Create a Bar Chart with a dimension (e.g., Month) and measure (e.g., Profit).
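Add a Running Total Calculation:
Right-click the measure on Rows > Quick Table Calculation > Running Total.
Convert to Gantt Bars:
Change the Mark Type to Gantt Bar.
Create a calculated field with the formula -[Profit] and drag it to Size so each bar spans one period's change.
Adjust bar colors (positive=green, negative=red) by dragging SUM([Profit]) to Color.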
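The arithmetic behind a waterfall chart can be sketched outside Tableau. In the Python sketch below (hypothetical monthly profit figures, not Tableau code), each bar starts at the previous running total and ends at the current one, which is the span a Gantt-style bar needs to cover.

```python
from itertools import accumulate

# Hypothetical monthly profit (positive = gain, negative = loss)
months = ["Jan", "Feb", "Mar", "Apr"]
profit = [400, -150, 300, -50]

# Running total after each month
running = list(accumulate(profit))

# Each bar spans from the previous running total to the current one
bars = []
for month, change, end in zip(months, profit, running):
    start = end - change
    color = "green" if change >= 0 else "red"
    bars.append((month, start, end, color))

for bar in bars:
    print(bar)
```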
3. Gantt Chart
Purpose: Track project timelines or durations.
Steps:
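Drag Start Date to Columns (as a continuous exact date).
Drag the task dimension (e.g., Task) to Rows.
Change Mark Type to Gantt Bar.
Drag Duration, e.g., DATEDIFF('day', [Start Date], [End Date]), to Size.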
Customize colors (e.g., by Project Phase).
Tip: Use Tooltips to show details like % Complete or Owner.
4. Dot Plot
Purpose: Compare categories using dots on an axis.
Steps:
Drag a dimension (e.g., Product) to Rows.
Drag a measure (e.g., Sales) to Columns.
Change Mark Type to Circle.
Add a Reference Line (e.g., Average Sales) for comparison.
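Advanced: When a detail dimension (e.g., Order ID) places many dots on each row, use Jittering to avoid overlapping dots:
Create a calculated field: INDEX() % 10, add it to Rows as a continuous field, and compute it using the detail dimension.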
5. Pareto Chart
Purpose: Highlight the "80/20 rule" (e.g., top 20% products driving 80% of sales).
Steps:
Sort data descending by a measure (e.g., Sales).
Create a Bar Chart for the measure.
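Add a Cumulative Percentage Line:
Drag the measure to Rows a second time > Quick Table Calculation > Running Total, then Edit Table Calculation > Add Secondary Calculation > Percent of Total.
Change this measure's mark type to Line.
Dual-Axis:
Right-click the cumulative measure > Dual Axis.
Leave the axes unsynchronized, since one shows values and the other shows percentages.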
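The cumulative percentage that the Pareto line plots can be worked out by hand. The sketch below uses plain Python on hypothetical product sales (not Tableau code): it sorts in descending order, accumulates the share of the total, and lists the products needed to reach 80% of sales.

```python
# Hypothetical sales by product, deliberately unsorted
sales = {"C": 170, "A": 500, "E": 30, "B": 250, "D": 50}

# Sort descending by sales, as the bars are ordered in the chart
ranked = sorted(sales.items(), key=lambda item: item[1], reverse=True)
total = sum(sales.values())

# Cumulative share of total sales after each product
pareto = []
cumulative = 0
for product, value in ranked:
    cumulative += value
    pareto.append((product, value, cumulative / total))

# Smallest set of top products whose cumulative share reaches 80%
vital_few = []
for product, value, share in pareto:
    vital_few.append(product)
    if share >= 0.80:
        break

for product, value, share in pareto:
    print(f"{product}  {value:>4}  {share:.0%}")
print("Products covering 80% of sales:", vital_few)
```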
6. Analytics Options
Trend Lines
How to Add:
Drag a trend line from the Analytics pane to the chart.
Choose a model (Linear, Logarithmic, Exponential).
Use Case: Identify sales trends over time.
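A linear trend line is a straight-line fit to the points in the view. The following Python sketch shows an ordinary least-squares fit on hypothetical monthly sales; it illustrates the idea only and is not a reproduction of Tableau's internal computation. The function name linear_trend and the data are illustrative.

```python
def linear_trend(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return slope, intercept

# Hypothetical data: month index and monthly sales
month_index = [1, 2, 3, 4, 5]
sales = [1000, 1500, 2000, 2400, 3100]

slope, intercept = linear_trend(month_index, sales)
print(f"Sales grow by about {slope:.0f} per month (intercept {intercept:.0f})")

# Value of the fitted line at month 6
next_month = slope * 6 + intercept
print(f"Fitted value for month 6: {next_month:.0f}")
```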
Forecasting
How to Add:
Right-click a time-based chart > Forecast > Show Forecast.
Adjust settings (e.g., forecast length, seasonality).
Use Case: Predict future sales.
Clustering
How to Add:
Drag Clusters from the Analytics pane to the view.
Select variables (e.g., Sales, Profit) and number of clusters.
Use Case: Segment customers into groups based on behavior.
Summary
KPI Matrix: Use text tables with conditional formatting.
Waterfall: Running total + Gantt bars.
Gantt: Date fields + duration calculation.
Dot Plot: Circles + jittering.
Pareto: Sorted bars + cumulative line.
Analytics: Trend lines, forecasting, and clustering are built-in tools under the Analytics pane.
Experiment 13
CREATE AND FORMAT REPORTS USING THE TABLEAU DESKTOP
Describe the use of Page Backgrounds and Templates
Create visualizations to display the data
Apply drill through and drill down
Create and manage slicers with the use of filters.
Explore visual interactions
Review Bookmarks
Publish the report to the Tableau online
Objective: To create and format reports using Tableau Desktop by exploring templates, applying drill-
through and drill-down features, using slicers and filters, managing interactions, bookmarks, and
publishing to Tableau Online.
Step 1: Connect to Data Source
Open Tableau Desktop and connect to your dataset (Excel, CSV, SQL, etc.).
Clean and prepare the data (e.g., rename fields, set data types).
Step 2: Create Visualizations
Example Visualizations (adjust based on your data):
Bar Chart: Sales by Region.
Line Chart: Monthly Sales Trends.
Map: Geographic distribution of customers.
Pie Chart: Product Category Distribution.
Scatter Plot: Profit vs. Quantity.
How to Create:
Drag dimensions (e.g., Region, Month) to Columns/Rows.
Drag measures (e.g., Sales, Profit) to Rows/Columns or Marks.
Use the Show Me panel to switch chart types.
Step 3: Apply Drill Down/Drill Through
Drill Down (Hierarchy):
Create a hierarchy (e.g., Country > State > City).
Click the + icon on the Country pill (or on the Country header in the view) to drill down to State, then City.
Drill Through:
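Create a new sheet with detailed data.
Go to Worksheet > Actions > Add Action > Filter (on a dashboard: Dashboard > Actions).
Set the main visualization as the source sheet and the detailed sheet as the target, and run the action on Select or Menu.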
Step 4: Add Filters and Slicers
Filters:
Drag a field (e.g., Region, Product) to the Filters shelf.
Right-click the filter > Show Filter to display it on the dashboard.
Slicers:
Use a discrete field (e.g., Year) as a filter.
Format the slicer (e.g., dropdown, single-value list, or slider).
Step 5: Design Page Backgrounds
Background Image:
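Go to Dashboard > Format and use Dashboard Shading to set the background color.
For a background image, add an Image object to the dashboard and place the sheets over it as floating objects.
Templates:
Save a formatted workbook as a reusable starting point: File > Save As (.twbx).
Reuse a copy for future reports with similar structures, pointing it at new data with Data > Replace Data Source.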
Step 6: Set Up Visual Interactions
Highlight Actions:
Go to Dashboard > Actions.
Add a Highlight Action to link visuals (e.g., selecting a region highlights sales in other charts).
Filter Actions:
Add a Filter Action to sync filters across multiple sheets (e.g., clicking a bar updates all visuals).
Step 7: Use Bookmarks
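Create Bookmarks:
Set up the worksheet in the required state (e.g., 2023 Sales in the Northeast).
Go to Window > Bookmark > Create Bookmark to save the worksheet as a .tbm file.
Manage Bookmarks:
Reopen saved worksheets from Window > Bookmark; to toggle between saved filter states of a published view during presentations, use Custom Views on the server.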
Step 8: Build the Dashboard
Create a New Dashboard:
Drag sheets from the Sheets pane onto the dashboard.
Arrange visuals using containers (horizontal/vertical) for alignment.
Add Filters/Slicers:
Select a sheet on the dashboard, open its drop-down menu > Filters, and choose the fields to show as filter cards.
Group them in a floating container for a clean look.
Add Titles/Text:
Use Text objects to label sections or add descriptions.
Step 9: Publish to Tableau Online
Sign In:
Go to Server > Sign In and log in to your Tableau Online account.
Publish:
Click Server > Publish Workbook.
Choose a project/folder and set permissions (e.g., viewer/editor access).
Share:
Copy the link or embed the dashboard in a website.