The document is a lab file for the Data Analytics course at the Noida Institute of Engineering and Technology for the 2024-2025 session. It includes an index of experiments covering installation of MySQL, Anaconda, and Tableau, as well as data import/export operations, data pre-processing, and visualization techniques. Each section outlines objectives, methodologies, and steps for performing various data analytics tasks using programming languages and tools.


NOIDA INSTITUTE OF ENGINEERING AND

TECHNOLOGY
GREATER NOIDA-201306
(An Autonomous Institute)
School of Computer Sciences & Information Technology

Department of IT
Session (2024 – 2025)

LAB FILE
OF
DATA ANALYTICS
(ACSDS0653)
(6th Semester)

Submitted to: Submitted by:


Ms. Nidhi Chauhan Name:
Roll. No:

Affiliated to Dr. A.P.J. Abdul Kalam Technical University, Uttar Pradesh, Lucknow.
INDEX

S.No — Name of Experiment (Date and Faculty Signature columns to be filled in)

1. Installation of MySQL, Anaconda, and Tableau
2. To perform data import/export (.CSV, .XLS, .TXT) operations using data frames in R/Python
3. To perform data pre-processing operations: i) Handling missing data ii) Min-Max normalization
4. To perform dimensionality reduction operation using PCA on the Houses data set
5. To perform statistical operations (Mean, Median, Mode and Standard deviation) using Python
6. Tableau – getting started
 User interface
 Methodology for working with the interface
 Connecting to different types of data sources (Excel, CSV, Access, MySQL, Tableau Server)
 Editing Data Connections and Data Sources; Live mode vs. Extract mode
 Date Interpreter / Pivot
7. Joining multiple datasets
 Union / Join
 Cross-database joins
 Data Blending – integrating different data sources
8. Basic functionalities
 Filtering
 Sorting
 Grouping
 Hierarchies
 Creating sets
 Types of dates – Continuous vs. Discrete
 Pivot tables
9. Dashboards and stories
 Building dashboards
 Dashboard objects
 Dashboard formatting
 Dashboard extensions; Story points
10. Calculations
 Syntax
 Table calculations
 LOD expressions
 Aggregate, Date, Logic, String, Number, Type calculations
11. Built-in chart types/visualisations
 Line chart
 Dot chart
 Bar chart
 Other types of visualisation (bullet graph, heat map, tree map, etc.)
 Combo charts – dual axis
12. Custom chart types
 KPI matrix
 Waterfall
 Gantt
 Dot plot
 Pareto
 Analytics options: trend lines, forecasting, clustering
13. Create and format reports using Tableau Desktop
 Describe the use of Page Backgrounds and Templates
 Create visualizations to display the data
 Apply drill-through and drill-down
 Create and manage slicers with the use of filters
 Explore visual interactions
 Review Bookmarks
 Publish the report to Tableau Online

Experiment 1
Objective: Installation of MySQL, Anaconda, Tableau

MySQL

MySQL is a popular open-source Relational Database Management System (RDBMS) that uses SQL (Structured Query Language) to store, manage, and query data. Because it is open source, anyone can use and modify it free of charge, and it supports a wide range of programming languages.

Why Use MySQL


MySQL is a popular choice for managing relational databases for several reasons:
1. Open Source: MySQL is open-source software, which means it’s free to use and has a large
community of developers contributing to its improvement.
2. Relational: MySQL follows the relational database model, allowing users to organize data
into tables with rows and columns, facilitating efficient data storage and retrieval.
3. Reliability: MySQL has been around for a long time and is known for its stability and reliability.
4. Performance: MySQL is optimized for performance, making it capable of handling high-volume
transactions and large datasets efficiently.
5. Scalability: MySQL can scale both vertically and horizontally to accommodate growing data and
user loads. You can add more resources to a single server or distribute the workload across multiple
servers using techniques like sharding or replication.
6. Compatibility: MySQL is widely supported by many programming languages, frameworks, and
tools. It offers connectors and APIs for popular languages like PHP, Python, Java, and more, making it
easy to integrate with your existing software stack.
7. Security: MySQL provides robust security features to protect your data, including access controls,
encryption, and auditing capabilities. With proper configuration, you can ensure that only authorized
users have access to sensitive information.

Hardware and Software Requirements to Install MySQL


Before installing MySQL on your PC, ensure your system has a capable processor (such as an Intel Core), at least 4 GB of RAM (6 GB recommended), a compatible graphics card, and a display with at least 1024×768 resolution.

Download and Install MySQL for Windows Steps


Now, let's break down the MySQL download and installation steps for Windows 10, one at a time.

Step 1: Visit the Official MySQL Website


Open your preferred web browser, navigate to the official MySQL website, and click the first download button.

Step 2: Go to the Downloads Section

On the downloads page, click the "No thanks, just start my download" link to begin downloading MySQL.

Step 3: Run the Installer


Once the MySQL installer (.exe) has finished downloading, go to your Downloads folder, find the file, and double-click it to run the installer.

Step 4: Choose Setup Type

The installer will prompt you to choose a setup type. For most users, "Developer Default" is suitable. Click "Next" to proceed.

Step 5: Check Requirements

You might be prompted to install prerequisite software, typically the Microsoft Visual C++ Redistributable. The installer can resolve some requirements automatically; install any remaining ones manually before continuing.
Step 6: MySQL Downloading

Now that you’re in the download section, click “Execute” to start downloading the components you
selected. Wait a few minutes until all items show tick marks, indicating completion, before moving
forward.
Step 7: MySQL Installation

Now the downloaded components will be installed. Click “Execute” to start the installation process.
MySQL will be installed on your Windows system. Then click Next to proceed

Step 8: Navigate Through the Configuration Pages

Proceed to “Product Configuration” > “Type and Networking” > “Authentication Method” Pages by
clicking the “Next” button.
Step 9: Create MySQL Accounts

Create a password for the MySQL root user. Ensure it's strong and memorable. Click "Next" to proceed.

Step 10: Connect To Server


Enter the root password and click "Check". If the status shows "Connection succeeded," you have successfully connected to the server.

Step 11: Complete Installation

Once the installation is complete, click “Finish.” Congratulations! MySQL is now installed on your
Windows system.

Step 12: Verify Installation


To ensure a successful installation of MySQL, open the MySQL Command Line Client or MySQL
Workbench, both available in your Start Menu. Log in using the root user credentials you set during
installation.
MySQL Workbench is now ready to use for data operations such as querying, filtering, sorting, grouping, modifying, and joining the tables present in the database.
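The data operations listed above can be tried out immediately. The following sketch (not part of the lab sheet) uses Python's built-in sqlite3 module as a lightweight stand-in for MySQL, so the SQL can be run without a server; the statements are the same kind you would run in MySQL Workbench, and the students table and its rows are invented for illustration.

```python
import sqlite3

# In-memory SQLite database standing in for MySQL; table contents invented.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE students (roll_no INTEGER, name TEXT, marks INTEGER)")
cur.executemany(
    "INSERT INTO students VALUES (?, ?, ?)",
    [(1, "Asha", 82), (2, "Ravi", 67), (3, "Meena", 91)],
)

# Filtering and sorting
rows = cur.execute(
    "SELECT name FROM students WHERE marks > 70 ORDER BY marks DESC"
).fetchall()
print(rows)  # [('Meena',), ('Asha',)]

# Grouping / aggregation
stats = cur.execute("SELECT COUNT(*), AVG(marks) FROM students").fetchone()
print(stats)  # (3, 80.0)
conn.close()
```

Against a real MySQL server, the same queries would be issued through a connector such as mysql-connector-python instead of sqlite3.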

Anaconda
Anaconda is an open-source distribution of the Python and R programming languages. It's used for data
science, machine learning, and artificial intelligence (AI).

Download the Anaconda Installer – 3 Steps


We will start the process by downloading the Anaconda Installer by following these three simple steps as
mentioned below:
Step 1: Visit the Official Website

Head over to anaconda.com and download the latest version of Anaconda. Make sure to download the "Python 3.13.1 Version" for the appropriate architecture.

Step 2: Select the Windows Installer

Choose the appropriate installer (based on your system’s architecture) i.e. 64-bit (for modern systems) or
32-bit (for older systems).

Step 3: Start the Downloading Process

Select the location where you want to save the file and click “Save” to start the downloading process.
Run the Anaconda Installer – 8 Steps
Once the download is complete, we will see how to set up the Anaconda installer on your Windows PC. Let's check it out:

Step 1: Begin with the installation process


Navigate to the downloaded file, make a double-click on the .exe file and start the installation process.

Step 2: Getting through the License Agreement

Follow the on-screen instructions, read the license terms & agreement and proceed ahead.


Step 3: Select Installation Type


Select Just Me if you want the software to be used by a single user, or select All Users to make it available to everyone on the system.


Step 4: Choose Installation Location

Select the path where you wish to install Anaconda and click "Next" to proceed.


Step 5: Advanced Installation Option


Choose whether to add Anaconda to your system PATH environment variable. Note that adding Anaconda to the PATH can interfere with other software; also decide whether to register Anaconda as the default Python.


Step 6: Getting through the Installation Process

Click Install to start the Anaconda Installation process.

Step 7: Recommendation to Install Pycharm


During the installation, the installer will offer a link to the official site to get PyCharm for your Windows PC. You can visit it or skip this step.


Step 8: Finishing up the Installation

Once the installation is complete, click Finish to exit the setup.

Verify the Installation


Once the installation is completed, you may verify it by following these steps.
Step 1: Access to Anaconda Prompt

Click on the Start Menu and search for “Anaconda Prompt” and click to open it.
Step 2: Run the Program to check for the Anaconda Version

Type the following command to check for the installed version of conda:
conda --version

Working with Anaconda


Once Anaconda has been downloaded, installed, and verified on your Windows PC, you can start exploring it and getting familiar with its features.
Step 1: Access Anaconda Navigator

Once the installation process is done, Anaconda can be used to perform multiple operations. To begin
using Anaconda, search for Anaconda Navigator from the Start Menu in Windows PC.

Step 2: Explore Navigator & Features

You can use the Navigator to create new environments, install packages, and launch applications without using the command line.
Tableau

Tableau is a visual analytics platform that helps people and organizations use data to solve problems. It's a
tool that allows users to access, analyze, and visualize data.

Download and Installation of Tableau


Tableau is available in two editions:

o Tableau Public (free)


o Tableau Desktop (commercial)
Here is a comparison between Tableau Public and Tableau Desktop.
Tableau Public

o Tableau Public is free to use.


o Tableau Public can connect only to Excel and text files as data sources.
o Tableau Public can be installed on Windows and macOS.
o Data and visualizations are not private in Tableau Public, because everything is published publicly.
o In Tableau Public, data cannot be obtained from other data sources, as connectivity is limited to Excel and text files.
o Tableau Public is intended for personal use.
Tableau Desktop

o Tableau Desktop is paid software: the personal edition costs $35 per month and the professional edition $70 per month.
o Tableau Desktop can connect to almost any data source, including databases, web applications, and more.
o Tableau Desktop can also be installed on Windows and macOS.
o Data and visualizations are kept private and secure in Tableau Desktop.
o In Tableau Desktop, data can be extracted from various data sources and stored as a Tableau extract file.
o Tableau Desktop is intended for professional and enterprise use.
Let's install Tableau Desktop on a Windows machine, step by step:

Step 1: Go to https://www.tableau.com/products/desktop in your web browser.

Step 2: Click the 'Try Now' button.

Step 3: Enter your email ID and click the 'Download Free Trial' button.
Step 4: This starts downloading the .exe file for a Windows machine by default.

Step 5: Open the downloaded file and click the 'Run' button.

Step 6: Accept the terms and conditions and click the 'Install' button.
Step 7: A pop-up message will ask for administrator approval to install the Tableau software. Click 'Yes' to approve, and the installation will start.

Step 8: Once the installation is complete, open the Tableau Desktop software.

Step 9: In the registration window:

1. Click on Activate Tableau and fill in your complete details.
2. Click on Start Trial Now.

Step 10: Wait for the registration to complete.

Step 11: The start screen of Tableau Desktop appears.
Experiment 2

Objective: To perform data import/export (.CSV, .XLS, .TXT) operations using data frames in
R/Python.

1. Importing Data
import pandas as pd

# CSV File
df_csv = pd.read_csv("data.csv")

# Excel File (.xls or .xlsx) – requires an engine such as openpyxl
df_excel = pd.read_excel("data.xlsx")

# TXT File (tab or custom-delimited)


df_txt = pd.read_csv("data.txt", delimiter="\t") # or use sep=',' for comma
2. Exporting Data
# To CSV
df_csv.to_csv("output.csv", index=False)

# To Excel
df_csv.to_excel("output.xlsx", index=False)

# To TXT
df_csv.to_csv("output.txt", sep='\t', index=False)
Output:
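As a quick sanity check of the import/export operations above, the following sketch (file names and contents are invented) round-trips a DataFrame through CSV and tab-delimited TXT and confirms the data survives intact:

```python
import os
import tempfile
import pandas as pd

# Round-trip check: export a small DataFrame and import it back.
df = pd.DataFrame({"id": [1, 2, 3], "city": ["Agra", "Pune", "Noida"]})

tmpdir = tempfile.mkdtemp()
csv_path = os.path.join(tmpdir, "output.csv")
txt_path = os.path.join(tmpdir, "output.txt")

# Export without the index column, as in the snippets above
df.to_csv(csv_path, index=False)
df.to_csv(txt_path, sep="\t", index=False)

# Import and compare with the original
df_csv_back = pd.read_csv(csv_path)
df_txt_back = pd.read_csv(txt_path, delimiter="\t")

print(df_csv_back.equals(df))  # True
print(df_txt_back.equals(df))  # True
```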
Experiment 3
Objective: To perform data pre-processing operations i) Handling Missing data ii) Min-Max
normalization

i) Handling Missing Data


Identify Missing Values: Detect columns with missing values.

Choose Imputation Strategy:

Numerical Data: Replace missing values with the mean (for normal distributions) or median (for skewed
distributions).

Categorical Data: Replace missing values with the mode (most frequent category).

Impute Missing Values:

Use pandas for basic imputation or scikit-learn for pipeline integration.

import pandas as pd
from sklearn.impute import SimpleImputer

# Load dataset
df = pd.read_csv('data.csv')

# Separate numerical and categorical columns (example)


num_cols = df.select_dtypes(include=['int64', 'float64']).columns
cat_cols = df.select_dtypes(include=['object']).columns

# Impute numerical columns with mean


num_imputer = SimpleImputer(strategy='mean')
df[num_cols] = num_imputer.fit_transform(df[num_cols])

# Impute categorical columns with mode


cat_imputer = SimpleImputer(strategy='most_frequent')
df[cat_cols] = cat_imputer.fit_transform(df[cat_cols])

# Alternatively, using pandas:


# df[num_cols] = df[num_cols].fillna(df[num_cols].mean())
# df[cat_cols] = df[cat_cols].fillna(df[cat_cols].mode().iloc[0])
Original Dataset (Before Preprocessing)

Age | Income | Gender
25  | 50000  | Male
NaN | 60000  | Female
30  | NaN    | NaN
35  | 70000  | Male
NaN | 80000  | Female

After Imputation:

Age | Income  | Gender
25  | 50000.0 | Male
30  | 60000.0 | Female
30  | 65000.0 | Male
35  | 70000.0 | Male
30  | 80000.0 | Female

Age: Missing values replaced with the mean (30).

Income: Missing value replaced with the mean (65,000).

Gender: Missing value replaced with the mode (Male, since it appeared most frequently).
ii) Min-Max Normalization

Purpose: Scale numerical features to a range of [0, 1].

Formula:

x_normalized = (x − min(x)) / (max(x) − min(x))

Implementation:

Use scikit-learn for efficient scaling or compute manually with pandas.

from sklearn.preprocessing import MinMaxScaler

# Initialize scaler

scaler = MinMaxScaler()

# Apply normalization to numerical columns

df[num_cols] = scaler.fit_transform(df[num_cols])

# Manual implementation with pandas:

# df[num_cols] = (df[num_cols] - df[num_cols].min()) / (df[num_cols].max() - df[num_cols].min())

Step 2: Min-Max Normalization

After Scaling:

Age (Normalized) | Income (Normalized) | Gender
0.00             | 0.00                | Male
0.50             | 0.33                | Female
0.50             | 0.50                | Male
1.00             | 0.67                | Male
0.50             | 1.00                | Female

Key Observations:

Missing Data Handling:

Numerical columns (Age, Income) used mean imputation.

Categorical column (Gender) used mode imputation.

Normalization:

Values scaled between [0, 1] (e.g., Age 25 → 0.0, 35 → 1.0).

Categorical Data:

Gender remains unchanged (normalization only applies to numerical features).
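The worked example above can be reproduced end to end as a short script. One caveat: in this toy dataset Male and Female actually tie (two occurrences each), so the "mode" used by the tables is set explicitly rather than computed.

```python
import pandas as pd
import numpy as np

# The five-row toy dataset from the tables above
df = pd.DataFrame({
    "Age":    [25, np.nan, 30, 35, np.nan],
    "Income": [50000, 60000, np.nan, 70000, 80000],
    "Gender": ["Male", "Female", np.nan, "Male", "Female"],
})

# Mean imputation for numeric columns
df["Age"] = df["Age"].fillna(df["Age"].mean())           # mean = 30
df["Income"] = df["Income"].fillna(df["Income"].mean())  # mean = 65000

# The lab sheet imputes Gender with "Male"; Male and Female tie here,
# so we set the value explicitly to match the table.
df["Gender"] = df["Gender"].fillna("Male")

# Min-max normalization to [0, 1]
for col in ["Age", "Income"]:
    df[col] = (df[col] - df[col].min()) / (df[col].max() - df[col].min())

print(df.round(2))
```

The result matches the "After Scaling" table: Age becomes [0.0, 0.5, 0.5, 1.0, 0.5] and Income [0.0, 0.33, 0.5, 0.67, 1.0].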


Experiment 4:

Objective: To perform dimensionality reduction operation using PCA Houses Data Set

To apply Principal Component Analysis (PCA) for dimensionality reduction on the Houses dataset to
reduce the number of features while preserving maximum variance.

Procedure:

Import required libraries: pandas, numpy, sklearn.decomposition, and matplotlib for visualization.
Load the Houses dataset using pandas.read_csv().
Perform exploratory data analysis:
Check for null values.
Drop or fill missing values as needed.
Normalize the data using StandardScaler.

Apply PCA:

Use sklearn.decomposition.PCA to reduce dimensionality.


Fit PCA on the normalized data and transform it.
Analyze explained variance ratio.
Visualize the results using a 2D or 3D scatter plot of the transformed dataset.
Compare original features vs reduced features to analyze performance.

Sample Code:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler
# Load dataset
data = pd.read_csv(r"C:\Users\lenovo\Downloads\houses.csv")

# Display the first few rows


print(data.head())

# Select numeric columns and standardize them before applying PCA
numeric_data = data.select_dtypes(include=[np.number]).dropna()
scaled_data = StandardScaler().fit_transform(numeric_data)

# Apply PCA to reduce to 2 principal components
pca = PCA(n_components=2)
principal_components = pca.fit_transform(scaled_data)

# Convert to DataFrame
pca_df = pd.DataFrame(data=principal_components, columns=['PC1', 'PC2'])
# Display the transformed data
pca_df.head()

# Scatter plot for PCA visualization


plt.figure(figsize=(8, 6))
plt.scatter(pca_df['PC1'], pca_df['PC2'], alpha=0.6, color='blue')
plt.xlabel("Principal Component 1")
plt.ylabel("Principal Component 2")
plt.title("PCA Projection of Housing Data")
plt.grid(True)
plt.show()

explained_variance = pca.explained_variance_ratio_
print("Explained Variance Ratio:", explained_variance)

# Plot variance explained by each principal component


plt.figure(figsize=(6, 4))
plt.bar(range(1, 3), explained_variance, alpha=0.7, color="red")
plt.xlabel("Principal Components")
plt.ylabel("Variance Explained")
plt.title("PCA Explained Variance")
plt.show()
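Because the houses.csv path above is machine-specific, the same workflow can be sanity-checked on synthetic "housing-like" data; the feature names below are invented for the sketch.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic data: two correlated features plus one independent feature
rng = np.random.default_rng(42)
area = rng.normal(1500, 300, 200)
rooms = area / 500 + rng.normal(0, 0.2, 200)  # strongly correlated with area
age = rng.normal(20, 5, 200)                  # independent feature
X = np.column_stack([area, rooms, age])

# Standardize, then project onto 2 principal components
scaled = StandardScaler().fit_transform(X)
pca = PCA(n_components=2)
projected = pca.fit_transform(scaled)

print(projected.shape)  # (200, 2)
# Two of the three features are correlated, so two components
# retain most of the variance:
print(pca.explained_variance_ratio_.sum())
```

This illustrates the point of the experiment: when features are correlated, a few principal components preserve nearly all the variance.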
Experiment 5:

Objective: To perform statistical operations (Mean, Median, Mode and Standard deviation) using
Python.

To calculate basic statistical measures such as Mean, Median, Mode, and Standard Deviation on a
given dataset using Python or R.
Procedure:
import pandas as pd
import numpy as np

data = pd.read_csv(r"C:\Users\lenovo\Downloads\TaillePoids.csv")
print(data.head())  # Display first few rows

mean_values = data.mean(numeric_only=True)
print("Mean:\n", mean_values)

output
median_values = data.median(numeric_only=True)
print("Median:\n", median_values)
output

mode_values = data.mode().iloc[0]  # mode() may return multiple rows; take the first
print("Mode:\n", mode_values)

output

std_values = data.std(numeric_only=True)
print("Standard Deviation:\n", std_values)

output
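When TaillePoids.csv is not available, the same four measures can be checked on a small invented sample with Python's statistics module:

```python
from statistics import mean, median, mode, pstdev

# Small invented sample for a self-contained check
marks = [2, 4, 4, 4, 5, 5, 7, 9]

print(mean(marks))    # 5
print(median(marks))  # 4.5
print(mode(marks))    # 4
print(pstdev(marks))  # 2.0 (population standard deviation)
```

Note that pandas' .std() defaults to the sample standard deviation (ddof=1), which gives a slightly larger value than pstdev on the same data.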
Experiment 6

Tableau – getting started

 User interface

 Methodology for working with the interface

 Connecting to different types of data sources (Excel, csv, Access, MySQL, Tableau Server)

 Editing Data Connections and Data Sources; Live mode vs. Extract mode

 Date interpreter / Pivot

Objective: To get introduced to Tableau software by understanding the user interface, connecting to
different types of data sources, and exploring data connections using Live and Extract modes.

Procedure:

Tableau – Getting Started Guide

1. User Interface Overview

Tableau's interface is designed for intuitive data exploration. Key components include:

Data Pane: Lists data sources, fields (dimensions and measures), and calculations.

Shelves (Rows, Columns, Marks): Drag fields here to build visualizations. The Marks Card controls color,
size, labels, etc.

Toolbar: Tools for sorting, grouping, filtering, and creating calculations.


Sheets/Dashboards/Stories Tabs: Switch between worksheets, dashboards, and story presentations.

Show Me: Recommends chart types based on selected fields.

Worksheet Workspace: Canvas for building visualizations.

2. Methodology for Working with the Interface

Workflow:

Connect to Data: Start by linking to your data source.

Build Data Source: Clean/organize data (e.g., pivot, split fields).

Create Visualizations: Drag dimensions/measures to shelves and use Show Me for chart suggestions.

Refine: Apply filters, sort, or group data as needed.

Build Dashboards/Stories: Combine multiple sheets into interactive dashboards.

Best Practices:

Use folders in the Data Pane to organize fields.

Right-click fields to change data types (e.g., string to date).

Use aliases to rename categorical values for clarity.

3. Connecting to Data Sources

Tableau supports numerous data sources. Here’s how to connect:


Excel/CSV:

Click “Microsoft Excel” or “Text File” > Browse to the file.

Drag sheets/tables to the canvas or use the “New Union” option.

Access:

Requires ODBC driver (pre-installed on Windows). Select “Microsoft Access” > Browse to .accdb.

MySQL:

Choose “MySQL” > Enter server name, port, database, and credentials.

Tableau Server:

Select “Tableau Server” > Enter URL > Log in to access published data sources.

4. Editing Data Connections & Data Sources

Live vs. Extract Mode:

Live: Direct connection to the source (real-time data). Use for small datasets or frequent updates.

Extract (Hyper): Snapshot of data stored in Tableau’s .hyper format. Improves performance for large
datasets; refresh manually or on a schedule.

Editing Connections:

Right-click fields to rename, hide, or create calculated fields.

Use Data Source Page to join tables, pivot columns, or split fields (e.g., splitting "Year-Month" into two
columns).

Adjust joins/relationships in the physical/logical layer for complex data models.

5. Date Interpreter & Pivot

Date Interpreter:

Automatically detects date formats in ambiguous fields (e.g., "2023-Oct-15" vs. "10/15/23").

Right-click a field > Date Properties to adjust interpretation.

Pivot:
Reshape data from wide to long format (e.g., converting columns like "Jan Sales," "Feb Sales" into "Month"
and "Sales" rows).

Select columns > Right-click > Pivot. Rename pivoted fields as needed.
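Tableau's Pivot (wide to long) has a direct pandas analogue in melt, which may help relate this step to Experiment 2's data frames; the column and product names below are invented.

```python
import pandas as pd

# Wide format: one column per month, as exported from a spreadsheet
wide = pd.DataFrame({
    "Product":   ["A", "B"],
    "Jan Sales": [100, 200],
    "Feb Sales": [150, 250],
})

# Pivot (wide -> long): months become rows under a single "Month" column
long = wide.melt(id_vars="Product", var_name="Month", value_name="Sales")
print(long)
```

The result has one row per (Product, Month) pair with a single Sales measure, mirroring what Tableau's Pivot produces.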

Experiment 7
Joining multiple datasets
 Union / Join
 Cross database joins
 Data Blending – integrating different data source

Objective: To understand and apply different methods of combining multiple datasets in Tableau using
Union, Join, Cross-database Join, and Data Blending.

Procedure:

1. Union vs. Join

Union

Purpose: Combine datasets vertically (stack rows) when they have the same structure (e.g., monthly sales
data in separate sheets).

Use Case: Appending data from similar sources (e.g., Sales_Jan, Sales_Feb).

Steps in Tableau:

Drag the first table to the canvas.

Click the New Union button (double-table icon).

Add additional tables/sheets to the union.


Note: Columns must align by name/position. Use wildcard unions to automate merging similar files (e.g.,
all .csv files in a folder).

Join

Purpose: Combine datasets horizontally (merge columns) using a common key (e.g., Order_ID).

Types:

Inner Join: Returns matching rows from both tables.

Left/Right Join: Returns all rows from one table and matches from the other.

Full Outer Join: Returns all rows from both tables.

Steps in Tableau:

Drag the primary table to the canvas.

Drag a secondary table and drop it on the primary table.

Define the join clause (e.g., Orders.Order_ID = Returns.Order_ID).

Best Practice: Use joins for structured relational data (e.g., SQL tables).
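Union and Join have direct pandas analogues, which can make the distinction concrete: concat stacks rows of identically structured tables, while merge combines columns on a shared key. All table contents below are invented.

```python
import pandas as pd

# Two identically structured monthly tables
sales_jan = pd.DataFrame({"Order_ID": [1, 2], "Amount": [100, 200]})
sales_feb = pd.DataFrame({"Order_ID": [3, 4], "Amount": [150, 250]})

# Union: vertical stacking
union = pd.concat([sales_jan, sales_feb], ignore_index=True)

# Inner join: keep only rows whose Order_ID appears in both tables
returns = pd.DataFrame({"Order_ID": [2, 3], "Returned": [True, True]})
joined = union.merge(returns, on="Order_ID", how="inner")

print(len(union))                   # 4
print(joined["Order_ID"].tolist())  # [2, 3]
```

Changing how="inner" to "left", "right", or "outer" reproduces the other join types described above.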

2. Cross-Database Joins

Purpose: Join tables from different databases (e.g., Excel + SQL Server).

Use Case: Combining CRM data (Salesforce) with transactional data (MySQL).

Steps:

Connect to the first data source (e.g., Excel).

Add a second connection (e.g., MySQL) via the New Data Source button.

Drag both tables to the canvas and define the join relationship.

Limitations:

Requires compatible data types.

Performance may degrade with large datasets (use Extracts to optimize).

3. Data Blending

Purpose: Integrate data from different sources (e.g., Excel + Google Sheets) without physically joining
them.
How It Works:

Blending aggregates data at the visualization level using a common dimension (e.g., Region).

Left Data Source: Primary data (defines the view).

Secondary Data Source: Linked via a shared field (displayed as an orange link icon).

Steps:

Connect to the primary data source (e.g., Sales data in Excel).

Connect to the secondary source (e.g., Marketing budget in Google Sheets).

Drag a field from the secondary source to the view (Tableau will auto-blend using common dimensions).

Use Case: Combining sales data (SQL) with budget data (Excel) by Region.

Limitations:

Blending works at the aggregate level (not row-level joins).

Secondary data sources are filtered based on the primary source.
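The aggregate-level behaviour of blending can be sketched in pandas: each source is summarized to the common dimension before being combined. Region names and numbers below are invented.

```python
import pandas as pd

# Primary source (e.g., sales) and secondary source (e.g., budget)
sales = pd.DataFrame({"Region": ["East", "East", "West"], "Sales": [100, 150, 300]})
budget = pd.DataFrame({"Region": ["East", "West"], "Budget": [200, 250]})

# Aggregate the primary source to the linking dimension, then combine
sales_by_region = sales.groupby("Region", as_index=False)["Sales"].sum()
blended = sales_by_region.merge(budget, on="Region", how="left")

print(blended["Sales"].tolist())   # [250, 300]
print(blended["Budget"].tolist())  # [200, 250]
```

Note the East rows are summed before the budget is attached; there is no row-level match between the two sources, which is exactly the limitation of blending described above.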

4. Key Differences

Method              | Use Case                                 | Data Structure               | Performance
Union               | Append rows with identical columns       | Vertical stacking            | Fast
Join                | Merge columns via a shared key           | Horizontal merge             | Depends on data size
Cross-Database Join | Combine data from different DBs/sources  | Horizontal merge             | Slower (optimize with extracts)
Data Blending       | Aggregate data from unrelated sources    | Linked by a common dimension | Slower for large datasets
Experiment 8
Basic functionalities
 Filtering
 Sorting
 Grouping
 Hierarchies
 Creating sets
 Types of dates – Continuous vs. Discrete
 Pivot tables

Objective: To explore and apply basic functionalities in Tableau such as filtering, sorting, grouping,
hierarchies, creating sets, working with date types, and pivot tables for effective data analysis.

Procedure:

i) Handling Missing Data in Tableau

Tableau does not have built-in imputation tools like Python, but you can handle missing values using the
following methods:

1. Filter Out Missing Values:

 Steps:
1. Drag the field with missing values (e.g., Age) to the Filters shelf.
2. In the filter dialog, uncheck Null or Null (missing).

3. Click OK to exclude rows with missing values.

2. Replace Nulls with a Default Value:

 Create a Calculated Field:


IF ISNULL([Age]) THEN {FIXED : AVG([Age])} ELSE [Age] END

o Replace AVG([Age]) with MEDIAN([Age]) or a fixed value (e.g., 0). The {FIXED : ...} level-of-detail wrapper is needed because Tableau cannot mix an aggregate such as AVG with row-level values in a single IF.

3. Use Tableau Prep (Advanced):

 Use Tableau Prep Builder to clean data before importing it into Tableau Desktop:
1. Add a Clean Step to replace nulls in numerical columns with Mean or Median.

2. For categorical columns, replace nulls with a placeholder (e.g., Unknown).

ii) Min-Max Normalization in Tableau

Tableau does not have a built-in normalization function, but you can create normalized values using
calculations:

1. Manual Calculation:

 Formula:
([Value] - {FIXED : MIN([Value])})
/
({FIXED : MAX([Value])} - {FIXED : MIN([Value])})
Example (for Income):
([Income] - {FIXED : MIN([Income])})
/
({FIXED : MAX([Income])} - {FIXED : MIN([Income])})

 This scales values between [0, 1].

2. Steps:

1. Create a Calculated Field named Income (Normalized) using the formula above.
2. Drag the normalized field to your visualization (e.g., axes, tooltips).

3. Alternative:

 Use Tableau Prep to normalize data before analysis:


1. Add a Clean Step.

2. Use the formula above or a Range scaling option in Tableau Prep.

Example Output in Tableau

Before Preprocessing:

Customer | Age | Income | Gender
A        | 25  | 50,000 | Male
B        | —   | 60,000 | Female
C        | 30  | —      | —
After Handling Missing Data:

Customer | Age (Imputed) | Income (Imputed) | Gender (Imputed)
A        | 25            | 50,000           | Male
B        | 27.5 (Mean)   | 60,000           | Female
C        | 30            | 55,000 (Mean)    | Unknown

After Min-Max Normalization:

Customer | Age (Normalized) | Income (Normalized)
A        | 0.0              | 0.0
B        | 0.5              | 1.0
C        | 1.0              | 0.5

(Applying the {FIXED : MIN/MAX} formula to the imputed values, Age [25, 27.5, 30] scales to [0.0, 0.5, 1.0] and Income [50,000, 60,000, 55,000] scales to [0.0, 1.0, 0.5].)
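The arithmetic behind the Tableau calculated fields above can be re-derived with pandas on the same three-customer example:

```python
import numpy as np
import pandas as pd

# Three-customer example from the tables above
df = pd.DataFrame({
    "Customer": ["A", "B", "C"],
    "Age":      [25.0, np.nan, 30.0],
    "Income":   [50000.0, 60000.0, np.nan],
})

# Mean imputation (mirrors IF ISNULL(...) THEN {FIXED : AVG(...)} ELSE ... END)
df["Age"] = df["Age"].fillna(df["Age"].mean())           # mean = 27.5
df["Income"] = df["Income"].fillna(df["Income"].mean())  # mean = 55000

# Min-max normalization (mirrors the {FIXED : MIN/MAX} formula)
for col in ["Age", "Income"]:
    df[col] = (df[col] - df[col].min()) / (df[col].max() - df[col].min())

print(df["Age"].tolist())     # [0.0, 0.5, 1.0]
print(df["Income"].tolist())  # [0.0, 1.0, 0.5]
```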

Experiment 9
Dashboards and stories

 Building dashboards

 Dashboard objects

 Dashboard formatting

 Dashboard extensions; Story points

Objective: To build interactive dashboards and stories using Tableau, incorporating visual elements,
dashboard objects, formatting, extensions, and story points for better data storytelling.

Procedure:
1. Building Dashboards

A dashboard is a collection of worksheets, filters, legends, and other objects arranged in a single view to
provide a holistic data perspective.

Steps to Build a Dashboard:

Create Worksheets: Build individual charts/sheets (e.g., bar charts, maps, tables) for your analysis.

Open the Dashboard Tab: Click the New Dashboard icon at the bottom of the workbook.

Add Sheets:

Drag and drop worksheets from the Dashboard Pane onto the canvas.

Arrange them using layout containers (horizontal/vertical) for alignment.

Add Objects:

Use text boxes, images, web pages, or blank spaces for annotations or branding.

Add Filters/Parameters:

Drag filters from the worksheet or create parameters for interactivity.

2. Dashboard Objects

Objects are building blocks of a dashboard. Key objects include:

Worksheets: Charts or tables from your analysis.

Filters: Interactive controls to slice data (e.g., date ranges, categories).

Parameters: Dynamic inputs for calculations (e.g., switching metrics).

Text Boxes: Add titles, descriptions, or annotations.

Images/Logos: Insert company logos or visual aids.

Web Pages: Embed live web content (e.g., a live Twitter feed).

Extensions: Add third-party tools (e.g., statistical models, maps).

3. Dashboard Formatting
Formatting ensures your dashboard is visually appealing and user-friendly.

Key Formatting Tools:

Layout Pane: Adjust padding, borders, and spacing between objects.

Themes: Apply predefined color/font themes (Format > Workbook Theme).

Fonts/Colors: Customize text, headers, and background colors.

Tooltip Formatting: Edit tooltips to show/hide details on hover.

Sheet Order: Control which sheets appear on top (Dashboard > Arrange).

4. Dashboard Extensions

Extensions enhance dashboards with external tools or custom code.


Examples:

Analytics Pane: Add trend lines, forecasts, or clustering.

Third-Party Extensions:

Maps+: Enhanced mapping tools.

Stats Tools: Statistical analysis (e.g., regression).

Google Sheets Sync: Live data integration.

5. Story Points

A Story is a sequence of dashboards or sheets that guide users through a data-driven narrative.

Steps to Build a Story:

Click the New Story tab at the bottom.

Add Sheets/Dashboards: Drag a worksheet or dashboard to the story pane.

Add Captions: Use text boxes to explain each story point (e.g., insights, context).

Format the Story:

Adjust layout and fonts to match dashboards.

Add navigation buttons (Back/Next) for guided flow.

Publish: Share stories via Tableau Server/Public or export as PDF.


Example Output: Sales Performance Story

Story Point              Dashboard/Sheet     Caption

1. Overview              Sales by Region     "Q3 Sales grew 15% YoY, led by the West Region."

2. Drill-Down Analysis   Product Category    "Electronics drove 60% of revenue, but apparel sales declined."

3. Forecast              Trend Projection    "Holiday sales are projected to exceed targets by 20%."

Experiment 10
Calculations

 Syntax

 Table calculations

 LOD expressions

 Aggregate Date, Logic, String, Number, Type calculations

Objective: To perform various types of calculations in Tableau including basic syntax, table
calculations, Level of Detail (LOD) expressions, and aggregate operations for different data types.

Procedure:

1. Calculation Syntax
Tableau uses a formula language similar to Excel or SQL. Key components include:

Basic Syntax Rules:

Fields: Reference field names in square brackets: [Sales], [Region].

Strings: Enclose string literals in single or double quotes: "High".

Comments: Begin single-line comments with //.

// Calculate profit margin

[Profit] / [Sales]

// Conditional logic for performance categories

IF [Sales] > 10000 THEN "High" ELSE "Low" END

// String concatenation

[Region] + " - " + STR([Sales])

2. Table Calculations

Table calculations are performed on the aggregated results in your view (e.g., running totals, percent of
total).

Common Table Calculation Functions:

Rank: RANK()

Running Total: RUNNING_SUM()

Window Average: WINDOW_AVG()

Percent of Total: SUM([Sales]) / TOTAL(SUM([Sales])) (also available as a quick table calculation)

Difference from Previous: ZN(SUM([Sales])) - LOOKUP(ZN(SUM([Sales])), -1)

// Running total of sales across months

RUNNING_SUM(SUM([Sales]))
// Percent of total sales per region

SUM([Sales]) / TOTAL(SUM([Sales]))

Example Output:

Month Sales Running Total

Jan 1000 1000

Feb 1500 2500

Mar 2000 4500
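Under the hood, these table calculations are simple operations over the aggregated rows in the view. A minimal Python sketch, using hypothetical data matching the table above, shows the same arithmetic:

```python
from itertools import accumulate

# Hypothetical monthly sales matching the example table above
months = ["Jan", "Feb", "Mar"]
sales = [1000, 1500, 2000]

# RUNNING_SUM(SUM([Sales])): cumulative total down the table
running_total = list(accumulate(sales))

# SUM([Sales]) / TOTAL(SUM([Sales])): each month's share of the grand total
total = sum(sales)
percent_of_total = [s / total for s in sales]
```

Note that Tableau applies these to the already-aggregated marks in the view; the list here stands in for those marks.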

3. LOD (Level of Detail) Expressions

LOD calculations let you define the granularity (detail level) of aggregations independently of the
visualization.

Types of LODs:

FIXED:

Compute at a specified dimension level, ignoring the view’s granularity.

{ FIXED [Region] : SUM([Sales]) } // Total sales per region

INCLUDE:

Compute at a finer granularity than the view.

{ INCLUDE [Customer] : AVG([Sales]) } // Avg sales per customer

EXCLUDE:

Compute at a coarser granularity than the view.

{ EXCLUDE [Product] : SUM([Sales]) } // Total sales ignoring product
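A FIXED LOD is conceptually a group-level aggregate joined back to every row. The following Python sketch, with hypothetical order rows, mirrors { FIXED [Region] : SUM([Sales]) }:

```python
from collections import defaultdict

# Hypothetical order-level rows; a FIXED LOD attaches a group-level
# aggregate back to every row, regardless of the view's granularity
orders = [
    {"Region": "West", "Sales": 100},
    {"Region": "West", "Sales": 200},
    {"Region": "East", "Sales": 50},
]

# { FIXED [Region] : SUM([Sales]) }
region_total = defaultdict(int)
for row in orders:
    region_total[row["Region"]] += row["Sales"]

# Each row now carries its region's total, like a FIXED LOD field
for row in orders:
    row["RegionSales"] = region_total[row["Region"]]
```

INCLUDE and EXCLUDE work the same way, but the grouping key is the view's dimensions plus or minus the listed fields.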

4. Date Calculations

Perform operations on date fields, such as extracting parts of dates, aggregating, or calculating intervals.

Common Functions:
DATEPART(): Extract parts of a date (e.g., year, month, day).

DATEPART('month', [Order Date]) // Returns 1 for January, 2 for February, etc.

DATEDIFF(): Calculate the difference between two dates.

DATEDIFF('day', [Order Date], [Ship Date]) // Days between order and shipment.

DATEADD(): Add/subtract time intervals.

DATEADD('month', 3, [Order Date]) // Adds 3 months to the order date.

DATETRUNC(): Truncate dates to a specific granularity.

DATETRUNC('quarter', [Order Date]) // Rounds dates to the start of the quarter.

Aggregation Example:

// Total sales by month:

DATETRUNC('month', [Order Date]) // Group by month

SUM([Sales])
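The date functions above map closely onto Python's standard datetime module. A sketch with hypothetical dates (the add_months helper is a simplified stand-in for DATEADD, since the stdlib has no month arithmetic):

```python
from datetime import date

order_date = date(2023, 1, 15)
ship_date = date(2023, 1, 20)

# DATEPART('month', [Order Date])
month = order_date.month

# DATEDIFF('day', [Order Date], [Ship Date])
days_to_ship = (ship_date - order_date).days

# DATEADD('month', 3, [Order Date]) -- the stdlib has no month arithmetic,
# so this simplified version ignores day-overflow (e.g., Jan 31 + 1 month)
def add_months(d, n):
    m = d.month - 1 + n
    return d.replace(year=d.year + m // 12, month=m % 12 + 1)

# DATETRUNC('quarter', [Order Date]): round down to the quarter start
def trunc_quarter(d):
    return d.replace(month=3 * ((d.month - 1) // 3) + 1, day=1)
```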

5. Logic (Conditional) Calculations

Use IF, CASE, or Boolean logic to create conditional outputs.

Examples:

Simple IF-THEN-ELSE:

IF [Sales] > 10000 THEN "High" ELSE "Low" END

CASE Statement (for multiple conditions):

CASE [Region]

WHEN "West" THEN 1

WHEN "East" THEN 2

ELSE 3

END
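The IF and CASE expressions above translate directly into conditional functions; a small Python sketch of the same logic:

```python
# IF [Sales] > 10000 THEN "High" ELSE "Low" END
def sales_band(sales):
    return "High" if sales > 10000 else "Low"

# CASE [Region] WHEN "West" THEN 1 WHEN "East" THEN 2 ELSE 3 END
REGION_CODE = {"West": 1, "East": 2}

def region_code(region):
    return REGION_CODE.get(region, 3)  # .get's default plays the ELSE role
```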

6. String Calculations

Manipulate text fields using string functions.

Common Functions:

Concatenation:
[First Name] + " " + [Last Name] // Combines names into "John Doe"

Substrings:

LEFT([Product Code], 3) // First 3 characters of the code.

MID([Address], 5, 10) // Extract characters starting at position 5.
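The same string manipulations, sketched in Python with hypothetical values (note that Tableau's string positions are 1-based while Python slices are 0-based):

```python
first_name, last_name = "John", "Doe"

# [First Name] + " " + [Last Name]
full_name = first_name + " " + last_name

# LEFT([Product Code], 3)
product_code = "ABC-1234"
prefix = product_code[:3]

# MID([Address], 6, 5) -- start position 6, length 5; Python slice is 0-based
address = "221B Baker Street"
street = address[5:10]
```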

7. Number Calculations

Perform arithmetic operations or aggregations on numeric fields.

Examples:

Basic Arithmetic:

[Profit] / [Sales] // Profit margin.

Aggregation:

SUM([Sales]) - SUM([Cost]) // Total profit.

Rounding:

ROUND([Profit], 2) // Rounds to 2 decimal places.

Percentage of Total:

SUM([Sales]) / TOTAL(SUM([Sales])) // Percent contribution to total sales.

8. Type Conversion Calculations

Convert data types (e.g., string to number, date to string).

Common Functions:

Convert to String:

STR([Sales]) // Converts 1000 to "1000".

Convert to Number:

FLOAT([Price String]) // Converts "$10.5" to 10.5 (requires cleanup first).

INT([Quantity]) // Converts "5" to 5.

Convert to Date:

DATE([Order Date String]) // Converts "2023-01-01" to a date type.
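The conversions above have direct Python equivalents; a sketch with hypothetical values (as in Tableau, the currency string needs cleanup before conversion):

```python
from datetime import date, datetime

# STR([Sales])
sales = 1000
as_text = str(sales)

# FLOAT([Price String]) -- strip the currency symbol first, as noted above
price_string = "$10.5"
price = float(price_string.lstrip("$"))

# INT([Quantity])
quantity = int("5")

# DATE([Order Date String])
order_date = datetime.strptime("2023-01-01", "%Y-%m-%d").date()
```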

Example Workflow
Goal: Calculate the average sales per customer, excluding canceled orders.

// Step 1: Filter non-canceled orders

IF [Order Status] = "Canceled" THEN 0 ELSE [Sales] END

// Step 2: Aggregate sales by customer

{ FIXED [Customer ID] : SUM([Sales]) }

// Step 3: Convert to string for labeling

"Customer: " + [Customer ID] + " | Avg Sales: " + STR(AVG([Sales]))
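The three workflow steps can be traced end to end in Python over hypothetical order rows; this is an illustration of the logic, not Tableau code:

```python
from collections import defaultdict

# Hypothetical order rows
orders = [
    {"Customer ID": "C1", "Order Status": "Shipped", "Sales": 100},
    {"Customer ID": "C1", "Order Status": "Canceled", "Sales": 400},
    {"Customer ID": "C2", "Order Status": "Shipped", "Sales": 250},
]

# Step 1: zero out canceled orders (IF ... THEN 0 ELSE [Sales] END)
for row in orders:
    row["NetSales"] = 0 if row["Order Status"] == "Canceled" else row["Sales"]

# Step 2: { FIXED [Customer ID] : SUM([Sales]) } over the cleaned measure
per_customer = defaultdict(int)
for row in orders:
    per_customer[row["Customer ID"]] += row["NetSales"]

# Step 3: build the label string
labels = [f"Customer: {c} | Sales: {s}" for c, s in sorted(per_customer.items())]
```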

Experiment 11

Built-in chart types/visualisations:


 Line chart
 Dot chart
 Bar chart
 Other types of visualisation (bullet graph, Heat map, Tree map, etc.).
 Combo charts – dual axis

Objective: To explore various built-in chart types and visualizations in Tableau such as line, dot, bar,
bullet graphs, heat maps, tree maps, and dual-axis combo charts.

Procedure:

1. Line Chart
Purpose: Show trends over time or continuous categories.
How to Build:

Drag a date or continuous dimension to Columns.

Drag a measure (e.g., Sales) to Rows.

Change the mark type to Line.

2. Dot Chart (Scatter Plot)

Purpose: Compare relationships between two measures.


How to Build:

Drag a measure to Columns (e.g., Sales).

Drag another measure to Rows (e.g., Profit).

Add a dimension to the Detail mark for color/size encoding.

Change the mark type to Circle.


3. Bar Chart

Purpose: Compare values across discrete categories.


Types:

Horizontal: Drag dimension to Rows, measure to Columns.

Stacked: Add a secondary dimension to the Color mark.


How to Build:

Drag a dimension (e.g., Category) to Columns.

Drag a measure (e.g., Sales) to Rows.

Change the mark type to Bar.

4. Bullet Graph

Purpose: Compare a measure to a target or benchmark.


How to Build:

Create a bar chart for the primary measure (e.g., Sales).

Add a reference line (e.g., Target Sales) via Analytics Pane.

Add color bands (e.g., Poor/Good/Great) using reference distributions.

5. Heat Map

Purpose: Visualize density or relationships between two dimensions using color.


How to Build:
Drag two dimensions to Rows and Columns (e.g., Region, Category).

Drag a measure to the Color mark (e.g., SUM(Sales)).

Change the mark type to Square.

6. Tree Map

Purpose: Show hierarchical data as nested rectangles (size/color-encoded).


How to Build:

Drag a hierarchy (e.g., Category > Sub-Category) to Label.

Drag a measure to Size (e.g., Sales).

Drag another measure to Color (e.g., Profit).

Select Treemap from the Show Me panel (Tableau renders it with Square marks; there is no dedicated Tree Map mark type).


7. Combo Chart (Dual Axis)

Purpose: Layer two measures with different scales (e.g., Sales vs. Profit Margin).
How to Build:

Create a bar chart (e.g., SUM(Sales) by Month).

Drag a second measure (e.g., AVG(Profit Margin)) to the opposite axis (right-click > Dual Axis).

Synchronize axes if needed (right-click axis > Synchronize Axis).

8. Other Key Visualizations


Chart Type       Purpose                              Steps

Area Chart       Show cumulative trends over time.    Use a line chart > Change mark type to Area.

Box Plot         Display data distribution.           Drag a measure to Rows > Analytics Pane > Add Box Plot.

Gantt Chart      Track timelines/project schedules.   Use a date field and duration > Change mark type to Gantt.

Pareto Chart     Highlight the 80/20 rule.            Combine a bar chart (descending order) + cumulative percentage line.

Waterfall Chart  Show incremental changes.            Use a bar chart with Gantt bars and running total calculations.
Experiment 12

Custom chart types:


 KPI matrix
 Waterfall
 Gantt
 Dot plot
 Pareto
 Analytics’ options: trend lines, forecasting, clustering

Objective: To create custom chart types in Tableau such as KPI Matrix, Waterfall, Gantt, Dot Plot,
Pareto chart, and apply advanced analytics like trend lines, forecasting, and clustering.

1. KPI Matrix

Purpose: Display key performance indicators (KPIs) in a grid format.


Example: Sales vs. Target by Region.

Steps:

Drag a dimension (e.g., Region) to Rows.

Drag measures (e.g., Sales, Profit, Target) to Columns.

Use Text Marks:

Drag measures to the Text shelf.

Format numbers (e.g., currency, percentages).

Add Conditional Formatting:

Right-click a measure > Apply Color Gradient (e.g., red for missed targets, green for achieved).

Tip: Use Worksheet Titles to label KPIs dynamically (e.g., SUM(Sales) vs. AVG(Target)).
2. Waterfall Chart

Purpose: Visualize cumulative changes (e.g., profit over time).

Steps:

Create a Bar Chart with a dimension (e.g., Month) and measure (e.g., Profit).

Add a Running Total Calculation:

Right-click the measure > Quick Table Calculation > Running Total.

Change the Mark Type to Gantt Bar:

Create a calculated field -[Profit] and place it on Size, so each bar spans its month's change from the running total.

Adjust bar colors (positive = green, negative = red).
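The bar arithmetic behind a waterfall chart can be sketched in Python with hypothetical monthly profit changes: each bar starts at the prior running total, with negative values drawn downward.

```python
# Hypothetical monthly profit changes
profits = [("Jan", 40), ("Feb", -15), ("Mar", 25)]

running = 0
bars = []  # (month, bar start, bar length)
for month, p in profits:
    # a positive bar rises from the prior running total;
    # a negative bar drops below it
    start = running if p >= 0 else running + p
    bars.append((month, start, abs(p)))
    running += p
```

In Tableau, the running total and bar sizing are produced by the quick table calculation and the Size shelf rather than by explicit code.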



3. Gantt Chart

Purpose: Track project timelines or durations.

Steps:

Drag Start Date to Columns.

Drag Duration (End Date - Start Date) to Rows.

Change Mark Type to Gantt.

Customize colors (e.g., by Project Phase).

Tip: Use Tooltips to show details like % Complete or Owner.


4. Dot Plot

Purpose: Compare categories using dots on an axis.

Steps:

Drag a dimension (e.g., Product) to Rows.

Drag a measure (e.g., Sales) to Columns.

Change Mark Type to Circle.

Add a Reference Line (e.g., Average Sales) for comparison.

Advanced: Use Jittering to avoid overlapping dots:

Create a calculated field: INDEX() % 10 and add it to Columns.

5. Pareto Chart

Purpose: Highlight the "80/20 rule" (e.g., top 20% products driving 80% of sales).

Steps:

Sort data descending by a measure (e.g., Sales).

Create a Bar Chart for the measure.

Add a Cumulative Percentage Line:

Drag the measure to Rows a second time; apply a Running Total quick table calculation, then add Percent of Total as a secondary calculation.

Dual-Axis:

Right-click the second measure > Dual Axis, and change its mark type to Line.
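The cumulative-percentage logic behind a Pareto chart is easy to verify by hand; a Python sketch with hypothetical per-product sales:

```python
# Hypothetical sales per product
sales_by_product = {"A": 500, "B": 300, "C": 120, "D": 50, "E": 30}

# Sort descending, then accumulate each product's share of the grand total
ordered = sorted(sales_by_product.values(), reverse=True)
total = sum(ordered)
cum_pct, cum = [], 0
for s in ordered:
    cum += s
    cum_pct.append(cum / total)

# How many top products cover 80% of sales?
n_to_80 = next(i + 1 for i, p in enumerate(cum_pct) if p >= 0.8)
```

Here the top 2 of 5 products account for 80% of sales, the pattern the chart is designed to expose.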



6. Analytics Options

Trend Lines

How to Add:

Drag a trend line from the Analytics pane to the chart.

Choose a model (Linear, Logarithmic, Exponential).

Use Case: Identify sales trends over time.
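Tableau's Linear trend line fits an ordinary least-squares model. The underlying fit can be sketched in plain Python over hypothetical (period, sales) points:

```python
# Hypothetical (period, sales) points
xs = [1, 2, 3, 4, 5]
ys = [2.1, 4.2, 5.9, 8.1, 9.8]

n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n

# slope = covariance(x, y) / variance(x); intercept from the means
slope = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys)) \
      / sum((x - mean_x) ** 2 for x in xs)
intercept = mean_y - slope * mean_x
```

Tableau reports the same coefficients (plus R-squared and p-value) in the trend line's Describe dialog.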

Forecasting

How to Add:

Right-click a time-based chart > Forecast > Show Forecast.

Adjust settings (e.g., forecast length, seasonality).

Use Case: Predict future sales.

Clustering

How to Add:

Drag Clusters from the Analytics pane to the view.

Select variables (e.g., Sales, Profit) and number of clusters.

Use Case: Segment customers into groups based on behavior.
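Tableau's clustering is based on k-means. A minimal Lloyd-iteration sketch on toy (Sales, Profit) points shows the idea; the seed centroids are chosen for illustration so no cluster ever empties:

```python
# Toy (Sales, Profit) points: two obvious groups
points = [(1, 1), (1.5, 2), (8, 8), (9, 9)]
centroids = [(1, 1), (9, 9)]  # simple seed; avoids empty clusters here

def nearest(p, cs):
    # index of the closest centroid by squared Euclidean distance
    return min(range(len(cs)),
               key=lambda i: (p[0] - cs[i][0]) ** 2 + (p[1] - cs[i][1]) ** 2)

for _ in range(5):  # a few Lloyd iterations are enough for this toy data
    groups = [[] for _ in centroids]
    for p in points:
        groups[nearest(p, centroids)].append(p)
    # move each centroid to the mean of its assigned points
    centroids = [tuple(sum(c) / len(g) for c in zip(*g)) for g in groups]

labels = [nearest(p, centroids) for p in points]
```

Tableau additionally standardizes the variables and picks the number of clusters automatically unless you override it.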

Summary
KPI Matrix: Use text tables with conditional formatting.

Waterfall: Running total + Gantt bars.

Gantt: Date fields + duration calculation.

Dot Plot: Circles + jittering.

Pareto: Sorted bars + cumulative line.

Analytics: Trend lines, forecasting, and clustering are built-in tools under the Analytics pane.
Experiment 13

CREATE AND FORMAT REPORTS USING TABLEAU DESKTOP


 Describe the use of Page Backgrounds and Templates
 Create visualizations to display the data
 Apply drill through and drill down
 Create and manage slicers with the use of filters.
 Explore visual interactions
 Review Bookmarks
 Publish the report to the Tableau online

Objective: To create and format reports using Tableau Desktop by exploring templates, applying drill-
through and drill-down features, using slicers and filters, managing interactions, bookmarks, and
publishing to Tableau Online.

Step 1: Connect to Data Source


Open Tableau Desktop and connect to your dataset (Excel, CSV, SQL, etc.).
Clean and prepare the data (e.g., rename fields, set data types).

Step 2: Create Visualizations


Example Visualizations (adjust based on your data):
Bar Chart: Sales by Region.
Line Chart: Monthly Sales Trends.
Map: Geographic distribution of customers.
Pie Chart: Product Category Distribution.
Scatter Plot: Profit vs. Quantity.
How to Create:
Drag dimensions (e.g., Region, Month) to Columns/Rows.
Drag measures (e.g., Sales, Profit) to Rows/Columns or Marks.
Use the Show Me panel to switch chart types.

Step 3: Apply Drill Down/Drill Through


Drill Down (Hierarchy):
Create a hierarchy (e.g., Country > State > City).
Right-click a data point (e.g., Country bar) and select Drill Down to State/City.
Drill Through (via Actions):
Create a new sheet with detailed data.
Go to Worksheet > Actions > Add Action > Go to Sheet, so that selecting a data point in the main visualization navigates to the detail sheet.
Alternatively, use a Filter Action to pass the selected value into the detail sheet.

Step 4: Add Filters and Slicers


Filters:
Drag a field (e.g., Region, Product) to the Filters shelf.
Right-click the filter > Show Filter to display it on the dashboard.
Slicers (shown filters act as Tableau's slicers):
Use a discrete field (e.g., Year) as a filter.
Format the slicer (e.g., dropdown, single-value list, or slider).

Step 5: Design Page Backgrounds


Background and Theme:
Use Format > Workbook Theme to apply workbook-wide colors and fonts.
Use Dashboard > Format (Shading) to adjust a dashboard's background color, or add an Image object as a backdrop.
Templates:
Tableau Desktop has no dedicated template command; save a formatted workbook with File > Save As and reuse it as a starting point for future reports with similar structures.

Step 6: Set Up Visual Interactions


Highlight Actions:
Go to Dashboard > Actions.
Add a Highlight Action to link visuals (e.g., selecting a region highlights sales in other charts).
Filter Actions:
Add a Filter Action to sync filters across multiple sheets (e.g., clicking a bar updates all visuals).

Step 7: Use Bookmarks


Create Bookmarks:
Adjust filters/views to a specific state (e.g., 2023 Sales in the Northeast).
Go to Window > Bookmark > Create Bookmark to save the active worksheet as a reusable .tbm file.
Manage Bookmarks:
Import saved bookmarks into other workbooks to reuse pre-built views during presentations.

Step 8: Build the Dashboard


Create a New Dashboard:
Drag sheets from the Sheets pane onto the dashboard.
Arrange visuals using containers (horizontal/vertical) for alignment.
Add Filters/Slicers:
Drag filters from the right pane to the dashboard.
Group them in a floating container for a clean look.
Add Titles/Text:
Use Text objects to label sections or add descriptions.

Step 9: Publish to Tableau Online


Sign In:
Go to Server > Sign In and log in to your Tableau Online account.
Publish:
Click Server > Publish Workbook.
Choose a project/folder and set permissions (e.g., viewer/editor access).
Share:
Copy the link or embed the dashboard in a website.
