DATA SCIENCE AND VISUALIZATION 12202080501060
202046707
Practical 2
To study and implement different types of graphs using popular data visualization
libraries
Introduction:
Data visualization is an essential aspect of data analysis that allows users to better
understand patterns, trends, and outliers in a dataset.
By transforming raw data into graphical representations, it becomes easier to
communicate insights effectively. This practical focuses on
implementing various types of visualizations using Python libraries such as
Matplotlib and Seaborn.
Theory:
Popular graphs used in data science include:
- Bar Plot: Used to compare categorical data.
- Histogram: Represents the distribution of numerical data.
- Scatter Plot: Shows the relationship between two continuous variables.
- Area Plot: Shows how a quantity changes along an ordered axis such as time; similar to a line plot, with the area under the line filled.
- Pie Chart: Shows each category's share of the whole as a proportion or percentage.
Libraries used:
- Matplotlib: The foundational Python plotting library.
- Seaborn: A statistical plotting library built on top of Matplotlib that provides a
high-level interface for drawing attractive and informative graphics.
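To illustrate what "high-level interface" means here, the following is a minimal sketch of a Seaborn bar chart of category counts. The sample rows and the helper name plot_payment_counts are assumptions made for illustration; only the column name payment_method is taken from the dataset used in this practical.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook to see the figure inline
import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

# Hypothetical sample data; only the column name mirrors the practical's dataset
sample = pd.DataFrame({
    "payment_method": ["credit", "paypal", "credit", "credit", "paypal"],
})

def plot_payment_counts(df):
    """Draw a count-per-category bar chart with Seaborn and return the Axes."""
    fig, ax = plt.subplots(figsize=(6, 4))
    # countplot tallies each category itself, so no value_counts() step is needed
    sns.countplot(data=df, x="payment_method", color="skyblue", ax=ax)
    ax.set_title("Distribution of Payment Methods")
    ax.set_xlabel("Payment Method")
    ax.set_ylabel("Count")
    return ax

ax = plot_payment_counts(sample)
```

With plain Matplotlib the counts would usually be computed first (for example with value_counts()) and then passed to a bar-drawing call; Seaborn folds that aggregation step into the plotting function.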
Python Code:
import matplotlib.pyplot as plt
import pandas as pd
df = pd.read_csv('/content/drive/MyDrive/DSV/Dataset_(12202080501060)/archive (3).zip')
df.info()
# 1. Bar Plot – Payment Method Counts
payment_counts = df["payment_method"].value_counts()
plt.figure(figsize=(6, 4))
payment_counts.plot(kind="bar", color="skyblue")
plt.legend()
plt.title("Distribution of Payment Methods")
plt.xlabel("Payment Method")
plt.ylabel("Count")
plt.show()
# 2. Histogram – Purchase Values
plt.figure(figsize=(6, 4))
plt.hist(df["value [USD]"], bins=30, color="lightgreen", edgecolor="black")
plt.title("Histogram of Purchase Values")
plt.xlabel("Value (USD)")
plt.ylabel("Frequency")
plt.show()
# 3. Scatter Plot – Clicks vs Time on Site
plt.figure(figsize=(6, 4))
plt.scatter(df["clicks_in_site"], df["time_on_site [Minutes]"], color="tomato",
            alpha=0.6)
plt.title("Scatter Plot: Clicks vs Time on Site")
plt.xlabel("Clicks in Site")
plt.ylabel("Time on Site (Minutes)")
plt.show()
# 4. Area Plot – Total Purchase Value over Time
plt.figure(figsize=(10, 4))
daily_value = df.groupby('date')['value [USD]'].sum()
daily_value.index = pd.to_datetime(daily_value.index, format='%d/%m/%Y')
daily_value = daily_value.sort_index()
plt.fill_between(daily_value.index, daily_value.values, color="skyblue", alpha=0.6)
plt.title("Area Plot: Total Purchase Value Over Time")
plt.xlabel("Date")
plt.ylabel("Total Value (USD)")
plt.show()
# 5. Pie Chart – Distribution of Payment Methods
payment_counts = df["payment_method"].value_counts()
plt.figure(figsize=(6, 6))
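# Offset only the second slice; explode must have one entry per category
explode = [0.1 if i == 1 else 0 for i in range(len(payment_counts))]
plt.pie(payment_counts, labels=payment_counts.index, autopct="%1.1f%%",
        explode=explode, startangle=140, colors=["gold", "lightcoral", "skyblue"])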
plt.title("Pie Chart: Payment Method Distribution")
plt.axis("equal") # Equal aspect ratio ensures the pie chart is circular
plt.legend()
plt.show()
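The daily aggregation behind the area plot can be checked without the original dataset. The following is a minimal sketch on a few made-up rows; the sample values and the helper name daily_totals are assumptions, while the column names and the dd/mm/YYYY date format follow the code above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend; drop this line in a notebook to see the figure inline
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical sample rows; only the column names mirror the practical's dataset
sample = pd.DataFrame({
    "date": ["02/01/2024", "01/01/2024", "02/01/2024", "15/12/2023"],
    "value [USD]": [10.0, 5.0, 2.5, 7.0],
})

def daily_totals(df):
    """Sum purchase values per day and return them in chronological order."""
    totals = df.groupby("date")["value [USD]"].sum()
    # The group keys are still strings here, so they sort alphabetically
    # ("15/12/2023" would come last); parse them before sorting.
    totals.index = pd.to_datetime(totals.index, format="%d/%m/%Y")
    return totals.sort_index()

daily_value = daily_totals(sample)

fig, ax = plt.subplots(figsize=(10, 4))
ax.fill_between(daily_value.index, daily_value.values, color="skyblue", alpha=0.6)
ax.set_title("Area Plot: Total Purchase Value Over Time")
ax.set_xlabel("Date")
ax.set_ylabel("Total Value (USD)")
```

The sample includes a December 2023 date to show why the index is converted to datetime before sorting: sorted as text, that date would appear after the January 2024 dates and distort the time axis.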
Conclusion:
In this practical, we explored how to visually analyze a dataset using various types
of graphs, including bar plots, histograms, scatter plots, area plots, and pie charts.
These visualizations make it easier to interpret complex datasets, identify trends,
and support data-driven decision-making.
Libraries such as Matplotlib and Seaborn significantly simplify the process of
creating insightful and high-quality plots.