Visualization with Python
Suchitra Dutta
Step 1: Install Necessary Libraries
• pip install matplotlib seaborn pandas
Step 2: Import Libraries
• import pandas as pd
• import matplotlib.pyplot as plt
• import seaborn as sns
Step 3: Load Your Dataset
• You can load data from various sources such as CSV files, Excel files, and databases.
• # Load from a CSV file
• df = pd.read_csv('C:\\Users\\dell\\OneDrive\\Desktop\\CE.csv')
• # Show first 5 rows
• print(df.head())
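• The same loading pattern applies to other sources. The sketch below is a minimal illustration, not the dataset used in these slides: the sample rows, the table name "people", and the in-memory SQLite database are all made up so that the code runs without any external file.

```python
import io
import sqlite3

import pandas as pd

# Hypothetical sample data standing in for a real CSV file.
csv_text = "name,age\nAsha,23\nRavi,31\n"

# read_csv accepts a file path or any file-like object.
df_csv = pd.read_csv(io.StringIO(csv_text))

# Load the same rows from a database through a SQL query.
# An in-memory SQLite database is used here purely for illustration.
conn = sqlite3.connect(":memory:")
df_csv.to_sql("people", conn, index=False)
df_sql = pd.read_sql_query("SELECT name, age FROM people", conn)
conn.close()

print(df_sql.head())

# Excel files load in a similar way with pd.read_excel("file.xlsx"),
# which needs an extra engine such as openpyxl to be installed.
```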
Step 4: Explore the Data
• Before visualization, understand your data.
• df.info() # Data types and non-null counts (prints directly and returns None, so no print() is needed)
• print(df.describe()) # Summary statistics
• print(df.columns) # Column names
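• A few more checks are often useful at this stage. The sketch below uses a small made-up DataFrame (df_demo is a hypothetical name, not the slides' dataset) to count missing values, distinct values, and repeated rows.

```python
import pandas as pd

# Hypothetical toy data: one missing age, one missing gender, one repeated row.
df_demo = pd.DataFrame({
    "age": [23, None, 31, 23],
    "gender": ["Male", "Female", None, "Male"],
})

missing_per_column = df_demo.isna().sum()   # missing values in each column
unique_per_column = df_demo.nunique()       # distinct non-missing values
duplicate_rows = df_demo.duplicated().sum() # rows identical to an earlier row

print(missing_per_column)
print(unique_per_column)
print("Duplicate rows:", duplicate_rows)
```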
Step 5: Clean or Prepare the Data (if needed)
• # Drop rows that contain missing values
• df = df.dropna()
• # Convert categorical to numerical (if needed); values not in the mapping become NaN
• df['gender'] = df['gender'].map({'Male': 0, 'Female': 1})
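• One thing to watch with map(): any value that is not a key in the dictionary becomes NaN. The sketch below shows this on a made-up Series (the values are illustrative only) and lists the labels that were not mapped.

```python
import pandas as pd

# Hypothetical column where one entry uses a different label ("M").
gender = pd.Series(["Male", "Female", "M", "Female"])

encoded = gender.map({"Male": 0, "Female": 1})

# "M" is not in the mapping, so its encoded value is NaN.
unmapped = gender[encoded.isna()].unique()

print(encoded)
print("Unmapped labels:", list(unmapped))
```

• Checking for unmapped labels before plotting helps avoid silently losing rows.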
Remove Duplicate Rows
• df = df.drop_duplicates()
• # Convert Data Types
• # Convert column to integer (raises an error if the column still contains missing values)
• df['age'] = df['age'].astype(int)
• # Convert date column to datetime
• df['date'] = pd.to_datetime(df['date'])
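• When a column contains entries that cannot be converted, a more forgiving approach is to coerce them to missing values. This is a sketch under assumptions: the DataFrame raw and its contents are made up, and errors="coerce" is an alternative to the plain astype(int) call above, not a replacement the slides prescribe.

```python
import pandas as pd

# Hypothetical raw data with one unparseable age and one unparseable date.
raw = pd.DataFrame({
    "age": ["23", "31", "unknown"],
    "date": ["2024-01-05", "2024-02-10", "not a date"],
})

# Unparseable values become NaN / NaT instead of raising an error.
raw["age"] = pd.to_numeric(raw["age"], errors="coerce")
raw["date"] = pd.to_datetime(raw["date"], errors="coerce")

# The nullable integer dtype "Int64" keeps whole numbers alongside missing values.
raw["age"] = raw["age"].astype("Int64")

print(raw.dtypes)
print(raw)
```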
Fill Missing Values
• # Alternative to dropna(): fill the gaps instead of dropping the rows
• # Fill with mean (for numeric columns)
• df['age'] = df['age'].fillna(df['age'].mean())
• # Fill with median
• df['salary'] = df['salary'].fillna(df['salary'].median())
• # Fill with mode (for categorical columns)
• df['gender'] = df['gender'].fillna(df['gender'].mode()[0])
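• A small worked example of the three fill strategies. The DataFrame people and its numbers are made up for illustration; the column names simply mirror the ones used above.

```python
import pandas as pd

# Hypothetical data with one missing value per column.
people = pd.DataFrame({
    "age": [20.0, 30.0, None, 40.0],
    "salary": [1000.0, 2000.0, 9000.0, None],
    "gender": ["Female", None, "Female", "Male"],
})

# Mean of 20, 30, 40 is 30.
people["age"] = people["age"].fillna(people["age"].mean())

# Median of 1000, 2000, 9000 is 2000 (the mean would be 4000,
# pulled up by the single large salary).
people["salary"] = people["salary"].fillna(people["salary"].median())

# The most frequent gender value is "Female".
people["gender"] = people["gender"].fillna(people["gender"].mode()[0])

print(people)
```

• The median is usually the safer choice when a column has a few extreme values.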
Lowercase Column Names
• df.columns = df.columns.str.lower()
• # Replace Specific Values
• df['gender'] = df['gender'].replace({'M': 'Male', 'F': 'Female'})
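• Both steps on a toy example. The DataFrame survey is hypothetical, and str.strip() is an extra step added here (not in the slides) on the assumption that column names may carry stray spaces.

```python
import pandas as pd

# Hypothetical data with untidy column names and abbreviated labels.
survey = pd.DataFrame({" Age ": [23, 31], "Gender": ["M", "F"]})

# Remove surrounding spaces, then lowercase every column name.
survey.columns = survey.columns.str.strip().str.lower()

# Expand abbreviations; values not listed in the dictionary are left unchanged.
survey["gender"] = survey["gender"].replace({"M": "Male", "F": "Female"})

print(survey)
```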
Step 6: Choose the Right Visualization Type
Goal                                   Chart Type        Library
Distribution of a variable             Histogram, KDE    seaborn
Compare categories                     Bar chart         matplotlib/seaborn
Relationship between 2 variables       Scatter plot      seaborn
Time series analysis                   Line chart        matplotlib
Correlation matrix                     Heatmap           seaborn
Proportion of a whole                  Pie chart         matplotlib
Step 7: Create Visualizations
1. Histogram
• import seaborn as sns
• import matplotlib.pyplot as plt
• # Plot histogram of the 'age' column
• sns.histplot(df['age'], bins=5, kde=True, color='skyblue')
• # Add titles and labels
• plt.title("Age Distribution of Music Preferences")
• plt.xlabel("Age")
• plt.ylabel("Number of People")
• plt.grid(True)
• # Show the plot
• plt.show()
• bins=5 means the age range is split into 5 equal-width groups.
• KDE stands for Kernel Density Estimate. It draws a smooth curve over the histogram to show the probability density, essentially the shape of the distribution.
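• To see what "5 equal-width groups" means in numbers, NumPy's histogram function applies the same equal-width rule. The ages below are made up for illustration; this is a sketch of the binning idea, not a look inside seaborn.

```python
import numpy as np

# Hypothetical ages ranging from 10 to 60.
ages = np.array([10, 12, 25, 33, 38, 41, 47, 55, 60])

# bins=5 splits the range [10, 60] into 5 bins, each 10 wide.
counts, edges = np.histogram(ages, bins=5)

print("Bin edges:", edges)    # 10, 20, 30, 40, 50, 60
print("Counts:   ", counts)   # 2, 1, 2, 2, 2
```

• Each bar's height in the histogram is one of these counts; the last bin includes its right edge, so age 60 is counted.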
2. Bar Plot
• df['gender'].value_counts().plot(kind='bar')
• plt.title("Gender Distribution")
• plt.xlabel("Gender")
• plt.ylabel("Count")
• plt.show()
3. Scatter Plot
• sns.scatterplot(x='age', y='gender', data=df)
• plt.title("Age vs Gender")
• plt.show()
4. Line Plot
• plt.plot(df['gender'], df['age'])
• plt.title("Gender vs Age")
• plt.xlabel("Gender")
• plt.ylabel("Age")
• plt.xticks(rotation=45)
• plt.show()
• plt.xticks(rotation=45) rotates the x-axis labels by 45 degrees so that long labels do not overlap.
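• Line plots are most informative for ordered data such as a time series. The sketch below assumes a hypothetical DataFrame sales with a date column and a value column (neither is part of the slides' dataset) and sorts by date first, because a line plot connects points in row order.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; drop this line in a notebook
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical daily values, deliberately out of order.
sales = pd.DataFrame({
    "date": pd.to_datetime(["2024-01-03", "2024-01-01", "2024-01-02"]),
    "value": [30, 10, 20],
})

# Sort by date so the line moves left to right in time.
sales = sales.sort_values("date")

fig, ax = plt.subplots()
ax.plot(sales["date"], sales["value"], marker="o")
ax.set_title("Value over Time")
ax.set_xlabel("Date")
ax.set_ylabel("Value")
ax.tick_params(axis="x", labelrotation=45)  # same effect as plt.xticks(rotation=45)
fig.tight_layout()
```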
Correlation Heatmap (Numeric Features)
• import seaborn as sns
• import matplotlib.pyplot as plt
• # Select only numeric columns
• numeric_df = df.select_dtypes(include=['number'])
• # Compute correlation
• correlation = numeric_df.corr()
• # Create the heatmap
• sns.heatmap(correlation, annot=True, cmap='Blues')
• plt.title("Correlation Heatmap (Numeric Features)")
• plt.show()
Parameter    Purpose                        Example
annot        Show numbers in each cell      annot=True
cmap         Set the color theme/style      cmap='Blues'
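• The numbers shown in each heatmap cell are correlation coefficients between -1 and 1. The sketch below computes them for a made-up DataFrame (scores and its columns are hypothetical) so the two extremes are easy to see.

```python
import pandas as pd

# Hypothetical data: score rises with hours, errors fall with hours.
scores = pd.DataFrame({
    "hours": [1, 2, 3, 4],
    "score": [10, 20, 30, 40],
    "errors": [8, 6, 4, 2],
    "name": ["a", "b", "c", "d"],  # non-numeric, excluded below
})

# Keep numeric columns only, then compute pairwise correlations.
corr = scores.select_dtypes(include=["number"]).corr()

print(corr)
# hours vs score is a perfect positive relationship (1.0),
# hours vs errors is a perfect negative relationship (-1.0).
```

• Values near 0 indicate little or no linear relationship between the two columns.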
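Step 9: Save the Plots
• # Call savefig() before plt.show(); once the figure has been shown and closed, the saved file can come out blank
• plt.savefig("my_plot.png", dpi=300)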
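• A self-contained sketch of saving a figure. The bar heights are made-up counts, the file is written to the system temp folder only so the example leaves nothing behind in your project, and bbox_inches="tight" is an optional extra that trims surrounding whitespace.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.bar(["Male", "Female"], [12, 9])  # hypothetical counts
ax.set_title("Gender Distribution")
ax.set_ylabel("Count")

# Save through the figure object so it is clear which figure is written.
out_path = os.path.join(tempfile.gettempdir(), "my_plot.png")
fig.savefig(out_path, dpi=300, bbox_inches="tight")
plt.close(fig)  # free the figure's memory once it is saved

print("Saved to", out_path)
```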
Step 10: Interpret and Share Insights
• After plotting, look for:
• Trends or patterns
• Outliers or anomalies
• Correlations or dependencies
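• Outliers spotted in a plot can be double-checked numerically. One common rule of thumb, which the slides do not prescribe, is the 1.5 x IQR rule: flag values more than 1.5 interquartile ranges below the first quartile or above the third. The ages below are made up for illustration.

```python
import pandas as pd

# Hypothetical ages with one suspicious value.
ages = pd.Series([22, 25, 27, 30, 31, 33, 95])

q1 = ages.quantile(0.25)   # first quartile: 26.0
q3 = ages.quantile(0.75)   # third quartile: 32.0
iqr = q3 - q1              # interquartile range: 6.0

lower = q1 - 1.5 * iqr     # 17.0
upper = q3 + 1.5 * iqr     # 41.0

outliers = ages[(ages < lower) | (ages > upper)]
print(outliers)
```

• Here only the value 95 falls outside the range 17 to 41, so it is the one worth investigating before drawing conclusions.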