
Visualization with Python
Suchitra Dutta
Step 1: Install Necessary Libraries
• pip install matplotlib seaborn pandas
Step 2: Import Libraries
• import pandas as pd
• import matplotlib.pyplot as plt
• import seaborn as sns


Step 3: Load Your Dataset
• You can load data from various sources such as CSV files, Excel files, and databases.
• # Load from a CSV file
• df = pd.read_csv('C:\\Users\\dell\\OneDrive\\Desktop\\CE.csv')
• # Show first 5 rows
• print(df.head())
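The loading step can be tried without the original file. The sketch below is only an illustration: the sample rows and the column names (age, gender, salary) are assumptions standing in for CE.csv, which is not reproduced here.

```python
import io

import pandas as pd

# Hypothetical sample standing in for CE.csv; the column names are assumptions.
csv_text = """age,gender,salary
23,Male,30000
31,Female,42000
,Female,38000
"""

# read_csv accepts a file path or any file-like object.
df = pd.read_csv(io.StringIO(csv_text))

# On Windows, a raw string avoids doubling every backslash in a path:
# df = pd.read_csv(r'C:\Users\dell\OneDrive\Desktop\CE.csv')

# Show the first rows; the empty age cell is read as NaN.
print(df.head())
```

Reading from an in-memory string keeps the example self-contained; with a real file, only the argument to read_csv changes.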


Step 4: Explore the Data
• Before visualizing, understand your data.
• df.info() # Data types and missing values (info() prints its own report)
• print(df.describe()) # Summary statistics
• print(df.columns) # Column names
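A few more checks are often useful during exploration. This is a minimal sketch on made-up data (the column names are assumptions), counting missing values, duplicate rows, and category frequencies.

```python
import pandas as pd

# Hypothetical sample data; the column names are assumptions.
df = pd.DataFrame({
    "age": [23, 31, None, 31],
    "gender": ["Male", "Female", "Female", "Female"],
})

# Number of missing values in each column.
missing_per_column = df.isna().sum()

# Number of rows that exactly repeat an earlier row.
duplicate_rows = int(df.duplicated().sum())

# How often each category appears.
category_counts = df["gender"].value_counts()

print(missing_per_column)
print("duplicate rows:", duplicate_rows)
print(category_counts)
```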


Step 5: Clean or Prepare the Data (if needed)
• # Drop missing values
• df = df.dropna()
• # Convert categorical to numerical (if needed)
• df['gender'] = df['gender'].map({'Male': 0, 'Female': 1})
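One thing to watch when mapping categories to numbers: map() returns NaN for any value missing from the dictionary. The sketch below uses made-up data and a hypothetical gender_code column to show how such values can be spotted.

```python
import pandas as pd

# Hypothetical data: 'M' is a spelling the mapping does not cover.
df = pd.DataFrame({"gender": ["Male", "Female", "M", None]})

# Drop rows with missing values first.
df = df.dropna()

# Values absent from the mapping become NaN instead of raising an error.
mapping = {"Male": 0, "Female": 1}
df["gender_code"] = df["gender"].map(mapping)

# List the original labels that were not mapped.
unmapped = df.loc[df["gender_code"].isna(), "gender"].unique()
print(unmapped)
```

Writing the codes to a separate column keeps the original labels available for plot axes and legends.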
Remove Duplicate Rows
• df = df.drop_duplicates()
• # Convert Data Types
• # Convert column to integer
• df['age'] = df['age'].astype(int)
• # Convert date column to datetime
• df['date'] = pd.to_datetime(df['date'])
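astype(int) raises an error when a column still contains text that is not a number, or NaN. A more forgiving variant, sketched here on made-up data, coerces bad entries to missing values and uses pandas' nullable Int64 type; this is an alternative, not the approach shown on the slide.

```python
import pandas as pd

# Hypothetical raw data with one unparseable entry per column.
df = pd.DataFrame({
    "age": ["25", "31", "n/a"],
    "date": ["2024-01-05", "2024-02-10", "unknown"],
})

# errors="coerce" turns unparseable entries into NaN (numbers) or NaT (dates).
# "Int64" is the nullable integer type, so the missing value is allowed.
df["age"] = pd.to_numeric(df["age"], errors="coerce").astype("Int64")
df["date"] = pd.to_datetime(df["date"], errors="coerce")

print(df.dtypes)
print(df)
```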


Fill Missing Values
• # Fill with mean (for numeric columns)
• df['age'] = df['age'].fillna(df['age'].mean())
• # Fill with median
• df['salary'] = df['salary'].fillna(df['salary'].median())
• # Fill with mode (for categorical columns)
• df['gender'] = df['gender'].fillna(df['gender'].mode()[0])
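Filling only has an effect if the rows were not already removed with dropna(), and an integer conversion is safest after filling. A minimal sketch on made-up data (column names follow the slides, values are assumptions):

```python
import pandas as pd

# Hypothetical data with one missing value in each column.
df = pd.DataFrame({
    "age": [20, 30, None, 40],
    "salary": [100, 200, 900, None],
    "gender": ["Male", "Female", "Female", None],
})

# Mean of 20, 30, 40 is 30.
df["age"] = df["age"].fillna(df["age"].mean())

# Median of 100, 200, 900 is 200; less sensitive to the large value than the mean.
df["salary"] = df["salary"].fillna(df["salary"].median())

# mode() returns a Series; [0] takes the most frequent value.
df["gender"] = df["gender"].fillna(df["gender"].mode()[0])

# With no NaN left, the integer conversion is safe.
df["age"] = df["age"].astype(int)

print(df)
```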


Lowercase Column Names
• df.columns = df.columns.str.lower()
• # Replace specific values
• df['gender'] = df['gender'].replace({'M': 'Male', 'F': 'Female'})
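Column names from spreadsheets often carry stray spaces as well as mixed case. The sketch below, on made-up data, strips and lowercases the names before standardizing the category labels; the str.strip() call is an addition to what the slide shows.

```python
import pandas as pd

# Hypothetical data with untidy column names and abbreviated labels.
df = pd.DataFrame({" Age": [23, 31], "GENDER": ["M", "F"]})

# Remove surrounding whitespace, then lowercase every column name.
df.columns = df.columns.str.strip().str.lower()

# Replace abbreviations with full labels; other values are left unchanged.
df["gender"] = df["gender"].replace({"M": "Male", "F": "Female"})

print(df)
```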
Step 6: Choose the Right Visualization Type
Goal                          Chart Type       Library
Distribution of a variable    Histogram, KDE   seaborn
Compare categories            Bar chart        matplotlib/seaborn
Relationship between 2 vars   Scatter plot     seaborn
Time series analysis          Line chart       matplotlib
Correlation matrix            Heatmap          seaborn
Proportion of a whole         Pie chart        matplotlib
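The pie chart row has no example elsewhere in the slides, so here is a minimal sketch with matplotlib. The data is made up, and the Agg backend line is an assumption for running without a display; drop it and call plt.show() in an interactive session.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"gender": ["Male", "Female", "Female", "Female"]})

# Count each category; value_counts sorts from most to least frequent.
counts = df["gender"].value_counts()

fig, ax = plt.subplots()
# autopct prints each slice's percentage of the whole.
pie_parts = ax.pie(counts.values, labels=list(counts.index), autopct="%1.0f%%")
wedges = pie_parts[0]
ax.set_title("Gender Share")
```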


Step 7: Create Visualizations
1. Histogram
• import seaborn as sns
• import matplotlib.pyplot as plt
• # Plot histogram of the 'age' column
• sns.histplot(df['age'], bins=5, kde=True, color='skyblue')
• # Add titles and labels
• plt.title("Age Distribution of Music Preferences")
• plt.xlabel("Age")
• plt.ylabel("Number of People")
• plt.grid(True)
• # Show the plot
• plt.show()

• bins=5 splits the age range into 5 equal-width groups.
• kde=True adds a Kernel Density Estimate (KDE): a smooth curve drawn over the histogram that shows the shape of the distribution.
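To see what equal-width bins mean in numbers, the bin edges can be computed directly. This sketch uses NumPy's histogram function on made-up ages; it illustrates the binning rule and is not a view into seaborn's internals.

```python
import numpy as np

# Hypothetical ages ranging from 18 to 58.
ages = [18, 22, 25, 31, 38, 44, 52, 58]

# bins=5 divides [min, max] into 5 intervals of equal width: (58 - 18) / 5 = 8.
counts, edges = np.histogram(ages, bins=5)

print(edges)   # 6 edges: 18, 26, 34, 42, 50, 58
print(counts)  # values per bin: 3, 1, 1, 1, 2
```

Each bin includes its left edge and excludes its right edge, except the last bin, which includes both.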
2. Bar Plot
• df['gender'].value_counts().plot(kind='bar')
• plt.title("Gender Distribution")
• plt.xlabel("Gender")
• plt.ylabel("Count")
• plt.show()
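The same bar chart can be built with matplotlib's Axes interface, which makes the bars easy to inspect afterwards. A minimal sketch on made-up data; the Agg backend line is an assumption for running without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data.
df = pd.DataFrame({"gender": ["Male", "Female", "Female", "Male", "Female"]})

# One count per category, most frequent first.
counts = df["gender"].value_counts()

fig, ax = plt.subplots()
ax.bar(list(counts.index), counts.values, color="skyblue")
ax.set_title("Gender Distribution")
ax.set_xlabel("Gender")
ax.set_ylabel("Count")

# Each bar is a rectangle patch whose height is the category count.
bar_heights = [patch.get_height() for patch in ax.patches]
print(bar_heights)
```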
3. Scatter Plot
• sns.scatterplot(x='age', y='gender', data=df)
• plt.title("Age vs Gender")
• plt.show()
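Scatter plots are most informative when both axes are numeric. As an illustration, the sketch below plots age against salary on made-up data; pairing these two columns is an assumption, and the Agg backend line is only for running without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical numeric data.
df = pd.DataFrame({
    "age": [23, 31, 38, 45, 52],
    "salary": [30000, 42000, 50000, 61000, 65000],
})

fig, ax = plt.subplots()
# One point per row: x is age, y is salary.
points = ax.scatter(df["age"], df["salary"])
ax.set_title("Age vs Salary")
ax.set_xlabel("Age")
ax.set_ylabel("Salary")
```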
4. Line Plot
• plt.plot(df['gender'], df['age'])
• plt.title("Gender vs Age")
• plt.xlabel("Gender")
• plt.ylabel("Age")
• plt.xticks(rotation=45)
• plt.show()
• plt.xticks(rotation=45) rotates the x-axis labels by 45 degrees.
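Line charts suit ordered data such as dates, because the line connects points in the order they are given. The sketch below uses made-up data with a hypothetical sales column and sorts by date before plotting; the Agg backend line is an assumption for running without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical time series, deliberately out of order.
df = pd.DataFrame({
    "date": pd.to_datetime(["2024-03-01", "2024-01-01", "2024-02-01"]),
    "sales": [30, 10, 20],
})

# Sort by date so the line moves left to right in time.
df = df.sort_values("date")

fig, ax = plt.subplots()
(line,) = ax.plot(df["date"], df["sales"], marker="o")
ax.set_title("Sales Over Time")
ax.set_xlabel("Date")
ax.set_ylabel("Sales")

# Axes-level equivalent of plt.xticks(rotation=45).
ax.tick_params(axis="x", labelrotation=45)
```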
Correlation Heatmap (Numeric Features)
• import seaborn as sns
• import matplotlib.pyplot as plt
• # Select only numeric columns
• numeric_df = df.select_dtypes(include=['number'])
• # Compute correlation
• correlation = numeric_df.corr()
• # Create the heatmap
• sns.heatmap(correlation, annot=True, cmap='Blues')
• plt.title("Correlation Heatmap (Numeric Features)")
• plt.show()

annot   Show numbers in each cell   annot=True
cmap    Set the color theme/style   cmap='Blues'
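For readers who want to see what the heatmap call does, here is a rough matplotlib-only equivalent using imshow in place of sns.heatmap. The data is made up, and the Agg backend line is an assumption for running without a display.

```python
import matplotlib

matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt
import pandas as pd

# Hypothetical data with one non-numeric column.
df = pd.DataFrame({
    "age": [23, 31, 38, 45],
    "salary": [30000, 42000, 50000, 61000],
    "gender": ["Male", "Female", "Female", "Male"],
})

# Keep numeric columns only, then compute pairwise Pearson correlation.
corr = df.select_dtypes(include="number").corr()

fig, ax = plt.subplots()
# Fix the color scale to the full correlation range, -1 to 1.
image = ax.imshow(corr.values, cmap="Blues", vmin=-1, vmax=1)
ax.set_xticks(range(len(corr.columns)))
ax.set_xticklabels(corr.columns)
ax.set_yticks(range(len(corr.index)))
ax.set_yticklabels(corr.index)

# Write each coefficient in its cell, like annot=True.
for i in range(corr.shape[0]):
    for j in range(corr.shape[1]):
        ax.text(j, i, f"{corr.iloc[i, j]:.2f}", ha="center", va="center")

fig.colorbar(image, ax=ax)
ax.set_title("Correlation Heatmap (Numeric Features)")
```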
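Step 9: Save the Plots
• # Call savefig before plt.show(); once show() returns, the figure may already be closed and the saved file blank
• plt.savefig("my_plot.png", dpi=300)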
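A minimal sketch of saving a figure follows. The plotted values are made up, and writing to the system temp directory is an assumption chosen to keep the example tidy; any writable path works.

```python
import os
import tempfile

import matplotlib

matplotlib.use("Agg")  # assumption: headless run; remove for interactive use
import matplotlib.pyplot as plt

fig, ax = plt.subplots()
ax.plot([1, 2, 3], [2, 4, 8])
ax.set_title("Example Plot")

# Hypothetical output location.
out_path = os.path.join(tempfile.gettempdir(), "my_plot.png")

# dpi sets the resolution; bbox_inches="tight" trims surrounding whitespace.
fig.savefig(out_path, dpi=300, bbox_inches="tight")

# Close the figure to free memory when creating many plots.
plt.close(fig)

print("saved to", out_path)
```

Saving through the Figure object (fig.savefig) rather than plt.savefig makes it explicit which figure is written when several are open.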


Step 10: Interpret and Share Insights
• After plotting, look for:
• Trends or patterns
• Outliers or anomalies
• Correlations or dependencies
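Outliers seen in a plot can be confirmed numerically. One common convention, used here as an assumption rather than something the slides prescribe, is the 1.5 × IQR rule, sketched on made-up ages.

```python
import pandas as pd

# Hypothetical ages with one suspicious value.
ages = pd.Series([22, 25, 27, 30, 31, 33, 95], name="age")

# Interquartile range: spread of the middle 50% of the data.
q1 = ages.quantile(0.25)
q3 = ages.quantile(0.75)
iqr = q3 - q1

# Values beyond 1.5 * IQR from the quartiles are flagged.
lower = q1 - 1.5 * iqr
upper = q3 + 1.5 * iqr
outliers = ages[(ages < lower) | (ages > upper)]

print(outliers.tolist())
```

A flagged value is a prompt to check the source data, not proof of an error.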
