This project cleans 🧹 and visualizes 📊 the 2024 layoffs dataset using 🐍 Python and 📊 Power BI 📈. It handles missing values ❓, removes duplicates, and creates charts 📊 and a map 🌍.

Willie-Conway/Data-Cleaning-and-Preparation

🚀 Layoffs Dataset 2024: Data Cleaning, Preparation, and Visualization 📊

This project involves cleaning and preparing the 2024 layoffs dataset for analysis using Python, SQL, and Power BI. After cleaning the raw data, various visualizations were created in Power BI to analyze trends in layoffs by year, industry, location, and more.

Project Overview

This project focuses on preparing and visualizing the 2024 layoffs dataset. After cleaning the raw data by addressing issues like missing values, duplicates, and inconsistent formats, I used Power BI to create various visualizations that uncover insights from the data. This includes a line chart to show layoffs by year and industry, a map chart for layoffs by country, and a summary card to visualize the total number of layoffs.

🎯 Objective

The goal of this project is to:

  • Clean and prepare the 2024 layoffs dataset for analysis.
  • Create meaningful visualizations to help stakeholders understand trends and patterns in mass layoffs.

🛠 Technologies Used

  • 🐍 Python (Pandas, NumPy)
  • 🛢️ SQL (SQLite)
  • 📊 Power BI (for visualizations)
  • 📅 Excel (for initial inspection and analysis)

📊 Dataset Description

The dataset contains information about layoffs from 2020 to 2024, scraped from Layoffs.fyi. The purpose of the dataset is to enable the analysis of recent mass layoffs and discover patterns in the layoffs across industries and countries.

🧹 Data Cleaning Process

The data cleaning process involved several key steps:

  1. Removing Duplicates: Duplicate rows were removed to ensure data integrity.
  2. Imputing Missing Values:
    • Numeric columns: Missing values were replaced with the column mean, rounded to two decimal places.
    • Categorical columns: Missing values were filled with the mode (most frequent value) of each column.
  3. Normalizing Data Formats:
    • Text columns were converted to lowercase for consistency.
    • Date columns were standardized to the 'YYYY-MM-DD' format.
  4. Data Type Standardization: The total_laid_off, percentage_laid_off, and funds_raised columns were converted to numeric types so they can be aggregated during analysis.
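
The steps above can be tried on a small example before running them on the full file. The sketch below is illustrative only: the rows are made up, the `clean` helper is a hypothetical name, and only the `date` and `total_laid_off` column names are taken from the script in this README (`company` is an assumed column).

```python
import pandas as pd

# Made-up rows for illustration; 'company' is an assumed column name.
raw = pd.DataFrame({
    "company": ["Acme", "Acme", "Globex", "Globex", None],
    "date": ["2024-01-15", "2024-01-15", "2023-11-02", "2022-06-30", "2022-06-30"],
    "total_laid_off": [100.0, 100.0, None, 50.0, 30.0],
})


def clean(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper that mirrors the cleaning steps listed above."""
    # 1. Remove exact duplicate rows.
    df = df.drop_duplicates().copy()

    # 2a. Numeric columns: fill gaps with the mean rounded to two decimals.
    numeric_cols = df.select_dtypes(include="number").columns
    for col in numeric_cols:
        df[col] = df[col].fillna(round(df[col].mean(), 2))

    # 2b and 3. Remaining (text) columns: fill gaps with the mode, then lowercase.
    for col in df.columns.difference(numeric_cols):
        modes = df[col].mode()
        if not modes.empty:  # a column with no values at all has no mode
            df[col] = df[col].fillna(modes.iloc[0])
        df[col] = df[col].str.lower()

    # 3. Dates as 'YYYY-MM-DD' strings.
    df["date"] = pd.to_datetime(df["date"]).dt.strftime("%Y-%m-%d")
    return df


cleaned = clean(raw)
print(cleaned)
```

In this toy data the duplicate Acme row is dropped, the missing count is filled with the mean of the remaining values, and the missing company name is filled with the most frequent one.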

📊 Power BI Visualizations

After cleaning the data, I used Power BI to create a set of insightful visualizations:

  1. 📈 Line Chart: Company Layoffs by Year and Industry

    • This line chart visualizes the number of layoffs across the years (2020-2024) and the industries most affected by layoffs.
    • It helps to identify trends in layoffs by industry and across different years.
  2. 🌍 Map Chart: Layoffs by Country

    • A map chart displays the total layoffs by country, enabling a geographic view of mass layoffs.
    • This helps identify regions with the highest concentration of layoffs and visualize global trends.
  3. 📊 Summary Card: Total Layoffs (1.03M)

    • A card visual was used to display the total number of layoffs across all years and industries, which came to 1.03 million.
    • This provides a quick summary of the data for stakeholders.
  4. 📊 Dashboard: Consolidated View

    • A consolidated view of all three visualizations in the Power BI dashboard. It combines the line chart 📈, map chart 🌍, and summary card 🗂 into a single interactive interface where stakeholders can explore layoff trends, geographic distribution, and overall totals.
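
The aggregations behind these visuals can be cross-checked in pandas before building them in Power BI. This is a rough sketch under assumptions: the rows are made up, and `industry` and `country` are assumed column names (only `date` and `total_laid_off` appear in the cleaning script).

```python
import pandas as pd

# Made-up rows; 'industry' and 'country' are assumed column names.
df = pd.DataFrame({
    "date": ["2023-01-10", "2023-05-02", "2024-02-20", "2024-03-01"],
    "industry": ["retail", "finance", "retail", "retail"],
    "country": ["united states", "india", "united states", "india"],
    "total_laid_off": [100, 200, 50, 25],
})
df["year"] = pd.to_datetime(df["date"]).dt.year

# Line chart: layoffs per year, one column per industry.
by_year_industry = (
    df.groupby(["year", "industry"])["total_laid_off"].sum().unstack(fill_value=0)
)

# Map chart: total layoffs per country, largest first.
by_country = df.groupby("country")["total_laid_off"].sum().sort_values(ascending=False)

# Summary card: overall total.
total = int(df["total_laid_off"].sum())

print(by_year_industry)
print(by_country)
print(total)
```

If the numbers from a check like this differ from what Power BI shows, the usual suspects are a column imported with the wrong data type or a filter left active on the report page.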

💻 Code Explanation

Below is a summary of the key Python code used for cleaning the data:

import pandas as pd
import numpy as np
import sqlite3

# Load the dataset
data = pd.read_csv(r'csv/layoffs_2024.csv')

# Initial data inspection
print("Initial data structure:")
print(data.info())

# Remove duplicate rows
data = data.drop_duplicates()

# Impute missing values
numeric_columns = data.select_dtypes(include=['float64', 'int64']).columns
for column in numeric_columns:
    mean_value = round(data[column].mean(), 2)
    data[column] = data[column].fillna(mean_value)

categorical_columns = data.select_dtypes(include=['object']).columns
for column in categorical_columns:
    modes = data[column].mode()
    if not modes.empty:  # a column with no values at all has no mode to fill with
        data[column] = data[column].fillna(modes.iloc[0])

# Normalize text data
for column in categorical_columns:
    data[column] = data[column].str.lower()

# Standardize date formats
if 'date' in data.columns:
    data['date'] = pd.to_datetime(data['date']).dt.strftime('%Y-%m-%d')

# Ensure correct data types. Note: errors='coerce' turns unparseable values into NaN,
# so missing values can reappear here after the imputation step above.
data['total_laid_off'] = pd.to_numeric(data['total_laid_off'], errors='coerce')
data['percentage_laid_off'] = pd.to_numeric(data['percentage_laid_off'], errors='coerce')
data['funds_raised'] = pd.to_numeric(data['funds_raised'], errors='coerce')

# Save cleaned data to CSV and SQL
cleaned_file_path = 'cleaned_layoffs_2024.csv'
data.to_csv(cleaned_file_path, index=False)

conn = sqlite3.connect('cleaned_layoffs.db')
data.to_sql('cleaned_layoffs', conn, if_exists='replace', index=False)
conn.close()
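
Once the table is in SQLite, it can be queried with plain SQL. The sketch below is not part of the project script: it uses an in-memory database and made-up rows so it runs on its own, and it reuses the `cleaned_layoffs` table name and the `date` and `total_laid_off` column names from the code above.

```python
import sqlite3

import pandas as pd

# Made-up rows standing in for the cleaned dataset.
sample = pd.DataFrame({
    "date": ["2023-01-10", "2024-02-20", "2024-03-01"],
    "total_laid_off": [100, 50, 25],
})

conn = sqlite3.connect(":memory:")  # use 'cleaned_layoffs.db' to query the real file
try:
    sample.to_sql("cleaned_layoffs", conn, if_exists="replace", index=False)
    # SQLite's strftime works here because dates are stored as 'YYYY-MM-DD' text.
    query = """
        SELECT strftime('%Y', date) AS year, SUM(total_laid_off) AS layoffs
        FROM cleaned_layoffs
        GROUP BY year
        ORDER BY year
    """
    per_year = pd.read_sql_query(query, conn)
finally:
    conn.close()

print(per_year)
```

The `try`/`finally` block is a small design choice that closes the connection even if the query fails.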

πŸ“ Key Steps

  1. Load Dataset: The dataset is loaded from a CSV file.
  2. Remove Duplicates: Duplicate rows are dropped.
  3. Impute Missing Values: Missing values in numeric and categorical columns are imputed using the mean and mode, respectively.
  4. Normalize Data: Text data is converted to lowercase, and date columns are standardized.
  5. Save the Cleaned Data: The cleaned data is saved in both CSV and SQLite formats.

📈 Results

  • Data Cleaned: The dataset was cleaned of duplicate entries, and missing values were imputed.
  • Visualizations Created: After cleaning the data, the following Power BI visualizations were created:
    • Line Chart: Showed layoffs by year and industry from 2020 to 2024.
    • Map Chart: Visualized the geographic distribution of layoffs by country.
    • Summary Card: Displayed the total number of layoffs (1.03M).

πŸƒβ€β™€οΈ How to Run the Project

  1. Clone or download the repository.
  2. Make sure you have the required Python libraries installed (sqlite3 ships with the Python standard library, so it does not need to be installed with pip):
    pip install pandas numpy
  3. Place the layoffs_2024.csv file in the csv/ directory.
  4. Run the Python script to clean the dataset:
    python Data_Cleaning_and_Preparation.py
  5. Import the cleaned dataset (cleaned_layoffs_2024.csv) into Power BI for visualization.
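
Before importing the file into Power BI, a quick quality check can confirm that the cleaning had the intended effect. The sketch below is an optional extra, not part of the repository; `summarize_quality` is a hypothetical helper and the demo rows are made up.

```python
import pandas as pd


def summarize_quality(df: pd.DataFrame) -> dict:
    """Hypothetical helper: count rows, duplicate rows, and missing cells."""
    return {
        "rows": len(df),
        "duplicate_rows": int(df.duplicated().sum()),
        "missing_values": int(df.isna().sum().sum()),
    }


# Made-up rows; one missing value is left in on purpose.
demo = pd.DataFrame({
    "company": ["acme", "globex"],
    "total_laid_off": [100.0, None],
})
report = summarize_quality(demo)
print(report)
```

To check the real output, the same helper could be called as `summarize_quality(pd.read_csv('cleaned_layoffs_2024.csv'))`. Both counts should be zero if every step worked as described.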

Credits

  • Dataset: Layoffs.fyi
  • Author: Roger Lee
  • Project Developer: Willie Conway ✨

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
