🚀 Layoffs Dataset 2024: Data Cleaning, Preparation, and Visualization 📊

This project cleans and prepares the 2024 layoffs dataset for analysis using Python, SQL, and Power BI. The cleaned data feeds a set of Power BI visualizations that show layoff trends by year, industry, and country.

Project Overview📁

This project prepares and visualizes the 2024 layoffs dataset. After cleaning the raw data (handling missing values, duplicates, and inconsistent formats), I built three Power BI visuals: a line chart of layoffs by year and industry, a map chart of layoffs by country, and a summary card showing the total number of layoffs.

🎯 Objective

The goal of this project is to:

Clean and prepare the 2024 layoffs dataset for analysis.
Create meaningful visualizations to help stakeholders understand trends and patterns in mass layoffs.

🛠 Technologies Used

🐍Python (Pandas, NumPy)
🛢️SQL (SQLite)
📊Power BI (for visualizations)
📅Excel (for initial inspection and analysis)

📊 Dataset Description

The dataset contains information about layoffs from 2020 to 2024, scraped from Layoffs.fyi. The purpose of the dataset is to enable the analysis of recent mass layoffs and discover patterns in the layoffs across industries and countries.

Data Source: Layoffs.fyi
Credits: Roger Lee

🧹 Data Cleaning Process

The data cleaning process involved several key steps:

Removing Duplicates: Duplicate rows were removed to ensure data integrity.
Imputing Missing Values:
- Numeric columns: Missing values were replaced with the column mean, rounded to two decimal places.
- Categorical columns: Missing values were filled with the mode (most frequent value) of each column.
Normalizing Data Formats:
- Text columns were converted to lowercase for consistency.
- The date column was standardized to the 'YYYY-MM-DD' format.
Data Type Standardization: The total_laid_off, percentage_laid_off, and funds_raised columns were converted to numeric types, with unparseable values coerced to NaN.

📊 Power BI Visualizations

After cleaning the data, I used Power BI to create a set of insightful visualizations:

📈 Line Chart: Company Layoffs by Year and Industry
- This line chart visualizes the number of layoffs across the years (2020-2024) and the industries most affected by layoffs.
- It helps to identify trends in layoffs by industry and across different years.
🌍 Map Chart: Layoffs by Country
- A map chart displays the total layoffs by country, enabling a geographic view of mass layoffs.
- This helps identify regions with the highest concentration of layoffs and visualize global trends.
📊 Summary Card: Total Layoffs (1.03M)
- A card visual was used to display the total number of layoffs across all years and industries, which came to 1.03 million.
- This provides a quick summary of the data for stakeholders.
📊 Dashboard: Consolidated View
- The Power BI dashboard combines the line chart 📈, map chart 🌍, and summary card 🗂 into a single interactive interface where stakeholders can explore layoff trends, geographic distribution, and overall totals.
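
The aggregates behind these visuals can also be reproduced in pandas, which is one way to cross-check the figures shown in Power BI. The sketch below is illustrative only: it runs on a few made-up rows, `summarize_layoffs` is a hypothetical helper, and the `industry` and `country` column names are assumptions (only `date` and `total_laid_off` appear in the cleaning script).

```python
import pandas as pd

# Toy rows standing in for the cleaned dataset. `date` and `total_laid_off`
# appear in the cleaning script; `industry` and `country` are assumed names.
sample = pd.DataFrame(
    {
        "date": ["2022-11-09", "2023-01-20", "2023-03-14", "2024-02-01"],
        "industry": ["consumer", "retail", "consumer", "finance"],
        "country": ["united states", "united states", "india", "germany"],
        "total_laid_off": [100, 250, 50, 75],
    }
)


def summarize_layoffs(df: pd.DataFrame) -> dict:
    """Return the three aggregates that the dashboard visuals display."""
    year = pd.to_datetime(df["date"]).dt.year.rename("year")
    return {
        # Line chart: layoffs per year, split by industry.
        "by_year_industry": df.groupby([year, "industry"])["total_laid_off"].sum(),
        # Map chart: total layoffs per country.
        "by_country": df.groupby("country")["total_laid_off"].sum(),
        # Summary card: grand total across all rows.
        "total": int(df["total_laid_off"].sum()),
    }


summary = summarize_layoffs(sample)
print(summary["by_year_industry"])
print(summary["by_country"])
print("Total layoffs:", summary["total"])
```

Running the same function on the cleaned CSV and comparing `total` with the summary card is a quick consistency check between the Python output and the dashboard.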

📹 Demo Video

Layoffs Dataset 2024: Data Cleaning, Preparation, and Visualization - Watch Video

💻 Code Explanation

Below is a summary of the key Python code used for cleaning the data:

```
import pandas as pd
import numpy as np
import sqlite3

# Load the dataset
data = pd.read_csv(r'csv/layoffs_2024.csv')

# Initial data inspection
print("Initial data structure:")
print(data.info())

# Remove duplicate rows
data = data.drop_duplicates()

# Impute missing values
numeric_columns = data.select_dtypes(include=['float64', 'int64']).columns
for column in numeric_columns:
    mean_value = round(data[column].mean(), 2)
    data[column] = data[column].fillna(mean_value)

categorical_columns = data.select_dtypes(include=['object']).columns
for column in categorical_columns:
    mode_value = data[column].mode()[0]
    data[column] = data[column].fillna(mode_value)

# Normalize text data
for column in categorical_columns:
    data[column] = data[column].str.lower()

# Standardize date formats
if 'date' in data.columns:
    data['date'] = pd.to_datetime(data['date']).dt.strftime('%Y-%m-%d')

# Ensure correct data types
data['total_laid_off'] = pd.to_numeric(data['total_laid_off'], errors='coerce')
data['percentage_laid_off'] = pd.to_numeric(data['percentage_laid_off'], errors='coerce')
data['funds_raised'] = pd.to_numeric(data['funds_raised'], errors='coerce')

# Save cleaned data to CSV and SQL
cleaned_file_path = 'cleaned_layoffs_2024.csv'
data.to_csv(cleaned_file_path, index=False)

conn = sqlite3.connect('cleaned_layoffs.db')
data.to_sql('cleaned_layoffs', conn, if_exists='replace', index=False)
conn.close()
```
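
One possible variation on the script above, shown here only as a sketch and not as the project's actual method, is to coerce the numeric columns before imputing. If a numeric column is read as text, coercing it at the end can leave new NaN values that were never imputed; coercing first avoids that. The helper name `clean_layoffs` and the toy rows are hypothetical, and this variant also lowercases text before taking the mode, which differs from the original order.

```python
import pandas as pd

NUMERIC_COLUMNS = ["total_laid_off", "percentage_laid_off", "funds_raised"]


def clean_layoffs(df: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical variant of the cleaning steps: coerce types, then impute."""
    df = df.drop_duplicates().copy()

    # Coerce first, so unparseable entries become NaN before imputation runs.
    for column in NUMERIC_COLUMNS:
        if column in df.columns:
            df[column] = pd.to_numeric(df[column], errors="coerce")

    for column in df.columns:
        if df[column].isna().all():
            continue  # nothing observed, so there is no mean or mode to use
        if pd.api.types.is_numeric_dtype(df[column]):
            df[column] = df[column].fillna(round(df[column].mean(), 2))
        else:
            # Assumes non-numeric columns hold text, as in a freshly read CSV.
            # Lowercase first so "Retail" and "retail" count as one value.
            df[column] = df[column].str.lower()
            df[column] = df[column].fillna(df[column].mode()[0])
    return df


# Made-up rows: one exact duplicate, one missing industry, one bad number.
raw = pd.DataFrame(
    {
        "company": ["Acme", "Acme", "Globex", "Initech"],
        "industry": ["Retail", "Retail", None, "retail"],
        "total_laid_off": ["100", "100", "n/a", "50"],
    }
)
cleaned = clean_layoffs(raw)
print(cleaned)
```

Skipping columns with no observed values also avoids the error that `mode()[0]` raises on an entirely empty column.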

📝 Key Steps

Load Dataset: The dataset is loaded from a CSV file.
Remove Duplicates: Duplicate rows are dropped.
Impute Missing Values: Missing values in numeric and categorical columns are imputed using the mean and mode, respectively.
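Normalize Data: Text data is converted to lowercase, and the date column is standardized to the 'YYYY-MM-DD' format.
Convert Data Types: The total_laid_off, percentage_laid_off, and funds_raised columns are converted to numeric types.
Save the Cleaned Data: The cleaned data is saved as a CSV file and as a table in a SQLite database.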
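
Once the data is in SQLite, it can be queried with plain SQL. The sketch below illustrates the idea on an in-memory database with made-up rows so that it runs on its own; the real script writes the `cleaned_layoffs` table to `cleaned_layoffs.db`. The `country` column name is an assumption, since it does not appear in the cleaning script.

```python
import sqlite3

import pandas as pd

# Toy rows in an in-memory database. The `country` column is an assumed name;
# `total_laid_off` and the table name come from the cleaning script.
sample = pd.DataFrame(
    {
        "country": ["united states", "india", "united states"],
        "total_laid_off": [100.0, 50.0, 250.0],
    }
)

conn = sqlite3.connect(":memory:")
try:
    sample.to_sql("cleaned_layoffs", conn, if_exists="replace", index=False)
    # Total layoffs per country, largest first.
    by_country = pd.read_sql_query(
        """
        SELECT country, SUM(total_laid_off) AS layoffs
        FROM cleaned_layoffs
        GROUP BY country
        ORDER BY layoffs DESC
        """,
        conn,
    )
finally:
    conn.close()

print(by_country)
```

To run the same query against the project's output, replace `":memory:"` with the path to `cleaned_layoffs.db` and drop the `to_sql` call.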

📈 Results

Data Cleaned: The dataset was cleaned of duplicate entries, and missing values were imputed.
Visualizations Created: After cleaning the data, the following Power BI visualizations were created:
- Line Chart: Showed layoffs by year and industry from 2020 to 2024.
- Map Chart: Visualized the geographic distribution of layoffs by country.
- Summary Card: Displayed the total number of layoffs (1.03M).

🏃‍♀️ How to Run the Project

Clone or download the repository.
Make sure you have the required Python libraries installed (sqlite3 ships with Python's standard library and is not installed through pip):
```
pip install pandas numpy
```
Place the layoffs_2024.csv file in the csv/ directory.
Run the Python script to clean the dataset:
```
python Data_Cleaning_and_Preparation.py
```
Import the cleaned dataset (cleaned_layoffs_2024.csv) into Power BI for visualization.
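
Before importing into Power BI, it can help to confirm that the cleaned file looks as expected. The following is a minimal sketch of such a check, assuming the output has a `date` column as in the cleaning script; `check_cleaned` is a hypothetical helper, and the inline CSV text stands in for `cleaned_layoffs_2024.csv` so the example runs on its own.

```python
import io

import pandas as pd


def check_cleaned(df: pd.DataFrame) -> list[str]:
    """Return a list of problems found; an empty list means the checks passed."""
    problems = []
    if df.duplicated().any():
        problems.append("duplicate rows remain")
    if df.isna().any().any():
        problems.append("missing values remain")
    if "date" in df.columns:
        # Values that do not parse with this format become NaT.
        parsed = pd.to_datetime(df["date"], format="%Y-%m-%d", errors="coerce")
        if parsed.isna().any():
            problems.append("dates not in YYYY-MM-DD format")
    return problems


# Stand-in for pd.read_csv("cleaned_layoffs_2024.csv").
csv_text = (
    "company,date,total_laid_off\n"
    "acme,2023-01-20,100.0\n"
    "globex,2024-02-01,75.0\n"
)
cleaned = pd.read_csv(io.StringIO(csv_text))
print(check_cleaned(cleaned) or "all checks passed")
```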

👏 Credits

Dataset: Layoffs.fyi
Author: Roger Lee
Project Developer: Willie Conway ✨

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.
