This repository contains a Jupyter Notebook project focused on data wrangling, exploratory data analysis (EDA), and insights generation from shark attack incidents around the world. The notebook titled 2025-04-21-Data-Wrangling-Sharks.ipynb presents the workflow and findings in a structured, reproducible manner.
The objective of this project is to clean, standardize, and explore a dataset detailing global shark attacks. The analysis focuses on extracting meaningful patterns, addressing missing data, and visualizing trends over time, location, and victim attributes.
This project is part of Unit 2: Data Wrangling & Retrieval, aimed at building foundational skills in working with messy real-world data using Python and pandas.
- 2025-04-21-Data-Wrangling-Sharks.ipynb: The main notebook containing the data wrangling pipeline and EDA.
- README.md: Project documentation and summary.
- GSAF5.xls: The original dataset, if included separately.
- Python 3.x
- Jupyter Notebook
- pandas
- numpy
- matplotlib
- seaborn
- Data Cleaning: Handling of null values, inconsistent formatting, and irrelevant columns.
- Feature Engineering: Creation of clean and consistent columns such as Year, Country, Activity, Fatal (Y/N), and Gender.
- Exploratory Data Analysis (EDA): Visualizations highlighting patterns in shark attacks by year, country, activity type, and fatality.
- Insights:
  - Temporal trends in shark attacks
  - High-risk locations and activities
  - Demographic breakdowns (age, gender, etc.)
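
The cleaning and feature-engineering steps listed above can be illustrated with a minimal sketch. It is not the notebook's actual pipeline: the sample rows, the `Unnamed: 22` column, the `clean_attacks` helper, and the specific normalization rules (upper-case countries, treating anything other than Y/N or M/F as missing) are assumptions made for illustration. Only the column names Year, Country, Activity, Fatal (Y/N), and Gender come from this README.

```python
import numpy as np
import pandas as pd

# Hypothetical raw rows imitating the kind of mess described above
# (stray whitespace, mixed case, placeholder values, an empty column).
raw = pd.DataFrame({
    "Year": ["2018", "2019.0", "0", None],
    "Country": [" usa", "AUSTRALIA ", "South Africa", None],
    "Activity": ["Surfing", "surfing ", "Swimming", None],
    "Fatal (Y/N)": ["N", " y", "UNKNOWN", "N "],
    "Gender": ["M", "f", "M ", "."],
    "Unnamed: 22": [None, None, None, None],
})


def clean_attacks(df: pd.DataFrame) -> pd.DataFrame:
    """Return a cleaned copy of the raw frame (illustrative helper)."""
    out = df.copy()

    # Irrelevant columns: drop any column that is entirely empty.
    out = out.dropna(axis=1, how="all")

    # Year: coerce to numbers, treat non-positive years as missing,
    # and store as a nullable integer.
    year = pd.to_numeric(out["Year"], errors="coerce")
    year = year.where(year > 0)
    out["Year"] = year.astype("Int64")

    # Country and Activity: trim whitespace and normalize case.
    out["Country"] = out["Country"].str.strip().str.upper()
    out["Activity"] = out["Activity"].str.strip().str.capitalize()

    # Fatal (Y/N) and Gender: keep only the expected codes; everything
    # else becomes missing.
    fatal = out["Fatal (Y/N)"].str.strip().str.upper()
    out["Fatal (Y/N)"] = fatal.where(fatal.isin(["Y", "N"]))
    gender = out["Gender"].str.strip().str.upper()
    out["Gender"] = gender.where(gender.isin(["M", "F"]))

    return out


cleaned = clean_attacks(raw)
print(cleaned)
```

Coercing unexpected values to missing, rather than guessing at their meaning, keeps later counts honest: a row with an unknown fatality status is excluded from fatal versus non-fatal comparisons instead of being silently assigned to one side.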
Visualizations include:
- Bar charts of shark attacks by country
- Time series of attacks over decades
- Pie charts showing fatal vs non-fatal incidents
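
As a rough sketch of how these three chart types could be produced from a cleaned table, the example below uses pandas' built-in matplotlib plotting. The `attacks` frame holds a handful of made-up rows, not real GSAF data, and the chart titles and layout are assumptions. The notebook itself may use seaborn or different styling.

```python
import matplotlib

matplotlib.use("Agg")  # headless backend for scripts; omit inside Jupyter
import matplotlib.pyplot as plt
import pandas as pd

# Made-up sample rows standing in for the cleaned dataset.
attacks = pd.DataFrame({
    "Year": [1995, 2003, 2008, 2011, 2015, 2019],
    "Country": ["USA", "USA", "AUSTRALIA", "USA", "SOUTH AFRICA", "AUSTRALIA"],
    "Fatal (Y/N)": ["N", "N", "Y", "N", "Y", "N"],
})

# Aggregate once, then plot each summary.
by_country = attacks["Country"].value_counts()
by_decade = (attacks["Year"] // 10 * 10).value_counts().sort_index()
by_fatal = attacks["Fatal (Y/N)"].value_counts()

fig, axes = plt.subplots(1, 3, figsize=(15, 4))

# Bar chart of attacks by country.
by_country.plot.bar(ax=axes[0], title="Attacks by country")

# Time series of attacks per decade.
by_decade.plot.line(ax=axes[1], marker="o", title="Attacks per decade")

# Pie chart of fatal vs non-fatal incidents.
by_fatal.plot.pie(ax=axes[2], autopct="%1.0f%%", title="Fatal vs non-fatal")
axes[2].set_ylabel("")  # remove the default y-axis label on the pie

fig.tight_layout()
```

Grouping years into decades with integer division (`Year // 10 * 10`) smooths out year-to-year noise, which is usually what a long-run trend chart needs.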
- Clone the repository:

      git clone https://github.com/yourusername/shark-attacks-analysis.git
      cd shark-attacks-analysis
Presentation slides: https://docs.google.com/presentation/d/1SlHyIXtr7roXPl2liGAZ02YSIVLcMRyRWXQoHOD3puo/edit?usp=sharing