This project analyzes 98 years of Academy Awards (Oscars) data to uncover trends, biases, and patterns in the film industry. It includes:
- Data Cleaning: Handling missing values, duplicates, and inconsistencies.
- Exploratory Data Analysis (EDA): Visualizations to explore trends in categories, years, and films.
- Machine Learning: Predicting award winners based on nominations, categories, and release years.
Perfect for data analysis portfolios or showcasing skills in Python, Pandas, and Scikit-learn.

    oscar-analysis/
    ├── data/
    │   ├── the_oscar_award.csv   # Raw dataset from Kaggle
    │   └── oscar_cleaned.csv     # Cleaned dataset (auto-generated)
    ├── notebooks/
    │   └── oscar_analysis.ipynb  # Jupyter Notebook with full analysis
    └── README.md                 # This file

- Source: Kaggle - The Oscar Award (1927–2025)
- Key Columns:
  - `year_film`: Year the film was released.
  - `year_ceremony`: Year the ceremony was held.
  - `category`: Award category (e.g., Best Picture, Best Actor).
  - `film`: Title of the film.
  - `winner`: Whether the nominee won (1 = Yes, 0 = No).
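
The cleaning step can be sketched roughly as follows, using only the columns listed above. This is a minimal illustration, not the notebook's actual code: the function name `clean_oscars`, the `"Unknown"` placeholder for missing film titles, and the choice to drop rows with no `winner` value are all assumptions made for the example.

```python
import pandas as pd


def clean_oscars(raw: pd.DataFrame) -> pd.DataFrame:
    """Return a tidied copy of the raw nominations table (illustrative only)."""
    df = raw.copy()
    # Strip stray whitespace so "BEST PICTURE " and "BEST PICTURE" compare equal.
    df["category"] = df["category"].astype(str).str.strip()
    # Assumption: keep rows without a film title and mark them, rather than dropping them.
    df["film"] = df["film"].fillna("Unknown").astype(str).str.strip()
    # Assumption: a row with no winner flag cannot be used, so it is removed.
    df = df.dropna(subset=["winner"])
    # Store the flag in the 1/0 encoding described above (True/False also converts).
    df["winner"] = df["winner"].astype(int)
    # Remove exact duplicates only after normalizing, so near-identical rows collapse.
    return df.drop_duplicates().reset_index(drop=True)
```

With the files from the project layout, this might be called as `clean_oscars(pd.read_csv("data/the_oscar_award.csv")).to_csv("data/oscar_cleaned.csv", index=False)`.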
| Tool | Purpose |
|---|---|
| Python 3.10+ | Programming language |
| Pandas | Data cleaning and manipulation |
| NumPy | Numerical operations |
| Matplotlib | Static visualizations |
| Seaborn | Statistical visualizations |
| Scikit-learn | Machine learning models |
| Jupyter | Interactive notebook environment |
- Install Anaconda.
- Open Anaconda Prompt and run:
  - `conda create -n oscar_env python=3.10 -y`
  - `conda activate oscar_env`
  - `conda install numpy=1.23.5 pandas=1.5.3 matplotlib seaborn scikit-learn jupyter -y`
- Alternatively, without Anaconda, install Python 3.10+ from python.org (3.10 or 3.11 is the safest choice, since the pinned NumPy 1.23.5 and pandas 1.5.3 releases predate Python 3.12).
- Open a terminal and run:
  - `pip install numpy==1.23.5 pandas==1.5.3 matplotlib seaborn scikit-learn jupyter`
- Download `the_oscar_award.csv` from Kaggle.
- Save it in the `data/` folder of your project.
- Navigate to your project directory:
  - `cd path/to/oscar-analysis`
- Launch Jupyter Notebook:
  - `jupyter notebook`
- Open `notebooks/oscar_analysis.ipynb` and run the cells in order.
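
Before running the notebook, it can help to confirm that the active environment matches the pinned versions from the install commands. The sketch below relies only on the standard-library `importlib.metadata` module. The helper name `check_environment` and the `EXPECTED` mapping are hypothetical and not part of this project.

```python
from importlib.metadata import PackageNotFoundError, version

# Pins copied from the install commands above; None means "any version is fine".
EXPECTED = {
    "numpy": "1.23.5",
    "pandas": "1.5.3",
    "matplotlib": None,
    "seaborn": None,
    "scikit-learn": None,
}


def check_environment(expected=EXPECTED):
    """Return a list of problems; an empty list means the environment matches."""
    problems = []
    for name, wanted in expected.items():
        try:
            found = version(name)
        except PackageNotFoundError:
            problems.append(f"{name} is not installed")
            continue
        if wanted is not None and found != wanted:
            problems.append(f"{name} {found} found, {wanted} expected")
    return problems
```

Calling `check_environment()` inside the activated `oscar_env` should return an empty list if the install succeeded. Any entries it returns point at the package to reinstall.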
Here’s what the analysis reveals:
- Most Awarded Categories:
  - Best Picture, Best Actor, and Best Actress dominate the awards.
- Trends Over Time:
  - The number of awards per year has grown, reflecting the expansion of the film industry.
- Winner Distribution:
  - Only a small percentage of nominees win, highlighting the competitiveness of the Oscars.
- Predictive Model:
  - A Random Forest Classifier predicts winners with ~68% accuracy (replace with your actual result) based on nominations, category, and release year.
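
Because only a small share of nominees win, an accuracy figure is easier to interpret next to a majority-class baseline. The sketch below trains a Random Forest and a "always predict the most frequent class" baseline on the same split. The feature set (release year, per-film nomination count, one-hot category) is an assumption based on the description above, not the notebook's actual pipeline, and the names `build_features` and `compare_to_baseline` are hypothetical. It expects a cleaned frame with no missing values.

```python
import pandas as pd
from sklearn.dummy import DummyClassifier
from sklearn.ensemble import RandomForestClassifier
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split


def build_features(df: pd.DataFrame) -> pd.DataFrame:
    """Assumed features: release year, nominations per film, one-hot category."""
    feats = pd.DataFrame({"year_film": df["year_film"]})
    # Count how many nominations the same film received in the same release year.
    feats["film_nominations"] = df.groupby(["year_film", "film"])["film"].transform("size")
    return feats.join(pd.get_dummies(df["category"], prefix="cat"))


def compare_to_baseline(df: pd.DataFrame, seed: int = 0) -> dict:
    """Score a Random Forest and a majority-class baseline on one stratified split."""
    X, y = build_features(df), df["winner"]
    X_train, X_test, y_train, y_test = train_test_split(
        X, y, test_size=0.25, stratify=y, random_state=seed
    )
    baseline = DummyClassifier(strategy="most_frequent").fit(X_train, y_train)
    forest = RandomForestClassifier(n_estimators=200, random_state=seed).fit(X_train, y_train)
    return {
        "win_rate": float(y.mean()),
        "baseline_accuracy": accuracy_score(y_test, baseline.predict(X_test)),
        "forest_accuracy": accuracy_score(y_test, forest.predict(X_test)),
    }
```

If `forest_accuracy` does not clearly exceed `baseline_accuracy`, accuracy alone is not a useful headline number, and metrics such as precision and recall on the winner class are likely to be more informative.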
| File/Folder | Description |
|---|---|
| `data/` | Contains raw and cleaned datasets. |
| `notebooks/` | Jupyter Notebook with all analysis code. |
| `README.md` | Project documentation (this file). |
| Issue | Solution |
|---|---|
| `FileNotFoundError` | Ensure `the_oscar_award.csv` is in the `data/` folder. |
| Binary Incompatibility | Use compatible versions: `numpy=1.23.5`, `pandas=1.5.3`. |
| Missing Columns | Update the notebook to use the correct column names (e.g., `year_film` instead of `year`). |
| Jupyter Notebook Won’t Open | Ensure Jupyter is installed (`pip install jupyter` or `conda install jupyter`). |
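
The first and third rows of the table can be checked programmatically before opening the notebook. This is a small sketch under stated assumptions: the helper names `missing_columns` and `diagnose` are hypothetical, and the expected column list is taken from the Key Columns described in this README rather than from the notebook itself.

```python
from pathlib import Path

import pandas as pd

# Columns this README describes; the real file may contain additional ones.
EXPECTED_COLUMNS = ["year_film", "year_ceremony", "category", "film", "winner"]


def missing_columns(columns, expected=EXPECTED_COLUMNS):
    """Return the expected column names that are absent from `columns`."""
    present = set(columns)
    return [name for name in expected if name not in present]


def diagnose(csv_path="data/the_oscar_award.csv"):
    """Return a short message describing the first problem found, or "ok"."""
    path = Path(csv_path)
    if not path.is_file():
        return f"{path} not found: place the_oscar_award.csv in the data/ folder"
    # nrows=0 reads only the header row, so this stays fast on large files.
    header = pd.read_csv(path, nrows=0).columns
    missing = missing_columns(header)
    if missing:
        return f"columns missing from {path.name}: {missing}"
    return "ok"
```

Running `diagnose()` from the project root should return `"ok"` when the dataset is in place and uses the column names the notebook expects.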
This project is open-source under the MIT License.
- Data provided by Kaggle.
- Inspired by the global community of data analysts and film enthusiasts.