A Jupyter Notebook that trains an XGBoost regression model to estimate median house values from the California Housing dataset.
- Load the dataset from scikit-learn
- Explore feature distributions and correlations
- Split the data into training and test sets
- Train an XGBRegressor
- Evaluate predictions with R-squared and mean absolute error
- Compare predicted and actual values visually
The California Housing dataset contains eight numerical features:
- MedInc
- HouseAge
- AveRooms
- AveBedrms
- Population
- AveOccup
- Latitude
- Longitude
The target is the median house value for each California district.
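As a rough sketch of how the features and target might be loaded and explored, the snippet below fetches the dataset as a pandas DataFrame and ranks features by their correlation with the target. The helper names `load_housing_frame` and `target_correlations` are hypothetical and not taken from the notebook. `MedHouseVal` is the target column name scikit-learn uses when the data is loaded with `as_frame=True`.

```python
import pandas as pd
from sklearn.datasets import fetch_california_housing


def load_housing_frame():
    """Fetch the dataset as a DataFrame (downloaded and cached on first call)."""
    bunch = fetch_california_housing(as_frame=True)
    # bunch.frame holds the eight feature columns plus the "MedHouseVal" target.
    return bunch.frame


def target_correlations(df, target="MedHouseVal"):
    """Pearson correlation of each feature with the target, strongest first."""
    corr = df.corr()[target].drop(target)
    # Order by absolute value so strong negative correlations rank high too.
    order = corr.abs().sort_values(ascending=False).index
    return corr.reindex(order)
```

Calling `target_correlations(load_housing_frame())` returns a Series indexed by feature name, which can be printed directly or passed to a Seaborn bar plot.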
pip install pandas matplotlib seaborn scikit-learn xgboost jupyter
jupyter notebook PROJECT3_HOUSEPRIZE.ipynb

Run the notebook cells in order. The dataset is downloaded through scikit-learn.
- Python
- pandas
- scikit-learn
- XGBoost
- Matplotlib
- Seaborn