- The Boston housing market is highly competitive, and the goal is to be one of the best real estate agents in the area. To compete in the real estate market, a few basic machine learning concepts are applied in order to assist a client with finding the best selling price for their home. The Boston Housing dataset which contains aggregated data on various features for houses in Greater Boston communities, including the median value of homes for each of those areas. The objective is to build an optimal model based on a statistical analysis with the tools available and then utilize this model to estimate the best selling price for the client's home. Additional information on the Boston Housing dataset can be found
here. The source code developed for this project can be found in the ipython notebook:boston_housing.ipynb
- Python 2.7
- NumPy
- scikit-learn
- iPython Notebook
- The following code performs several statistical calculations on the dataset.
def explore_city_data(city_data):
"""Calculate the Boston housing statistics."""
number_houses = housing_features.shape[0] # size of data
number_features = housing_features.shape[1] # number of features
min_price = np.min(housing_prices) # minimum price
max_price = np.max(housing_prices) # maximum price
mean_price = np.mean(housing_prices) # mean price
median_price = np.median(housing_prices) # median price
std_price = np.std(housing_prices) # standard deviation| Statistics | Value |
|---|---|
| Number of Houses | 506 |
| Number of Features | 13 |
| Min Price | $5,000 |
| Max Price | $50,000 |
| Mean Price | $22,533 |
| Median Price | $21,200 |
| Standard Deviaton | $9,188 |
- Once the model has been trained and tested using the methods discussed above, we can now predict the housing price on the out-of-sample data provided by the client as shown in the table below. Statistical analysis performed on the dataset indicates the median home value is
$21,200and the standard deviation is$9,190. Using the out-of-sample data, the model is predicting the average price of the home to be$21,629. Being the predicted value falls within one standard deviation of the mean, therefore the predication is considered to be reasonable. The source code can be found inboston_housing.ipynbalong with further analysis.
| Features | Sample |
|---|---|
| CRIM | 11.95 |
| ZN | 0.00 |
| INDUS | 18.10 |
| CHAS | 0.00 |
| NOX | 0.6590 |
| RM | 5.6090 |
| AGE | 90.00 |
| DIS | 1.385 |
| RAD | 24.0 |
| TAX | 680.0 |
| PTRATIO | 20.20 |
| B | 332.09 |
| LSTAT | 12.13 |
- CRIM per capita crime rate by town - ZN proportion of residential land zoned for lots over 25,000 sq.ft. - INDUS proportion of non-retail business acres per town - CHAS Charles River dummy variable (=1 if tract bounds river;0 otherwise) - NOX nitric oxides concentration (parts per 10 million) - RM average number of rooms per dwelling - AGE proportion of owner-occupied units built prior to 1940 - DIS weighted distances to five Boston employment centres - RAD index of accessibility to radial highways - TAX full-value property-tax rate per $10,000 - PTRATIO pupil-teacher ratio by town - B 1000(Bk - 0.63)^2 where Bk is the proportion of blacks by town - LSTAT % lower status of the population - MEDV Median value of owner-occupied homes in $1000's