
Implementation (Raw)

Uploaded by

Saurabh Ghute

CHAPTER 6

IMPLEMENTED WORK

6.1 Purpose
The purpose of the Predictive Model for Retail Sales project is to develop an intelligent
system that can forecast future sales for retail products using machine learning techniques.
By analyzing historical sales data, the model enables retailers to make data-driven decisions
regarding inventory management, marketing strategies, and resource allocation. This project
aims to enhance business efficiency by providing accurate predictions of product demand,
helping retailers optimize their operations and improve profitability. The deployed system
offers an easy-to-use interface for users to input product details and receive sales predictions.

Plan of Implementation

Implementation is the stage in the project where the theoretical design is turned into a working
system. The implementation phase constructs, installs and operates the new system. The most
crucial requirement for a successful new system is that it works efficiently and
effectively.

There are several activities involved in implementing a new project. They are as follows:

• Researching the existing structure of the project.

• Studying programming skills.

• Coding.

• Implementation of the proposed code.

• Testing and debugging.

• Finalizing the project report.

The Predictive Model for Retail Sales project is structured into several key phases, each of
which contributes to the development of a machine learning-based predictive model. The
project begins with data preprocessing, where raw sales data is cleaned and prepared for
analysis. After preprocessing, the data is fed into a Random Forest model to train and generate
predictions.

The steps involved are as follows:

Data Preprocessing: Handling missing values, extracting features, and normalizing the data.

Model Training: Using a Random Forest model to learn from historical sales data.

Model Evaluation: Metrics such as Mean Squared Error (MSE) and R-Squared are used to
evaluate the model's performance.

Deployment: A web application built with Streamlit allows users to input new data and get
sales predictions.

The final deliverable is an accessible tool for retail managers and business analysts to predict
future sales, helping them make strategic decisions.

6.2 Dataset Description

6.2.1 Source of Data

The dataset used in this project is derived from retail sales data, which contains multiple
features that impact sales. The dataset includes the following key columns:

Date: The date on which the sales were recorded.
Store: The store or outlet identifier.
Item: The product identifier.
Sales: The sales figure for that product on that date.

The dataset comprises tens of thousands of records covering several stores and items across
various dates. This allows the model to generalize across multiple products and outlets.
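
A quick way to check the size and coverage described above is to count rows, distinct stores, and distinct items. The sketch below is illustrative only: the small DataFrame is a hypothetical stand-in for train.csv, and the lowercase column names are assumed to match those used in the code later in this chapter.

```python
import pandas as pd

# Toy stand-in for train.csv (hypothetical values, not the project's data).
data = pd.DataFrame({
    "date": ["01/01/2013", "02/01/2013", "01/01/2013", "02/01/2013"],
    "store": [1, 1, 2, 2],
    "item": [1, 1, 1, 2],
    "sales": [13, 11, 12, 10],
})

# Summarize how many records, stores, and items the dataset covers,
# and how many values are missing in each column.
summary = {
    "rows": len(data),
    "stores": data["store"].nunique(),
    "items": data["item"].nunique(),
    "missing_per_column": data.isna().sum().to_dict(),
}
print(summary)
```

On the real file, the same summary would be computed after data = pd.read_csv('train.csv').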
6.2.2 Dataset Sample
Fig: Screenshot sample of the data from the train.csv file.

6.3 Data Preprocessing
6.3.1 Loading Data
The data is loaded using the pandas library, which allows for easy manipulation and analysis.
The following code snippet shows how the data is loaded from the CSV file:

import pandas as pd
# Load the dataset
data = pd.read_csv('train.csv')
# Display the first few rows of the data
data.head()
6.3.2 Handling Missing Values
In this project, the date column is first parsed into datetime format. Entries that are missing
or cannot be parsed are converted to NaT (errors='coerce'), and the rows containing them are
then removed:

# Converting the 'date' column to datetime format and handling missing values
data['date'] = pd.to_datetime(data['date'], format='%d/%m/%Y', errors='coerce')

# Dropping rows with missing date values
data.dropna(subset=['date'], inplace=True)
By removing missing values, we ensure that the model works with clean data, preventing
errors during training.
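
To make the effect of errors='coerce' concrete, the following minimal sketch applies the same two steps to a small hypothetical DataFrame (the values are invented for illustration) and reports how many rows are dropped. Counting dropped rows in this way can help confirm that the date format string matches the data, since a wrong format would silently turn every date into NaT.

```python
import pandas as pd

# Hypothetical sample: one valid date, one malformed string,
# one missing value, and one impossible calendar date.
raw = pd.DataFrame({
    "date": ["15/03/2017", "not a date", None, "31/02/2017"],
    "sales": [10, 12, 9, 14],
})

# Unparseable or missing entries become NaT instead of raising an error.
raw["date"] = pd.to_datetime(raw["date"], format="%d/%m/%Y", errors="coerce")

rows_before = len(raw)
clean = raw.dropna(subset=["date"])
rows_dropped = rows_before - len(clean)
print(f"Dropped {rows_dropped} of {rows_before} rows")
```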
6.3.3 Feature Engineering
One of the key preprocessing steps is converting the date column into meaningful features,
such as the day, month, and year of the sale:

# Extracting features from the date column
data['year'] = data['date'].dt.year
data['month'] = data['date'].dt.month
data['day'] = data['date'].dt.day
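These additional features allow the model to capture temporal patterns in sales, such as
year-over-year trends, seasonal (monthly) patterns, and day-of-month effects. Day-of-week
effects are not captured directly by these three columns; they would require a separate
weekday feature derived from the date.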
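
As an illustration of how a weekday feature could be derived, the sketch below uses the pandas .dt.dayofweek accessor on a few hypothetical dates. The column names day_of_week and is_weekend are assumptions introduced here; they are not part of the model trained in this project.

```python
import pandas as pd

# Hypothetical dates: a Monday, a Saturday, and a Sunday.
data = pd.DataFrame({
    "date": pd.to_datetime(["2017-03-13", "2017-03-18", "2017-03-19"]),
})

# dayofweek encodes Monday as 0 and Sunday as 6.
data["day_of_week"] = data["date"].dt.dayofweek

# Flag Saturday (5) and Sunday (6) as weekend days.
data["is_weekend"] = data["day_of_week"] >= 5
print(data)
```

If such columns were added, they would also have to be included in the feature list used for training and in the inputs collected by the deployed app.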
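1. Correlation Heatmap
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 8))
# Use numeric columns only, so the datetime 'date' column is excluded
correlation_matrix = data.corr(numeric_only=True)
sns.heatmap(correlation_matrix, annot=True, cmap="coolwarm")
plt.title("Correlation Heatmap")
plt.show()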

2. Sales Distribution
plt.figure(figsize=(8, 6))
sns.histplot(data['sales'], bins=20, kde=True)
plt.title("Distribution of Sales")
plt.xlabel("Sales")
plt.ylabel("Frequency")
plt.show()

6.4 Model Selection
In this project, various machine learning algorithms were considered for building a robust
sales prediction model. After evaluating multiple models, the Random Forest Regressor
was selected for its ability to capture both linear and non-linear relationships in the data.
Random Forests combine the outputs of many decision trees, which typically gives more
accurate and more stable predictions than a single tree.
6.4.1 Random Forest Overview
Random Forest is an ensemble learning method that operates by constructing multiple
decision trees during training. For regression, it outputs the average of the predictions from
the individual trees, which reduces the risk of overfitting and improves generalization to
unseen data.
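
The averaging behaviour described above can be checked directly. The sketch below fits a small forest on synthetic data (generated only for this illustration) and compares the forest's prediction with the mean of the predictions of its individual trees, which scikit-learn exposes through the estimators_ attribute.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Synthetic regression data, used only to demonstrate the averaging step.
rng = np.random.default_rng(0)
X = rng.uniform(0, 10, size=(200, 2))
y = 3 * X[:, 0] + np.sin(X[:, 1]) + rng.normal(0, 0.1, size=200)

forest = RandomForestRegressor(n_estimators=10, random_state=0).fit(X, y)

# Prediction of the whole forest for the first five samples.
forest_pred = forest.predict(X[:5])

# Predictions of each individual tree, averaged across trees.
tree_preds = np.array([tree.predict(X[:5]) for tree in forest.estimators_])
mean_of_trees = tree_preds.mean(axis=0)

print(np.allclose(forest_pred, mean_of_trees))
```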
6.4.2 Model Implementation
The RandomForestRegressor from the sklearn.ensemble module was used for this project.
Here's the code to set up and train the model:

from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import train_test_split
# Define features (X) and target (y)
X = data[['store', 'item', 'year', 'month', 'day']]
y = data['sales']
# Split the dataset into training and testing sets
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=42)
# Initialize the Random Forest Regressor
rf_model = RandomForestRegressor(n_estimators=100, random_state=42)
# Train the model
rf_model.fit(X_train, y_train)
The features chosen for the model include the store and item identifiers, along with the
extracted year, month, and day from the date column.
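
One way to see how much each of these features contributes is the feature_importances_ attribute of a fitted RandomForestRegressor. The sketch below is a hedged illustration on synthetic data: the target is deliberately constructed to depend mainly on item and month, so the resulting ranking says nothing about the project's real dataset.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

# Synthetic features with the same column names as the project's model.
rng = np.random.default_rng(42)
n = 500
X = pd.DataFrame({
    "store": rng.integers(1, 4, n),
    "item": rng.integers(1, 6, n),
    "year": rng.integers(2013, 2018, n),
    "month": rng.integers(1, 13, n),
    "day": rng.integers(1, 29, n),
})

# Assumed target for illustration: driven mostly by item, partly by month.
y = 10 * X["item"] + 2 * X["month"] + rng.normal(0, 1, n)

model = RandomForestRegressor(n_estimators=50, random_state=42).fit(X, y)

# Importances are normalized to sum to 1; higher means more influential.
importances = pd.Series(model.feature_importances_, index=X.columns)
importances = importances.sort_values(ascending=False)
print(importances)
```

Applied to the trained rf_model, the same two lines would use X_train.columns as the index.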
6.4.3 Model Saving
After training, the model is saved using the pickle module so that it can be loaded for future
predictions without retraining:

import pickle
# Save the trained model to a file
with open('rf_model.pkl', 'wb') as model_file:
    pickle.dump(rf_model, model_file)
Saving the model is essential for deploying it in a production environment.
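
Before relying on a saved file, it can be worth confirming that the reloaded model reproduces the original predictions. The following sketch performs such a round trip with a small model trained on synthetic data and a temporary directory; both are assumptions made for the example rather than part of the project code.

```python
import os
import pickle
import tempfile

import numpy as np
from sklearn.ensemble import RandomForestRegressor

# Small synthetic model, used only to demonstrate the save/load round trip.
rng = np.random.default_rng(0)
X = rng.uniform(0, 1, size=(50, 3))
y = X.sum(axis=1)
model = RandomForestRegressor(n_estimators=5, random_state=0).fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "rf_model.pkl")
    # Save the model, then load it back from disk.
    with open(path, "wb") as model_file:
        pickle.dump(model, model_file)
    with open(path, "rb") as model_file:
        restored = pickle.load(model_file)

# The restored model should give exactly the same predictions.
same_predictions = np.array_equal(model.predict(X), restored.predict(X))
print(same_predictions)
```

Pickle files should only be loaded from trusted sources, and a pickled scikit-learn model is best loaded with the same library version that was used to save it.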
6.5 Evaluation Metrics
After training the model, several metrics were used to evaluate its performance. These metrics
provide insight into how well the model predicts sales on unseen data.
6.5.1 Evaluation Metrics Overview
The following metrics were chosen to evaluate the model:
• Mean Squared Error (MSE): Measures the average squared difference between the
predicted and actual values.
• R-Squared (R²): Represents the proportion of variance in the target variable that can
be explained by the features.

• Mean Absolute Error (MAE): The average of absolute differences between the
predicted and actual values.
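
The three definitions above can be worked through by hand on a tiny example. The sketch below computes each metric with NumPy from its definition; the three actual and predicted values are invented for illustration.

```python
import numpy as np

# Hypothetical actual and predicted sales for three observations.
y_true = np.array([3.0, 5.0, 7.0])
y_pred = np.array([2.0, 5.0, 9.0])

errors = y_true - y_pred  # [1, 0, -2]

# MSE: mean of squared errors = (1 + 0 + 4) / 3
mse = np.mean(errors ** 2)

# MAE: mean of absolute errors = (1 + 0 + 2) / 3
mae = np.mean(np.abs(errors))

# R-squared: 1 - (residual sum of squares / total sum of squares)
ss_res = np.sum(errors ** 2)                    # 5
ss_tot = np.sum((y_true - y_true.mean()) ** 2)  # 8
r2 = 1 - ss_res / ss_tot

print(mse, mae, r2)
```

Because MSE squares each error, the single error of 2 contributes four times as much as the error of 1, whereas MAE weights them in proportion to their size.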
6.5.2 Model Evaluation Code
The following code was used to calculate these metrics on the test data:
from sklearn.metrics import mean_squared_error, r2_score, mean_absolute_error

# Make predictions on the test set
y_pred = rf_model.predict(X_test)

# Calculate evaluation metrics
mse = mean_squared_error(y_test, y_pred)
r2 = r2_score(y_test, y_pred)
mae = mean_absolute_error(y_test, y_pred)

# Print the results
print(f"Mean Squared Error: {mse}")
print(f"R-Squared: {r2}")
print(f"Mean Absolute Error: {mae}")

6.5.3 Results
After evaluating the model on the test set, the following results were obtained:
• Mean Squared Error: [Insert MSE value here]
• R-Squared: [Insert R² value here]
• Mean Absolute Error: [Insert MAE value here]
These metrics indicate how well the model generalizes to new, unseen data.
6.5.4 Additional Visualizations
3. Actual vs. Predicted Sales
plt.figure(figsize=(8, 6))
plt.scatter(y_test, y_pred)
plt.xlabel("Actual Sales")
plt.ylabel("Predicted Sales")
plt.title("Actual vs. Predicted Sales")
plt.show()

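4. Residual Plot
# Residuals are the differences between actual and predicted sales
residuals = y_test - y_pred
plt.figure(figsize=(8, 6))
plt.scatter(y_test, residuals)
plt.axhline(0, color="red", linestyle="--")  # zero-error reference line
plt.xlabel("Actual Sales")
plt.ylabel("Residuals")
plt.title("Residual Plot")
plt.show()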

6.6 Deployment
The project was deployed as a web application using Streamlit and Flask to create an
interactive user interface. The app allows users to input product and date details to predict
future sales.
6.6.1 Streamlit App Overview
The application is built using the sales_app.py script, which loads the saved model and
provides an interface for predicting sales.
Here’s the simplified code for loading the model and making predictions:
import pickle
from datetime import datetime

import pandas as pd
import streamlit as st

# Load the trained model
with open('rf_model.pkl', 'rb') as model_file:
    rf_model = pickle.load(model_file)

# User input for making predictions (integer inputs, matching the training features)
store = st.number_input("Enter Store ID:", min_value=0, step=1)
item = st.number_input("Enter Item ID:", min_value=0, step=1)
year = st.number_input("Enter Year:", value=datetime.now().year, step=1)
month = st.number_input("Enter Month:", min_value=1, max_value=12, step=1)
day = st.number_input("Enter Day:", min_value=1, max_value=31, step=1)

# Predict button
if st.button("Predict Sales"):
    # Use the same column names and order as in training
    features = pd.DataFrame([[store, item, year, month, day]],
                            columns=['store', 'item', 'year', 'month', 'day'])
    prediction = rf_model.predict(features)
    st.write(f"Predicted Sales: {prediction[0]}")
Fig: Sample screenshot of the Streamlit app file sales_app.py
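
Entering year, month, and day as three separate numbers allows impossible combinations such as 31 February. One possible alternative, sketched below under stated assumptions, is to collect a single calendar date (for example with Streamlit's st.date_input, which returns a datetime.date) and derive the three features from it. The helper build_features and the constant FEATURE_COLUMNS are hypothetical names introduced for this example and are not part of sales_app.py.

```python
from datetime import date

import pandas as pd

# Column names and order assumed to match the features used in training.
FEATURE_COLUMNS = ["store", "item", "year", "month", "day"]


def build_features(store: int, item: int, sale_date: date) -> pd.DataFrame:
    """Build a one-row feature frame from IDs and a single calendar date."""
    row = [store, item, sale_date.year, sale_date.month, sale_date.day]
    return pd.DataFrame([row], columns=FEATURE_COLUMNS)


# Example: store 1, item 5, on 15 January 2018.
features = build_features(1, 5, date(2018, 1, 15))
print(features)
```

The resulting frame could be passed directly to rf_model.predict in place of the manually assembled list of inputs.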
