VISVESVARAYA TECHNOLOGICAL UNIVERSITY
Jnana Sangama, Belgaum-590018
MINI PROJECT REPORT ON
Smart Energy Consumption Forecasting using
machine learning techniques
SUBMITTED BY
Ankit Mathpati (3GN22AI006)
Shivani Reddy (3GN22AI049)
Mohammed Maazuddin (3GN22AI029)
UNDER THE GUIDANCE OF
Dr. Harish Joshi
BACHELOR OF ENGINEERING IN ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING
DEPARTMENT
GURU NANAK DEV ENGINEERING COLLEGE, MAILOOR ROAD,
BIDAR, KARNATAKA-585403
VISVESVARAYA TECHNOLOGICAL UNIVERSITY, BELAGAVI
GURU NANAK DEV ENGINEERING COLLEGE
BIDAR-585403, KARNATAKA
DEPARTMENT OF ARTIFICIAL INTELLIGENCE AND
MACHINE LEARNING
CERTIFICATE
This is to certify that the project work entitled “Smart Energy Consumption Forecasting
using machine learning techniques” is a bonafide work carried out by ANKIT
MATHPATI (3GN22AI006), SHIVANI REDDY (3GN22AI049), and MOHAMMED
MAAZUDDIN (3GN22AI029) in partial fulfillment for the award of Bachelor of
Engineering in ARTIFICIAL INTELLIGENCE AND MACHINE LEARNING of the
Visvesvaraya Technological University, Belagavi during the academic year 2024-2025. It is
certified that all the corrections/suggestions indicated for the internal assessment have been
incorporated in the report deposited in the departmental library. The project report has been
approved as it satisfies the academic requirements in respect of the project work prescribed for
the Bachelor of Engineering Degree.
Signature of Guide        Signature of HOD        Signature of Principal
Dr. Harish Joshi          Dr. Dayanand J.         Dr. Dhananjay M
Professor, Head           Professor, Head         Principal, GNDEC
Dept. of CSE (ICB),       Dept. of AIML,          Bidar, Karnataka
GNDEC Bidar               GNDEC Bidar
ACKNOWLEDGEMENT
The mini project report on “Smart Energy Consumption Forecasting using machine learning
techniques” is the outcome of the guidance, moral support, and devotion bestowed on us throughout our
work. We express our profound gratitude to everybody who has been a source of inspiration during
the project work. First and foremost, we offer our sincere thanks, with humility, to our PRINCIPAL
DR. DHANANJAY M, who has been a constant source of support and encouragement. We are deeply
indebted to our H.O.D., DR. DAYANAND J, for the help provided from the inception of the project to
date. We also take this opportunity to acknowledge our GUIDE, DR. HARISH JOSHI, who not only stood
by us as a source of inspiration but also dedicated his time to enable us to present the mini project
on time. We would be failing in our endeavour if we did not thank our parents, who have helped us in
every aspect of our lives.
ANKIT MATHPATI (3GN22AI006)
SHIVANI REDDY (3GN22AI049)
MOHAMMED MAAZUDDIN (3GN22AI029)
DECLARATION
We, ANKIT MATHPATI, SHIVANI REDDY, and MOHAMMED MAAZUDDIN, hereby
declare that this dissertation work entitled “Smart Energy Consumption Forecasting using
machine learning techniques” was independently carried out by us under the guidance and
assistance of DR. HARISH JOSHI, Department of CSE (IoT & Cyber Security including Block
Chain Technology), Guru Nanak Dev Engineering College, Bidar, Karnataka.
This dissertation work is submitted to VISVESVARAYA TECHNOLOGICAL
UNIVERSITY, BELAGAVI in partial fulfillment of the requirements for the award of the Degree
of Bachelor of Engineering in ARTIFICIAL INTELLIGENCE AND
MACHINE LEARNING during the academic year 2024-2025.
Place : Bidar
Date : 16-05-2025
ANKIT MATHPATI (3GN22AI006)
SHIVANI REDDY (3GN22AI049)
MOHAMMED MAAZUDDIN (3GN22AI029)
TABLE OF CONTENTS
Chapter no    Content
1             Abstract
2             Introduction
3             Objectives
4             Problem Statement
5             Methodology
6             Tools and Technologies Used
7             Dataset Description
8             Model Explanation
9             Code Implementation
10            Conclusion
1. ABSTRACT
Predicting energy consumption is vital in today’s fast-paced and energy-dependent world. With rapid
urbanization and increasing reliance on electricity, accurate forecasting models can significantly improve
energy distribution, reduce operational costs, and contribute to environmental sustainability. This project
implements a machine learning solution using a linear regression model to predict the next hour's energy
consumption based on historical data. The methodology revolves around processing time-series data,
generating lag-based features, training a model, and deploying it using a Flask-based web application.
The core idea is straightforward yet powerful: by analyzing the energy usage of the previous hour, we can
estimate the consumption for the next. This approach is grounded in the fact that short-term energy usage is
often influenced by recent patterns due to user behavior, industrial cycles, and weather conditions. This
model focuses on a univariate time-series strategy, meaning it relies solely on one key feature — the
previous hour’s consumption — to make predictions.
The project lifecycle includes data cleaning, transformation, feature engineering, model training,
performance evaluation, and front-end integration. The dataset, which includes detailed electricity
consumption records, is preprocessed to ensure consistency and reliability. Once trained, the model is
persisted using the joblib library and is seamlessly integrated with a web application. Users can interact with
the model in real time by inputting energy usage data and receiving predictive feedback instantly.
This system’s practical application extends to residential, commercial, and industrial contexts, offering a
scalable and lightweight solution for energy management. While it uses a basic model for demonstration,
the project lays a foundation for more complex predictive frameworks using deep learning or multivariate
inputs in future iterations. Overall, the combination of predictive analytics and a user-friendly interface
encapsulates the goal of turning data into actionable insight.
2. INTRODUCTION
In the era of data-driven decision-making, predictive analytics has emerged as a cornerstone for numerous
industries, especially in energy management. As populations grow and technology becomes more ingrained
in daily life, the global demand for electricity continues to surge. This growing demand places immense
pressure on power grids and energy providers to forecast and manage consumption efficiently. Predicting
future energy usage helps prevent overloading systems, optimize resource allocation, and reduce costs
associated with energy production and wastage.
This project revolves around building a machine learning model capable of forecasting hourly energy
consumption. Using historical data patterns, particularly the consumption in the preceding hour, a simple
linear regression model is trained to anticipate the energy usage of the following hour. While this might
seem basic, it captures the temporal correlation in energy use effectively, especially in stable environments
where usage patterns do not fluctuate unpredictably.
Furthermore, the integration of this model into a web application reflects the modern shift toward real-time,
user-interactive systems. Rather than merely generating predictions on a backend server, the model is made
accessible through a user interface that takes input and displays predictions on-the-fly. This approach is
aligned with current trends in smart homes and industries where energy dashboards are becoming
increasingly popular.
The choice of linear regression, despite its simplicity, is deliberate: it provides an interpretable,
computationally inexpensive, and fast model. This is especially beneficial in early-stage
projects, academic environments, or lightweight applications where deep learning would be
overkill.
This project does not aim to produce the most complex model, but rather to demonstrate the complete
workflow of a predictive system: from raw data ingestion to live deployment. In doing so, it provides a solid
foundation for future enhancements such as multi-feature models, advanced time series methods, or the
inclusion of seasonal and trend components.
3. OBJECTIVES
The primary objective of this project is to forecast hourly electricity consumption using machine
learning and to make this predictive capability accessible via a web-based interface. However,
to reach this outcome, several sub-objectives must be fulfilled. Each of these contributes to the
success and completeness of the solution and reflects a full-cycle data science and application
development workflow.
Objective 1: Data Acquisition and Preprocessing
First, we aim to read and process the energy consumption dataset. Real-world data often
contains missing, inconsistent, or malformed entries. Our goal is to clean this data, remove null
values, and transform it into a format suitable for machine learning, specifically time series
forecasting.
Objective 2: Feature Engineering
We aim to convert raw data into features that help the model learn effectively. By resampling
energy data into hourly averages and creating lag features like Prev_Hour, we provide the
model with contextual information — a fundamental aspect of temporal prediction.
Objective 3: Model Training and Evaluation
We develop a machine learning model, in this case linear regression, that can learn from the
training data and make accurate predictions. This includes splitting the dataset correctly
(considering the time-series nature), fitting the model, and evaluating its performance using
metrics such as Mean Squared Error (MSE).
Objective 4: Model Persistence
We want to ensure the trained model is saved and reusable without retraining it every time the
server starts. This is accomplished through model serialization using joblib, which enables quick
loading and prediction during web requests.
Objective 5: Web Application Development
A user interface is developed using Flask, which allows users to input energy consumption from
the previous hour and obtain a predicted value for the next hour. The backend seamlessly
interacts with the saved model and delivers real-time results.
Objective 6: Deployment-Ready Architecture
We ensure the overall application structure is modular and scalable, so that it can be extended with
more features or integrated into larger energy monitoring systems.
These objectives align well with the modern practice of building full-stack machine learning
applications that deliver real-time insights.
4. Problem Statement
In modern energy systems, the accurate prediction of electricity consumption is a key challenge faced by
utility providers, industries, and consumers alike. The complexity arises due to the dynamic nature of energy
usage, which can vary significantly based on factors such as time of day, weather, season, and user
behavior. The overarching problem is how to efficiently and accurately forecast short-term energy
consumption using available historical data.
In this project, the specific challenge is to predict the energy usage for the next hour based solely on the
previous hour's data. While the scope is narrow — relying on just a single lag feature — it simplifies the
problem for an initial implementation and provides a baseline for future model improvement. By focusing
on a univariate time-series model, we test whether immediate past behavior is a strong enough indicator of
near-future energy demand.
From a technical standpoint, the challenge includes preprocessing noisy and incomplete real-world data,
handling time-series datetime formats, resampling data at hourly intervals, and developing a model that can
generalize over unseen data. Additionally, ensuring that the model can be reliably integrated into a live web
application introduces another layer of complexity, especially in terms of robustness and performance under
real-time constraints.
Furthermore, another problem addressed by this project is accessibility. Many energy analytics tools are
confined to backend systems or complex platforms not usable by non-technical stakeholders. The goal here
is to democratize access to predictive analytics through a lightweight web app where any user can input data
and receive a prediction with minimal effort.
Ultimately, the problem is twofold: developing a working predictive model using limited features and
ensuring it can be deployed in a user-friendly, real-time interface. Solving this dual problem helps lay the
groundwork for more sophisticated, interactive forecasting systems applicable in homes, businesses, and
smart grid infrastructures.
5. Methodology
The methodology adopted in this project encompasses the entire machine learning lifecycle — from raw
data processing to the deployment of a trained model within a web application. Each step is carefully
designed to transform unstructured time-series data into actionable insights through predictive modeling.
Below is an in-depth explanation of each phase:
1. Data Loading and Cleaning
The dataset is loaded from a text file using pandas. Given the format and nature of the data (with “;”
separators and occasional missing values), preprocessing is critical. We convert invalid entries marked with
"?" into NaNs and subsequently remove rows containing missing values to maintain data integrity.
2. Datetime Transformation
The dataset contains separate columns for Date and Time. These are concatenated and converted into a
Datetime object, which becomes the index of the dataframe. This step is essential for time-series analysis,
allowing for accurate resampling and temporal operations.
3. Resampling and Feature Engineering
To simplify modeling and capture meaningful temporal patterns, the data is resampled into hourly averages.
This reduces noise and makes trends more apparent. We create a new feature Prev_Hour by lagging the
target column (Global_active_power) by one time step. This single-feature setup allows us to implement a
univariate time-series prediction model.
4. Data Splitting
Due to the sequential nature of time-series data, we split the dataset without shuffling. The first 80% is used
for training, and the remaining 20% is reserved for testing. This approach ensures temporal integrity and
simulates how a real-world predictive system would behave — training on past data and forecasting future
values.
5. Model Training and Evaluation
A simple linear regression model from scikit-learn is used. Despite its simplicity, it is an effective
baseline, especially for short-term predictions. The model is trained to predict Global_active_power based
on Prev_Hour, and its performance is assessed using Mean Squared Error (MSE).
6. Model Serialization
Once trained, the model is saved using joblib, which enables fast reloading for use in the web application
without retraining.
7. Web Integration
Using Flask, we expose a /predict endpoint that accepts a POST request with prev_hour as input. It loads
the saved model and returns the predicted energy consumption for the next hour.
This systematic methodology ensures that each part of the workflow is reproducible, efficient, and
extendable for future improvements or deployments in real-world scenarios.
6. Tools and Technologies Used
This project leverages a set of powerful, widely-used tools and libraries in the data science and web development
ecosystem. The selected technologies are chosen for their simplicity, flexibility, and ease of integration, making the
overall system robust yet lightweight. Here is a breakdown of each tool and its purpose in the project:
1. Python
Python serves as the backbone of the project. Its simplicity, readability, and vast ecosystem of libraries make it the
preferred language for both machine learning and web development tasks.
2. Pandas
Pandas is used extensively for data manipulation and preprocessing. It helps in reading the dataset, handling missing
values, creating datetime objects, resampling time-series data, and preparing features for model training.
3. NumPy
NumPy supports numerical operations and provides efficient array handling, which is essential when dealing with
large datasets or performing mathematical transformations.
4. Scikit-learn
This library is central to the machine learning aspect of the project. Scikit-learn provides the LinearRegression
model, functions for splitting the data (train_test_split), and evaluation metrics such as
mean_squared_error. Its consistent and user-friendly API makes it easy to train, evaluate, and deploy models.
5. Joblib
Joblib is used for model serialization. After training, the model is saved as a .pkl file, which can later be loaded
instantly for inference. This eliminates the need to retrain the model every time the application is restarted.
6. Flask
Flask is a lightweight Python web framework used to develop the front-end interface and API endpoints. It enables
users to interact with the machine learning model through a web form and receive real-time predictions via HTTP
requests.
7. HTML
The Flask application renders an HTML template (templates/index.html) that provides the
user interface, allowing users to input data and view the results in a browser.
8. Requirements File
The requirements.txt lists all necessary libraries, ensuring that the environment can be replicated easily.
This file is essential for deployment and reproducibility.
flask
pandas
numpy
scikit-learn
joblib
Each of these tools contributes a vital piece to the puzzle, working in harmony to create a reliable, efficient,
and interactive predictive system.
7. Dataset Description
The core of any machine learning project is the dataset it uses, and in this case, we use a real-world dataset
for energy consumption. The file referenced in the code is energy_con_dataset.txt, which contains
power usage metrics collected over time. Its format and the columns used by the project are
described below.
Dataset Characteristics:
File Format: Text file with ; as a separator (semi-colon delimited).
Primary Columns:
o Date – The day of the observation (format: DD/MM/YYYY).
o Time – The time of the observation (format: HH:MM:SS).
o Global_active_power – The total active power consumed (in kilowatts), used as the
primary target for prediction.
Additional columns may exist (e.g., Voltage, Global_intensity, Sub_metering_1/2/3), but the current
project utilizes only Global_active_power.
Data Quality:
The dataset contains missing or invalid entries, marked with ?. These are replaced with NaN during loading
using the na_values parameter in pandas, and rows containing NaN are dropped to maintain model
accuracy.
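As a small illustration of this cleaning step, the sketch below parses a few made-up rows in which one reading is marked with ?. The sample values are assumptions for demonstration only and are not taken from the real file.

```python
import io

import pandas as pd

# Three made-up rows in the same semicolon-delimited layout; the second
# reading is missing and marked with "?".
raw = io.StringIO(
    "Date;Time;Global_active_power\n"
    "16/12/2006;17:24:00;4.216\n"
    "16/12/2006;17:25:00;?\n"
    "16/12/2006;17:26:00;5.374\n"
)

# na_values="?" turns the marker into NaN, so the column parses as float.
df = pd.read_csv(raw, sep=";", na_values="?")
clean = df.dropna()

print(len(df), len(clean))  # prints: 3 2
```

Because the marker is converted while reading, the power column is numeric from the start and the incomplete row can be dropped in one call.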
Datetime Conversion and Indexing:
The Date and Time columns are merged and converted into a datetime object using:
df['Datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%d/%m/%Y %H:%M:%S')
This new Datetime column is set as the index of the dataframe, enabling time-based operations like
resampling.
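The explicit format string matters because the dates are written day first. The sketch below uses the standard library's strptime, which accepts the same format directives, on a hypothetical sample value to show how an ambiguous date such as 05/01/2007 is read.

```python
from datetime import datetime

# Hypothetical sample values in the DD/MM/YYYY and HH:MM:SS layouts.
date_str, time_str = "05/01/2007", "13:45:00"

# %d comes before %m, so "05/01" is read as 5 January, not 1 May.
stamp = datetime.strptime(f"{date_str} {time_str}", "%d/%m/%Y %H:%M:%S")
print(stamp.isoformat())  # prints: 2007-01-05T13:45:00
```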
Resampling and Transformation:
Since the raw data is recorded at a finer granularity than one hour (likely one reading per minute), the
project resamples it into hourly intervals. This is achieved using:
df_hourly = df['Global_active_power'].resample('h').mean().ffill().to_frame()
This converts the original high-frequency data into more manageable and interpretable hourly averages,
which reduces noise and variability while preserving core usage trends. Hours with no readings are
forward-filled from the previous hour, and to_frame() keeps the result as a single-column dataframe
so that the lag column can be added to it.
Lag Feature:
To convert the dataset into a supervised learning problem, a new feature Prev_Hour is introduced. This is
simply the Global_active_power value from the previous hour, shifted forward by one time step:
df_hourly['Prev_Hour'] = df_hourly['Global_active_power'].shift(1)
This creates a one-step lagged feature that serves as the model input (X), while the original power usage
becomes the target (y).
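To make the resampling and lag steps concrete, here is a minimal sketch on made-up minute-level readings covering three hours. The values and variable names are assumptions chosen so that the hourly averages are easy to check by eye.

```python
import pandas as pd

# Made-up minute-level readings spanning three hours (values are arbitrary).
index = pd.date_range("2007-01-01 00:00", periods=180, freq="min")
power = pd.Series([1.0] * 60 + [2.0] * 60 + [4.0] * 60, index=index,
                  name="Global_active_power")

# Hourly averages, kept as a single-column dataframe.
hourly = power.resample("h").mean().to_frame()
hourly["Prev_Hour"] = hourly["Global_active_power"].shift(1)

# The first hour has no predecessor, so its lag is NaN and the row is dropped.
supervised = hourly.dropna()
print(supervised)
```

Three hourly averages yield only two usable training rows, since each row needs the hour before it as input.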
Final Structure:
The final dataset used for modeling consists of two columns:
Prev_Hour – Feature
Global_active_power – Target
This dataset structure is clean, time-indexed, and ready for model training, making it a suitable setup for
time-series prediction using linear regression.
8. Model Explanation
The core prediction engine of this project is a Linear Regression model, implemented using
scikit-learn. Linear regression is one of the simplest and most interpretable machine learning models. It
establishes a relationship between a dependent variable (target) and one or more independent variables
(features) by fitting a linear equation to the observed data.
Model Type:
Linear Regression (Univariate Time-Series)
Input Feature:
Prev_Hour: This feature represents the energy consumption recorded in the previous hour.
Target Variable:
Global_active_power: The energy consumption that we aim to predict for the current hour.
The idea is simple: the energy usage in one hour is likely influenced by the usage in the previous hour. This
is especially true in systems with repetitive behavior (e.g., home appliances, industrial processes, or habitual
routines).
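Written out, the model is a first-order linear relation between consecutive hourly values. The symbols below are notation introduced here for illustration: $y_t$ stands for Global_active_power in hour $t$, $y_{t-1}$ for Prev_Hour, and $\beta_0$, $\beta_1$ for the fitted intercept and slope.

```latex
\hat{y}_t = \beta_0 + \beta_1\, y_{t-1},
\qquad
\mathrm{MSE} = \frac{1}{n}\sum_{t=1}^{n}\left(y_t - \hat{y}_t\right)^2
```

Training chooses $\beta_0$ and $\beta_1$ to minimise the squared error on the training portion of the data, and the MSE on the held-out portion is the evaluation metric used in this project.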
Why Linear Regression?
1. Interpretability: It is easy to understand and visualize.
2. Performance: Suitable for linear or near-linear relationships.
3. Speed: Fast to train, making it ideal for rapid prototyping.
4. Resource Efficiency: Low memory and CPU requirements.
Training Process:
After creating the lagged feature (Prev_Hour), the data is split into training and testing sets using:
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)
The absence of shuffling is crucial, as time-series data must respect the chronological order.
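The effect of shuffle=False can be seen on a toy sequence. In this sketch the ten integers are hypothetical time steps in chronological order, not real data.

```python
from sklearn.model_selection import train_test_split

# Ten hypothetical time steps, numbered in chronological order.
steps = list(range(10))

# With shuffle=False the split is a simple cut: earliest 80% for training,
# latest 20% for testing.
train, test = train_test_split(steps, test_size=0.2, shuffle=False)
print(list(train), list(test))
```

Every training step precedes every test step, so the model is never evaluated on data older than what it was trained on.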
The model is trained using:
model = LinearRegression()
model.fit(X_train, y_train)
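An MSE value is easier to interpret next to a reference point. One possible check, which is an addition for illustration and not part of the project code, is to compare the fitted model against a naive persistence forecast that simply repeats the previous hour's value. The sketch below runs on a synthetic series (a daily cycle plus noise) that only stands in for the real data.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Synthetic hourly series standing in for the real data: daily cycle plus noise.
rng = np.random.default_rng(0)
hours = np.arange(24 * 60)
series = 1.5 + np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.3, hours.size)

X = series[:-1].reshape(-1, 1)   # previous hour
y = series[1:]                   # current hour
cut = int(len(X) * 0.8)          # chronological 80/20 split

model = LinearRegression().fit(X[:cut], y[:cut])
mse_model = mean_squared_error(y[cut:], model.predict(X[cut:]))

# Persistence baseline: predict that the current hour equals the previous hour.
mse_naive = mean_squared_error(y[cut:], X[cut:, 0])

print(f"linear regression MSE: {mse_model:.4f}")
print(f"persistence MSE:       {mse_naive:.4f}")
```

If the regression does not clearly beat the persistence baseline on the real dataset, the fitted line is adding little beyond copying the last observation.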
Model Serialization:
After training, the model is saved to disk using:
joblib.dump(model, 'model/energy_model.pkl')
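A quick way to confirm that serialization preserves the model is a save-and-reload round trip. The sketch below uses a tiny model fitted on made-up points and a temporary directory as a stand-in for model/energy_model.pkl.

```python
import os
import tempfile

import joblib
import numpy as np
from sklearn.linear_model import LinearRegression

# Fit a tiny model on made-up points lying on y = 2x + 1.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])
model = LinearRegression().fit(X, y)

# Save to a temporary directory (a stand-in for model/energy_model.pkl).
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "energy_model.pkl")
    joblib.dump(model, path)
    restored = joblib.load(path)

# The restored model gives the same prediction as the original.
same = np.allclose(model.predict([[4.0]]), restored.predict([[4.0]]))
print(same)  # prints: True
```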
9. Code Implementation
1. train_model.py — Model Training Script
import pandas as pd
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import train_test_split
from sklearn.metrics import mean_squared_error
import joblib
import os

# Load and clean the dataset
def load_and_prepare_data(file_path):
    # '?' marks missing readings; rows containing them are dropped
    df = pd.read_csv(file_path, sep=';', low_memory=False, na_values='?')
    df.dropna(inplace=True)

    # Create datetime index
    df['Datetime'] = pd.to_datetime(df['Date'] + ' ' + df['Time'], format='%d/%m/%Y %H:%M:%S')
    df.set_index('Datetime', inplace=True)

    # Convert to float
    df['Global_active_power'] = df['Global_active_power'].astype(float)

    # Resample to hourly data ('h' is the hourly alias; 'H' is deprecated in recent pandas)
    df_hourly = df['Global_active_power'].resample('h').mean().ffill()
    df_hourly = pd.DataFrame(df_hourly)

    # Create lag feature; the first row has no previous hour and is dropped
    df_hourly['Prev_Hour'] = df_hourly['Global_active_power'].shift(1)
    df_hourly.dropna(inplace=True)

    return df_hourly[['Prev_Hour']], df_hourly['Global_active_power']

# Train and save the model
def train_model():
    X, y = load_and_prepare_data('energy_con_dataset.txt')

    # Chronological split: no shuffling for time-series data
    X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)

    model = LinearRegression()
    model.fit(X_train, y_train)

    predictions = model.predict(X_test)
    mse = mean_squared_error(y_test, predictions)
    print(f"Model trained with MSE: {mse:.4f}")

    os.makedirs('model', exist_ok=True)
    joblib.dump(model, 'model/energy_model.pkl')
    print("Model saved to model/energy_model.pkl")

if __name__ == "__main__":
    train_model()
2. app.py — Flask Web Application
from flask import Flask, render_template, request, jsonify
import joblib

app = Flask(__name__)

@app.route('/')
def home():
    return render_template('index.html')

@app.route('/predict', methods=['POST'])
def predict():
    try:
        # Read the previous hour's consumption from the submitted form
        prev_hour = float(request.form['prev_hour'])
        model = joblib.load('model/energy_model.pkl')
        # Cast to a built-in float so the value serializes cleanly to JSON
        prediction = float(model.predict([[prev_hour]])[0])
        return jsonify({'prediction': round(prediction, 3)})
    except Exception as e:
        return jsonify({'error': str(e)})

if __name__ == '__main__':
    app.run(debug=True)
3. templates/index.html — Web Interface
<!DOCTYPE html>
<html lang="en">
<head>
    <meta charset="UTF-8">
    <title>Energy Consumption Prediction</title>
</head>
<body>
    <h2>Predict Next Hour Energy Consumption</h2>
    <form action="/predict" method="post">
        <label for="prev_hour">Enter Previous Hour Consumption (kW):</label>
        <input type="text" id="prev_hour" name="prev_hour" required>
        <input type="submit" value="Predict">
    </form>
</body>
</html>
4. requirements.txt — Dependencies
flask
pandas
numpy
scikit-learn
joblib
5. Folder Structure
energy-predictor/
├── model/
│   └── energy_model.pkl
├── templates/
│   └── index.html
├── energy_con_dataset.txt
├── app.py
├── train_model.py
└── requirements.txt
6. How to Run the Project
1. Install dependencies:
pip install -r requirements.txt
2. Train the model:
python train_model.py
3. Run the Flask web app:
python app.py
4. Open your browser and go to: http://127.0.0.1:5000/
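Besides the browser form, the /predict endpoint can also be called from a script. The sketch below builds the same form-encoded POST that the HTML form submits, using only the standard library. The helper name build_predict_request and the sample value 1.25 are hypothetical, and the request is only sent if the commented lines are enabled while the Flask app is running.

```python
from urllib import parse, request


def build_predict_request(prev_hour, base_url="http://127.0.0.1:5000"):
    """Build a form-encoded POST for /predict (not sent until urlopen is called)."""
    body = parse.urlencode({"prev_hour": prev_hour}).encode("ascii")
    return request.Request(f"{base_url}/predict", data=body, method="POST")


req = build_predict_request(1.25)
print(req.get_method(), req.full_url, req.data)

# With the Flask app running, the request could be sent like this:
# with request.urlopen(req) as resp:
#     print(resp.read().decode())
```

The field name prev_hour must match the key that app.py reads from request.form.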
Output:
Results:
Successfully trained a Linear Regression model using hourly energy consumption data.
Produced next-hour predictions of energy usage from the previous hour's value (Prev_Hour).
Developed an interactive web app using Flask for user-friendly forecasting.
Model stored and reused using Joblib, so no retraining is needed when serving predictions.
10. Conclusion
This project successfully demonstrates the end-to-end development of a machine learning-powered energy
forecasting system. Starting from raw data ingestion to deploying a web application, each component has
been carefully designed to highlight practical machine learning deployment.
Key Achievements:
Data Processing: Cleaned and transformed a real-world time-series dataset into a machine
learning-friendly format.
Feature Engineering: Introduced a simple lag feature to enable short-term forecasting.
Modeling: Developed a linear regression model that predicts the next hour’s energy consumption
with minimal computational cost.
Evaluation: Showed that even with a single feature, the model can detect and predict general trends
in energy usage.
Deployment: Deployed the model in a Flask web application, offering users an interactive and
intuitive interface.
Limitations:
The current model is limited to a single input feature (Prev_Hour). This restricts its ability to adapt
to complex or sudden changes in consumption.
It does not account for external variables like temperature, time of day, or seasonal trends, which
could enhance prediction accuracy.
There are no built-in visualizations or dashboards to support decision-making.
Future Scope:
Multivariate Models: Incorporate additional features such as weather data, time-based features, and
household activities.
Advanced Algorithms: Explore more sophisticated time-series models like ARIMA, LSTM, or
Prophet.
UI Improvements: Create a dynamic dashboard with charts, feedback, and history of predictions.
Deployment on Cloud: Host the app using platforms like AWS, Heroku, or GCP for broader
accessibility.
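As a rough sketch of the multivariate direction mentioned above, the time-based inputs could be derived from the hourly index itself. The helper add_time_features and the column names Prev_Day_Same_Hour, Hour, and DayOfWeek are hypothetical and not part of the current project; the demo values are made up.

```python
import pandas as pd


def add_time_features(hourly):
    """Return a copy of an hourly frame with extra, hypothetical input columns."""
    out = hourly.copy()
    out["Prev_Hour"] = out["Global_active_power"].shift(1)
    out["Prev_Day_Same_Hour"] = out["Global_active_power"].shift(24)
    out["Hour"] = out.index.hour
    out["DayOfWeek"] = out.index.dayofweek
    # Rows without a full day of history have NaN lags and are dropped.
    return out.dropna()


# Made-up hourly values covering two days.
idx = pd.date_range("2007-01-01", periods=48, freq="h")
demo = pd.DataFrame({"Global_active_power": range(48)}, index=idx, dtype=float)
features = add_time_features(demo)
print(features.head(3))
```

The same LinearRegression code would accept these extra columns as inputs, at the cost of losing the first day of data to the 24-hour lag.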