Smart Room Occupancy Analysis: Employing IOT Data and Predictive Insights
BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
(ARTIFICIAL INTELLIGENCE & MACHINE LEARNING)
By
This is to certify that the Real Time Research Project report entitled
“Smart Room Occupancy Analysis: Employing IOT Data and Predictive
Insights” is the bona fide work carried out and submitted by
ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task
would be incomplete without mentioning the people who made it possible, whose constant
guidance and encouragement crowned our efforts with success. I am pleased to have this
opportunity to express my gratitude to all of them.
TABLE OF CONTENTS
ABSTRACT
LIST OF FIGURES
LIST OF GRAPHS
ABBREVIATIONS
CHAPTER 1 INTRODUCTION
1.1 OVERVIEW
1.2 NEED
1.3 RESEARCH MOTIVATION
1.4 PROBLEM STATEMENT
1.5 OBJECTIVE
1.6 ADVANTAGES
1.7 APPLICATIONS
CHAPTER 2 LITERATURE SURVEY
CHAPTER 3 EXISTING SYSTEM
3.1 OVERVIEW
3.2 CHALLENGES
3.3 LIMITATIONS
CHAPTER 4 PROPOSED SYSTEM
4.1 OVERVIEW
4.2 PREPROCESSING AND DATA SPLITTING
4.3 ML MODEL BUILDING
4.4 ADVANTAGES
CHAPTER 5 UML DIAGRAMS
CHAPTER 6 SOFTWARE ENVIRONMENT
6.1 SOFTWARE REQUIREMENTS
6.2 HARDWARE REQUIREMENTS
CHAPTER 7 FUNCTIONAL REQUIREMENTS
CHAPTER 8 SOURCE CODE
REFERENCES
ABSTRACT
Knowing whether a room is occupied allows a smart building to control lighting, heating,
ventilation, and air conditioning (HVAC) on demand, saving energy without relying on
privacy-invasive cameras. This project presents an IoT-based room occupancy classification
system that uses data from non-intrusive sensors measuring temperature, humidity, light level,
and carbon dioxide concentration in a smart auditorium environment.
The raw sensor data is preprocessed by removing null values, label-encoding the Occupancy
target, deriving day, month, and year features from the timestamp, and balancing the classes
with the Synthetic Minority Over-sampling Technique (SMOTE). The data is then split into
training (80%) and testing (20%) sets.
A Naive Bayes classifier serves as the baseline and is compared with a Gradient Boosting
classifier. Both models are evaluated using accuracy, precision, recall, F1-score, and confusion
matrices, and the trained models are saved with Joblib so that they can predict the occupancy
status of new, unseen sensor readings. A Tkinter graphical interface handles user
authentication, data upload, and display of results. The resulting framework is intended for
integration into building management systems to optimize energy use, enhance security, and
improve resource management.
CHAPTER 1
INTRODUCTION
1.1 Overview
The advent of the Internet of Things (IoT) has revolutionized various sectors by enabling
seamless integration of devices, data collection, and real-time monitoring. One of the
significant applications of IoT technology is in the realm of smart buildings, where it is used
to optimize energy consumption, enhance security, and improve overall user comfort. Among
these applications, room occupancy classification has emerged as a critical component,
particularly for energy-efficient building management systems. Accurately determining
whether a room is occupied can lead to significant energy savings by dynamically controlling
lighting, heating, ventilation, and air conditioning (HVAC) systems. Traditional methods of
occupancy detection often rely on intrusive techniques such as cameras or motion detectors,
which raise privacy concerns and may not always provide reliable data.
1.2 Need
The rapid advancement in technology, particularly in the Internet of Things (IoT), has led to
the proliferation of smart environments, where real-time data-driven decisions are essential.
In settings like auditoriums, classrooms, offices, and other large spaces, knowing the
occupancy status is critical for optimizing energy usage, enhancing security, and improving
overall resource management. Traditional methods of monitoring room occupancy, such as
manual inspections or simple motion sensors, are often inadequate due to their limitations in
accuracy and scalability. There is a growing need for more sophisticated systems that can
automatically and accurately classify room occupancy, enabling smarter and more efficient
building management.
1.3 Research Motivation
The motivation for this research stems from the increasing demand for smart building
solutions that can effectively manage resources and reduce operational costs. With the global
market for smart auditoriums expected to see significant growth, there is a pressing need to
develop systems that can seamlessly integrate with existing infrastructure and provide
reliable occupancy data. Moreover, the inefficiencies and inaccuracies of traditional methods
highlight the need for innovative approaches that leverage the power of IoT and machine
learning. This research aims to address these challenges by developing a robust, data-driven
room occupancy classification system that can transform how spaces are managed in modern
buildings.
1.4 Problem Statement
Traditional room occupancy detection methods are often labor-intensive, prone to errors, and
lack real-time processing capabilities. These methods, which include periodic inspections or
basic motion sensors, fail to provide the accuracy and scalability required in today's complex
smart environments. The sheer volume of data generated by non-intrusive IoT sensors in large
auditoriums and similar spaces poses a significant challenge for effective room occupancy
classification. Therefore, there is a need for an advanced system that can analyze this data in
real time, accurately classify room occupancy, and overcome the limitations of traditional
approaches.
1.5 Objective
The primary objective of this research is to design and implement an IoT-based room
occupancy classification system using non-intrusive sensors and machine learning
algorithms. Specifically, the system aims to:
Compare the performance of traditional algorithms like Naive Bayes with more advanced
techniques such as Gradient Boosting to determine the most effective approach.
Improve the accuracy, efficiency, and scalability of room occupancy detection in smart
auditoriums and similar environments.
Provide a framework that can be integrated into existing building management systems to
optimize energy use, enhance security, and improve overall resource management.
1.6 Advantages
The proposed IoT-based room occupancy classification system offers several key advantages
over traditional methods:
Accuracy: By utilizing machine learning algorithms, the system can analyze complex data
patterns, leading to more accurate occupancy detection.
Scalability: The system can easily be scaled to monitor multiple rooms or larger areas
without significant changes to the underlying infrastructure.
Efficiency: Automated occupancy detection reduces the need for manual inspections, saving
time and labor while minimizing the risk of human error.
Energy Optimization: By accurately determining room occupancy, the system can optimize
the use of heating, ventilation, and air conditioning (HVAC) and lighting, leading to
significant energy savings.
1.7 Applications
The IoT-based room occupancy classification system has a wide range of applications across
various domains:
Smart Auditoriums: In educational institutions and conference centers, the system can
manage seating arrangements, control lighting, and optimize HVAC systems based on real-
time occupancy data.
Office Buildings: The system can help facility managers allocate resources more efficiently,
reduce energy consumption, and improve workplace comfort.
Residential Buildings: Home automation systems can benefit from accurate room occupancy
data to enhance security and optimize energy use.
Healthcare Facilities: The system can monitor patient rooms, ensuring that they are properly
staffed and that resources are allocated where needed most.
Hospitality Industry: Hotels can use the system to improve guest experiences by
automatically adjusting room conditions based on occupancy status.
CHAPTER 2
LITERATURE SURVEY
1. Conte, G.; Marchi, M.D.; Nacci, A.A.; Rana, V.; Sciuto, D. (2014). "BlueSentinel: A first
approach using iBeacon for an energy efficient occupancy detection system." In Proceedings
of the ACM, Memphis, TN, USA, 3–6 November 2014. This pioneering work explores the use
of Apple's then newly introduced iBeacon technology for smart building applications,
specifically energy-efficient occupancy detection.
2. Corna, A.; Fontana, L.; Nacci, A.A.; Sciuto, D. (2015). "Occupancy Detection via
iBeacon on Android Devices for Smart Building Management." In Proceedings of the Design,
Automation and Test in Europe Conference and Exhibition, Grenoble, France, 9–13 March
2015. This paper is a direct follow-up to the authors' earlier BlueSentinel work (entry 1,
Conte et al., 2014), addressing the implementation and performance of iBeacon-based
occupancy detection on Android devices. This matters because Android holds the dominant
share of the smartphone market, which broadens the practical reach of the technique.
3. Filippoupolitis, A.; Oliff, W.; Loukas, G. (2016). "Bluetooth Low Energy Based
Occupancy Detection for Emergency Management." In Proceedings of the 2016 15th
International Conference on Ubiquitous Computing and Communications and 2016
International Symposium on Cyberspace and Security (IUCC-CSS), Granada, Spain, 14–16
December 2016. This paper takes a different approach to occupancy detection, shifting the
primary application focus from energy efficiency to emergency management. This highlights
the versatility and importance of accurate occupancy information across a broader range of
smart building scenarios.
6. Lu, X.; Wen, H.; Han, Z.; Hao, J.; Xie, L.; Trigoni, N. (2016). "Robust occupancy
inference with commodity WiFi." In Proceedings of the IEEE International Conference on
Wireless & Mobile Computing, New York, NY, USA, 17–19 October 2016. This paper focuses on
making WiFi-based occupancy detection practical and robust for real-world smart building
applications, aiming to overcome limitations of earlier approaches that required specialized
hardware or intensive calibration.
7. Wang, W.; Chen, J.; Hong, T.; Zhu, N. (2018). "Occupancy prediction through Markov
based feedback recurrent neural network (M-FRNN) algorithm with WiFi probe technology."
Building and Environment, 138, 160–170. This paper represents a significant advance in
leveraging existing WiFi infrastructure for occupancy prediction, integrating advanced
machine learning techniques to address the inherent challenges of WiFi data.
8. Zou, H.; Jiang, H.; Yang, J.; Xie, L.; Spanos, C. (2017). "Non-intrusive occupancy sensing in
commercial buildings." Energy and Buildings, 154, 633–643. This paper contributes to smart
building management by developing and validating practical, privacy-preserving methods for
sensing occupancy in commercial environments. Its emphasis on being "non-intrusive" sets it
apart from more privacy-invasive techniques while still aiming for high accuracy.
10. Wang, W.; Chen, J.; Hong, T. (2018). "Occupancy prediction through machine learning
and data fusion of environmental sensing and Wi-Fi sensing in buildings." Automation in
Construction, 94, 233–243. This highly impactful study emphasizes the power of data fusion
and machine learning for accurate occupancy prediction by combining two disparate data
sources, environmental sensors and WiFi sensing. This approach addresses the limitations of
relying on any single sensing modality.
11. Marmaroli, Allado, and Boulandet (2023) advance the field by demonstrating a novel,
privacy-preserving approach to indoor event detection. Their work bridges the gap between
the need for detailed smart building information and the public's growing privacy concerns,
using deep learning (CNNs) to extract actionable information from subtle acoustic impedance
changes of a common household device. It contributes to the broader trend of developing
multi-functional, unobtrusive, and intelligent sensing solutions for future smart environments.
13. Padmanabh, K.; Malikarjuna, V.A.; Sen, S.; Katru, S.P.; Paul, S. (2009). "iSense: A
wireless sensor network based conference room management system." In Proceedings of the
ACM, Berkeley, CA, USA, 3 November 2009. This early and foundational work demonstrates
the practical application of wireless sensor networks (WSNs) to smart building management,
specifically in the context of conference rooms.
15. Huang, Q.; Ge, Z.; Lu, C. (2016). "Occupancy estimation in smart buildings using
audio-processing techniques." arXiv 2016, arXiv:1602.08507. This paper explores using
sound to infer human presence, and even the number of occupants, in smart building
environments. This is an important research direction because it offers an alternative to
more intrusive sensing methods such as cameras while still providing useful data for building
optimization.
16. Dino, I.G.; Kalfaoglu, E.; Iseri, O.K.; Erdogan, B.; Kalkan, S.; Alatan, A.A. (2022).
"Vision-based estimation of the number of occupants using video cameras." Adv. Eng.
Inform. 2022, 53, 101662. This paper represents a significant advance in applying computer
vision and deep learning to precise occupancy estimation in smart buildings. It directly
addresses the need for granular occupant data, especially in complex and crowded
environments.
17. Sun, K.; Liu, P.; Xing, T.; Zhao, Q.; Wang, X. (2022). "A fusion framework for
vision-based indoor occupancy estimation." Building and Environment, 225, 109631. This
paper contributes to smart building management by improving the accuracy and robustness
of vision-based occupancy estimation through a sophisticated fusion framework.
18. Fleuret, F.; Berclaz, J.; Lengagne, R.; Fua, P. (2008). "Multicamera people tracking
with a probabilistic occupancy map." IEEE Transactions on Pattern Analysis and Machine
Intelligence, 30, 267. This seminal work in multi-camera surveillance and tracking is known
for its robust handling of occlusions and for providing accurate occupancy information in
complex environments.
19. Zou, J.; Zhao, Q.; Yang, W.; Wang, F. (2017). "Occupancy detection in the office by
analyzing surveillance videos and its application to building energy conservation." Energy
and Buildings, 152, 385–398. This work directly connects existing surveillance
infrastructure to practical energy savings in office environments, highlighting that security
cameras can serve not only as monitoring devices but also as valuable sensors for smart
building applications.
20. Petersen, S.; Pedersen, T.H.; Nielsen, K.U.; Knudsen, M.D. (2016). "Establishing an
image-based ground truth for validation of sensor data-based room occupancy detection."
Energy and Buildings, 130, 787–793. This is an important methodological contribution to
smart building occupancy detection: it addresses the fundamental challenge of reliably
assessing the accuracy of different occupancy sensing technologies.
CHAPTER 3
EXISTING SYSTEM
3.1 Overview
The Smart Occupancy Detection System (SODS) is an advanced solution designed for real-
time room occupancy classification using non-intrusive IoT sensors. This system integrates
various sensor technologies, such as passive infrared (PIR) sensors, ultrasonic sensors, and
acoustic sensors, to monitor and analyze occupancy patterns within smart auditoriums. SODS
leverages machine learning (ML) algorithms to process sensor data and provide accurate
occupancy classifications.
Key Features:
Scalability: Designed to handle large volumes of data from multiple sensors, making it
suitable for complex and high-traffic environments.
Automated Reporting: Provides automated insights and alerts on room occupancy, reducing
the need for manual monitoring.
Technologies Used:
ML Algorithms: Decision Trees, Random Forest, and Neural Networks for occupancy
classification.
3.2 Challenges in Traditional Occupancy Monitoring
1. Manual Inspection:
Labor-Intensive: Periodic manual inspections require significant human effort and are not
feasible for large or frequently used auditoriums.
Inconsistent Data: Manual logging can lead to inconsistent and incomplete data collection,
impacting the accuracy of occupancy analysis.
Delayed Feedback: Traditional methods do not provide real-time data, leading to delays in
understanding current occupancy and making timely adjustments.
2. Human Error:
Accuracy Issues: Human errors during data collection and logging can result in inaccuracies,
affecting the reliability of occupancy assessments.
3. Scalability Challenges:
High Resource Consumption: Traditional methods require significant resources for manual
data collection, including labor and time.
Data Gaps: Manual approaches are prone to missing data, leading to incomplete occupancy
records and potential inaccuracies.
Limited Precision: Manual methods often rely on periodic checks, which may not capture
transient or fluctuating occupancy levels accurately.
Subjective Interpretation: The reliance on human judgment can introduce biases and errors
in occupancy assessments.
4. Lack of Real-Time Monitoring:
Delayed Reactions: The lack of real-time data limits the ability to optimize resource
allocation and improve operational efficiency.
5. Inability to Scale:
Limited Automation: Traditional approaches lack automation, resulting in slower and less
efficient processes.
CHAPTER 4
PROPOSED SYSTEM
4.1 Overview
The first step in the research process involves gathering the necessary data for room
occupancy classification. The dataset consists of various parameters collected from non-
intrusive IoT sensors installed in a smart auditorium environment. These sensors measure
environmental factors such as temperature, humidity, light levels, and carbon dioxide
concentration, which are indicative of room occupancy. The dataset is collected over time and
stored in a CSV file, ready for further analysis.
Data preprocessing is a critical step that involves preparing the raw data for analysis. This
process includes several key tasks:
Null Value Removal: Missing data can lead to inaccurate analysis, so any null values in the
dataset are identified and removed. This ensures that the dataset is complete and ready for
analysis.
Label Encoding: Since the target variable, "Occupancy," is categorical (e.g., 'occupied' or
'unoccupied'), it needs to be converted into numerical values using label encoding. This
allows machine learning algorithms to process the data more effectively.
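The two preprocessing steps above can be sketched with pandas as follows. This is a minimal illustration under stated assumptions: the helper name `preprocess`, the toy sensor values, and the string labels are hypothetical. Class labels are coded in sorted order, which is also how scikit-learn's `LabelEncoder` assigns its integer codes.

```python
import pandas as pd


def preprocess(df: pd.DataFrame, target: str = "Occupancy"):
    """Hypothetical helper: drop rows with nulls, then label-encode the target column."""
    clean = df.dropna().copy()
    # Assign integer codes in sorted label order (same ordering as sklearn's LabelEncoder).
    classes = sorted(clean[target].unique())
    mapping = {label: code for code, label in enumerate(classes)}
    clean[target] = clean[target].map(mapping)
    return clean, mapping


# Toy readings (made-up values); the third row has a missing temperature.
raw = pd.DataFrame({
    "Temperature": [23.1, 22.9, None, 21.5],
    "CO2": [720.0, 690.0, 450.0, 430.0],
    "Occupancy": ["occupied", "occupied", "unoccupied", "unoccupied"],
})
clean, mapping = preprocess(raw)
print(mapping)  # {'occupied': 0, 'unoccupied': 1}
print(clean)
```

Dropping rows is the simplest option for a small number of gaps; imputation would keep more rows but would also introduce estimated sensor values.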
To address the issue of class imbalance, the Synthetic Minority Over-sampling Technique
(SMOTE) is employed. SMOTE generates synthetic examples of the minority class to
balance the dataset. This step is crucial for improving the performance of machine learning
models, particularly in cases where the dataset is heavily skewed towards one class.
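The core idea of SMOTE can be sketched in NumPy. This simplified version is an illustration, not the imbalanced-learn implementation the project relies on: it repeatedly picks a random minority sample, picks one of its k nearest minority neighbours, and places a synthetic point at a random position on the line segment between them. The function name `smote` and the toy readings are hypothetical.

```python
import numpy as np


def smote(x_min: np.ndarray, n_new: int, k: int = 5, seed: int = 42) -> np.ndarray:
    """Simplified SMOTE: create n_new synthetic rows from minority-class rows x_min."""
    if len(x_min) < 2:
        raise ValueError("SMOTE needs at least two minority samples")
    rng = np.random.default_rng(seed)
    n = len(x_min)
    k = min(k, n - 1)  # cannot use more neighbours than there are other minority rows
    # Pairwise Euclidean distances between minority rows.
    dists = np.linalg.norm(x_min[:, None, :] - x_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)  # a row is not its own neighbour
    neighbours = np.argsort(dists, axis=1)[:, :k]
    synthetic = np.empty((n_new, x_min.shape[1]))
    for i in range(n_new):
        a = rng.integers(n)                 # random minority row
        b = neighbours[a, rng.integers(k)]  # one of its k nearest minority neighbours
        gap = rng.random()                  # random position along the segment a -> b
        synthetic[i] = x_min[a] + gap * (x_min[b] - x_min[a])
    return synthetic


# Toy data: 8 unoccupied rows vs. 3 occupied rows of [temperature, CO2] (made-up values).
n_major = 8
x_minor = np.array([[23.0, 700.0], [23.4, 760.0], [22.8, 720.0]])
x_new = smote(x_minor, n_new=n_major - len(x_minor), k=2)
x_balanced_minor = np.vstack([x_minor, x_new])
print(x_balanced_minor.shape)  # (8, 2)
```

Because every synthetic point lies between two real minority samples, SMOTE adds variety to the minority class instead of simply duplicating existing rows.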
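The existing (baseline) approach to room occupancy classification uses the Naive Bayes
algorithm, a simple yet effective probabilistic classifier based on Bayes' theorem that assumes
the features are conditionally independent given the class. It is trained on the preprocessed
and resampled dataset to classify room occupancy as either 'occupied' or 'unoccupied.' The
proposed system trains a Gradient Boosting classifier on the same data; Gradient Boosting
builds an ensemble of decision trees sequentially, with each new tree fitted to correct the
errors of the trees before it.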
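To make the Naive Bayes assumption concrete, the sketch below implements Gaussian Naive Bayes (the variant named among the scikit-learn models in Chapter 6) directly in NumPy: each class gets a prior, and each feature is modeled as an independent normal distribution within that class. It is an illustrative stand-in for `sklearn.naive_bayes.GaussianNB`, not its actual implementation; the class name and the toy readings are hypothetical.

```python
import numpy as np


class GaussianNaiveBayes:
    """Minimal Gaussian Naive Bayes: features are treated as independent,
    normally distributed variables within each class."""

    def fit(self, x: np.ndarray, y: np.ndarray) -> "GaussianNaiveBayes":
        self.classes_ = np.unique(y)
        self.priors_ = np.array([np.mean(y == c) for c in self.classes_])
        self.means_ = np.array([x[y == c].mean(axis=0) for c in self.classes_])
        # A tiny constant keeps constant features from causing division by zero.
        self.vars_ = np.array([x[y == c].var(axis=0) for c in self.classes_]) + 1e-9
        return self

    def log_joint(self, x: np.ndarray) -> np.ndarray:
        """log P(class) + sum over features of log N(x_j | mean, var); one column per class."""
        columns = []
        for prior, mu, var in zip(self.priors_, self.means_, self.vars_):
            log_lik = -0.5 * (np.log(2 * np.pi * var) + (x - mu) ** 2 / var)
            columns.append(np.log(prior) + log_lik.sum(axis=1))
        return np.column_stack(columns)

    def predict(self, x: np.ndarray) -> np.ndarray:
        # The evidence term P(x) is the same for every class, so argmax of the joint suffices.
        return self.classes_[np.argmax(self.log_joint(x), axis=1)]


# Toy readings: [temperature, light, CO2]; 1 = occupied, 0 = unoccupied (made-up values).
x = np.array([[20.1, 0.0, 440.0], [20.3, 5.0, 455.0], [19.9, 0.0, 430.0],
              [23.0, 450.0, 720.0], [23.4, 480.0, 780.0], [22.8, 430.0, 700.0]])
y = np.array([0, 0, 0, 1, 1, 1])
model = GaussianNaiveBayes().fit(x, y)
pred = model.predict(np.array([[20.0, 2.0, 445.0], [23.1, 460.0, 750.0]]))
print(pred)  # [0 1]
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied together.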
The performance of both the Naive Bayes and Gradient Boosting algorithms is evaluated
using key metrics such as accuracy, precision, recall, and F1-score. Confusion matrices are
generated to visualize the classification results, and a detailed comparison is made to
determine which algorithm provides better results in terms of classification accuracy and
reliability.
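The four metrics and the confusion matrix can be computed from predictions as in this dependency-free sketch, which assumes 'occupied' is the positive class coded as 1. The `[[TN, FP], [FN, TP]]` layout matches what scikit-learn's `confusion_matrix` returns for labels 0 and 1; the function name `binary_metrics` and the toy labels are hypothetical.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Confusion-matrix counts plus accuracy, precision, recall, and F1 for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(t == positive and p == positive for t, p in pairs)
    tn = sum(t != positive and p != positive for t, p in pairs)
    fp = sum(t != positive and p == positive for t, p in pairs)
    fn = sum(t == positive and p != positive for t, p in pairs)
    accuracy = (tp + tn) / len(pairs)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"confusion_matrix": [[tn, fp], [fn, tp]], "accuracy": accuracy,
            "precision": precision, "recall": recall, "f1": f1}


# Toy labels: 1 = occupied, 0 = unoccupied.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]
m = binary_metrics(y_true, y_pred)
print(m["confusion_matrix"])  # [[5, 1], [1, 3]]
print(m["accuracy"], m["precision"], m["recall"], m["f1"])  # 0.8 0.75 0.75 0.75
```

Precision answers "of the rooms predicted occupied, how many were?", recall answers "of the rooms that were occupied, how many were detected?", and F1 is their harmonic mean.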
Finally, the trained Gradient Boosting model is used to predict the room occupancy status on
a separate test dataset. The predicted results are analyzed, and the model's performance is
assessed to ensure that it generalizes well to new, unseen data.
Figure 4.1: Architecture Diagram of the Proposed System
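Gradient Boosting, the model used for the final predictions described above, builds its ensemble one small tree at a time, each fitted to the errors of the trees before it. The sketch below illustrates this for binary occupancy labels using depth-1 trees (stumps) fitted to the gradient of the log-loss. It is a simplified illustration with hypothetical names (`fit_gbm`, `predict_proba`), not scikit-learn's `GradientBoostingClassifier`, which by default uses deeper trees and a second-order (Newton) update of the leaf values.

```python
import numpy as np


def fit_stump(x, r):
    """Depth-1 regression tree fitted to residuals r by least squares."""
    best = (np.inf, 0, np.inf, r.mean(), r.mean())  # fallback: one constant leaf
    for j in range(x.shape[1]):
        for t in np.unique(x[:, j])[:-1]:  # skip the max so the right leaf is never empty
            left = x[:, j] <= t
            lv, rv = r[left].mean(), r[~left].mean()
            err = ((r[left] - lv) ** 2).sum() + ((r[~left] - rv) ** 2).sum()
            if err < best[0]:
                best = (err, j, t, lv, rv)
    return best[1:]


def stump_predict(stump, x):
    j, t, lv, rv = stump
    return np.where(x[:, j] <= t, lv, rv)


def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_gbm(x, y, n_rounds=50, learning_rate=0.3):
    """Boost stumps on the log-loss gradient; y holds 0/1 labels with both classes present."""
    p0 = y.mean()
    f0 = np.log(p0 / (1 - p0))  # start from the log-odds of the positive class
    f = np.full(len(y), f0)
    stumps = []
    for _ in range(n_rounds):
        residual = y - sigmoid(f)  # negative gradient of the log-loss w.r.t. f
        stump = fit_stump(x, residual)  # leaf values are mean residuals (first-order step)
        f = f + learning_rate * stump_predict(stump, x)  # shrunken step
        stumps.append(stump)
    return f0, learning_rate, stumps


def predict_proba(model, x):
    f0, learning_rate, stumps = model
    f = np.full(len(x), f0)
    for stump in stumps:
        f = f + learning_rate * stump_predict(stump, x)
    return sigmoid(f)


# Toy readings: [temperature, CO2]; 1 = occupied (made-up values).
x = np.array([[20.1, 440.0], [20.3, 455.0], [19.9, 430.0],
              [23.0, 720.0], [23.4, 780.0], [22.8, 700.0]])
y = np.array([0, 0, 0, 1, 1, 1])
model = fit_gbm(x, y)
proba = predict_proba(model, np.array([[20.0, 445.0], [23.1, 750.0]]))
print(proba.round(3))
```

The learning rate shrinks each tree's contribution, so more rounds are needed, but the ensemble is less likely to overfit than one that takes full steps.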
4.2 Preprocessing and Data Splitting
Preprocessing is the foundation of any data-driven research, ensuring that the raw data is
transformed into a format suitable for analysis. In this project, preprocessing involves several
key steps:
Handling Missing Data: The dataset is checked for missing values, which are removed to
avoid biases in the analysis. Null values can significantly impact the accuracy of machine
learning models, so their removal ensures that the dataset is complete and reliable.
Label Encoding: The categorical variable 'Occupancy' is converted into numerical form
using label encoding. This step is crucial as most machine learning algorithms require
numerical input to process and learn from the data.
Resampling with SMOTE: Given the potential class imbalance in the dataset, SMOTE is
applied to generate synthetic examples of the minority class. This resampling technique helps
in balancing the dataset, which is essential for improving the classifier's performance.
Data Splitting: The dataset is then split into training and testing sets: 80% of the data is
used for training the models, while the remaining 20% is reserved for testing. This held-out
split allows the models to be evaluated on data they were not trained on.
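The 80/20 split can be sketched in NumPy by shuffling the row indices once and holding out the first 20% of them. The function name `split_80_20` and the seed are assumptions; in the project itself this step is performed with scikit-learn's `train_test_split`.

```python
import math

import numpy as np


def split_80_20(x, y, test_fraction=0.2, seed=42):
    """Shuffle row indices once, then hold out test_fraction of them for testing."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(len(x))
    n_test = math.ceil(len(x) * test_fraction)  # round the test count up
    test_idx, train_idx = idx[:n_test], idx[n_test:]
    return x[train_idx], x[test_idx], y[train_idx], y[test_idx]


# Toy data: 10 rows; the label of row i is i % 2.
x = np.arange(20).reshape(10, 2)
y = np.arange(10) % 2
x_train, x_test, y_train, y_test = split_80_20(x, y)
print(len(x_train), len(x_test))  # 8 2
```

Shuffling features and labels with the same index array keeps every row paired with its own label, and a fixed seed makes the split reproducible.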
4.4 Advantages
1. Non-Intrusive Monitoring:
o Utilizes environmental sensors (temperature, humidity, light, and carbon dioxide) that
capture no images or audio, preserving occupants' privacy.
2. Real-Time Data Analysis:
o Provides immediate insights into room occupancy, allowing for dynamic
adjustments in resource allocation and energy management.
3. Energy Efficiency:
o Optimizes heating, ventilation, and air conditioning (HVAC) systems based on
actual occupancy, leading to reduced energy consumption.
4. Improved Space Utilization:
o Analyzes occupancy patterns to identify underutilized spaces, enabling better
management of facilities.
5. Enhanced Comfort:
o Maintains optimal environmental conditions by adjusting settings based on
real-time occupancy data.
6. Scalability:
o The system can be easily scaled to accommodate additional sensors or
integrated with other smart building technologies.
7. Data-Driven Insights:
o Provides valuable data for decision-making regarding space planning and
operational efficiency.
CHAPTER 5
UML DIAGRAMS
UML stands for Unified Modeling Language. UML is a standardized general-purpose
modeling language in the field of object-oriented software engineering. The standard is
managed by the Object Management Group (OMG). The goal is for UML to become a common
language for creating models of object-oriented computer software. In its current form, UML
comprises two major components: a meta-model and a notation. In the future, some form of
method or process may also be added to, or associated with, UML.
Goals: The Primary goals in the design of the UML are as follows:
Provide users a ready-to-use, expressive visual modeling language so that they can
develop and exchange meaningful models.
Provide extensibility and specialization mechanisms to extend the core concepts.
Be independent of particular programming languages and development processes.
Provide a formal basis for understanding the modeling language.
Encourage the growth of the OO tools market.
Support higher level development concepts such as collaborations, frameworks,
patterns and components.
Integrate best practices.
Class Diagram
The class diagram is used to refine the use case diagram and define a detailed design of the
system. The class diagram classifies the actors defined in the use case diagram into a set of
interrelated classes. The relationship or association between the classes can be either an “is-a”
or “has-a” relationship. Each class in the class diagram may be capable of providing certain
functionalities. These functionalities provided by the class are termed “methods” of the class.
Apart from this, each class may have certain “attributes” that uniquely identify the class.
Data Flow Diagram: A data flow diagram (DFD) represents the flow of data between processes,
external entities, and data stores within an organization or a system. DFDs are commonly used
in system analysis and design to understand, document, and communicate data flow and
processing.
Sequence Diagram: A sequence diagram shows how objects interact over time, depicting the
messages they exchange in the order in which they occur.
Activity Diagram: An activity diagram is another important UML diagram for describing the
dynamic aspects of the system.
Deployment Diagram: A deployment diagram visualizes the physical hardware on which the
software will be deployed.
Use Case Diagram: The purpose of a use case diagram is to capture the dynamic aspect of a
system.
Component Diagram: A component diagram describes the organization and wiring of the
physical components in a system.
CHAPTER 6
SOFTWARE ENVIRONMENT
Python is a high-level, interpreted programming language known for its simplicity and
readability, which makes it a popular choice for beginners as well as experienced developers.
Key features of Python include its dynamic typing, automatic memory management, and a
rich standard library that supports a wide range of applications from web development to data
science and machine learning. Its object-oriented approach and support for multiple
programming paradigms allow developers to write clear, maintainable code. Python's
extensive ecosystem of third-party packages further enhances its capabilities, enabling rapid
development and prototyping across diverse fields.
Installation
First, download the appropriate installer from the official Python website
(https://www.python.org/downloads/release/python-376/). On Windows, run the executable
installer and make sure to check the "Add Python to PATH" option during installation; on
macOS and Linux, use the corresponding installer or a package manager such as Homebrew or
apt-get. After installation, verify the setup by running python --version or python3 --version
in a terminal or command prompt, which should display "Python 3.7.6". This installation
supports the libraries compatible with Python 3.7.6, providing a foundation for developing
applications in areas such as data analysis, machine learning, and GUI development.
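Beyond python --version, a short script can confirm the interpreter version and whether the packages used in this project are importable. The names below are import names, not installer names (Pillow imports as PIL, scikit-learn as sklearn, imbalanced-learn as imblearn); the list and the helper name `missing_packages` are assumptions for illustration.

```python
import importlib.util
import sys

# Import names of the packages used in this project (assumed list; adjust as needed).
REQUIRED = ["tkinter", "PIL", "matplotlib", "seaborn", "pandas",
            "numpy", "sklearn", "imblearn", "joblib"]


def missing_packages(names):
    """Return the module names the import system cannot locate."""
    return [name for name in names if importlib.util.find_spec(name) is None]


print("Python", sys.version.split()[0])
missing = missing_packages(REQUIRED)
print("Missing packages:", ", ".join(missing) if missing else "none")
```

`importlib.util.find_spec` only locates a module without importing it, so the check is fast and has no side effects from package initialization.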
6.1 Software Requirements
The project requires a set of software libraries and tools that work together to build an
integrated system for room occupancy classification. The key software requirements and
packages used are explained below:
Python: The project is implemented in Python, which is chosen for its extensive
ecosystem of libraries and its strong support for data analysis, machine learning, and
GUI development.
Tkinter: Used to build the graphical user interface (GUI) of the application. It
handles tasks such as user authentication, data upload, and displaying results, making
the system accessible to both admins and end-users.
PIL (Pillow): Utilized for image processing, particularly for handling background
images and other graphical elements within the GUI, thereby enhancing the visual
appeal of the application.
Matplotlib & Seaborn: These libraries are employed for data visualization.
Matplotlib is used for creating standard plots, while Seaborn adds an extra layer of
sophistication for statistical visualizations such as bar plots, violin plots, histograms,
scatter plots, strip plots, and correlation heat maps.
Pandas & NumPy: Essential for data manipulation and analysis. Pandas is used to
load, preprocess, and analyze the CSV dataset, while NumPy supports numerical
operations and data handling, which are crucial for processing large volumes of IoT
data.
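Scikit-learn (sklearn): Provides the machine learning framework used in the project.
It includes tools for model training, evaluation, train-test splitting, and data
preprocessing (such as label encoding). Models such as Gaussian Naive Bayes, Gradient
Boosting, SVM, KNN, and Decision Tree classifiers are implemented using scikit-learn.
Imbalanced-learn (imblearn): Provides the SMOTE implementation used to oversample the
minority occupancy class before model training.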
Joblib: Utilized for saving and loading trained machine learning models. This ensures
that once a model is trained, it can be stored and reused without retraining, thereby
improving efficiency.
Each of these packages plays a crucial role in making the system robust, scalable, and
efficient, from data ingestion and preprocessing to model training, visualization, and
deployment. Together, these tools enable an integrated, user-friendly application for
real-time room occupancy classification.
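The save-and-reuse pattern described for Joblib is sketched below. To stay dependency-free, the sketch uses the standard-library pickle module; `joblib.dump(model, path)` and `joblib.load(path)` can be substituted directly. The toy "model" (a dictionary holding a learned CO2 cutoff), the helper names, and the file name are hypothetical stand-ins for a trained classifier.

```python
import os
import pickle
import tempfile


def train_threshold_model(co2, occupied):
    """Hypothetical toy 'training': place a CO2 cutoff halfway between the two class means."""
    occ = [c for c, o in zip(co2, occupied) if o == 1]
    empty = [c for c, o in zip(co2, occupied) if o == 0]
    return {"co2_cutoff": (sum(occ) / len(occ) + sum(empty) / len(empty)) / 2}


def predict(model, co2):
    return [1 if c > model["co2_cutoff"] else 0 for c in co2]


model = train_threshold_model([440.0, 460.0, 700.0, 740.0], [0, 0, 1, 1])
path = os.path.join(tempfile.gettempdir(), "occupancy_model.pkl")

with open(path, "wb") as fh:    # joblib.dump(model, path) is the Joblib equivalent
    pickle.dump(model, fh)

with open(path, "rb") as fh:    # joblib.load(path) is the Joblib equivalent
    restored = pickle.load(fh)  # reused without retraining

print(restored)                           # {'co2_cutoff': 585.0}
print(predict(restored, [450.0, 720.0]))  # [0, 1]
```

Pickle and Joblib files can execute code when loaded, so model files should only be loaded from trusted sources.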
6.2 Hardware Requirements
Python 3.7.6 runs efficiently on most modern systems with minimal hardware requirements.
However, meeting the recommended specifications ensures better performance, especially for
developers handling large-scale applications or computationally intensive tasks. Ensuring that
the hardware and operating system are compatible allows developers to leverage the full
potential of Python 3.7.6.
Memory (RAM) Requirements: Python 3.7.6 does not demand excessive memory but
requires adequate RAM for smooth performance, particularly for running resource-intensive
applications such as data processing, machine learning, or web development.
Insufficient RAM can cause delays or crashes when handling large datasets or executing
computationally heavy programs.
Storage Requirements: Python 3.7.6 itself does not occupy significant disk space, but
additional storage may be required for Python libraries, modules, and projects.
The hardware on which the operating system runs directly affects Python's performance,
particularly for computationally intensive tasks such as data processing and model training.
CHAPTER 7
FUNCTIONAL REQUIREMENTS
The "Smart Room Occupancy Analysis" project leverages IoT sensor data to predict room
occupancy status (i.e., whether a room is occupied or unoccupied). The system employs
various machine learning algorithms, including Gradient Boosting and Naive Bayes, to
classify room occupancy based on sensor readings. It performs data analysis, preprocessing,
model training, evaluation, and prediction, making it a robust solution for smart building
applications, energy management, and automation systems.
Key Functionalities:
1. Data Loading and Inspection:
o The system begins by loading the dataset (datatest.txt) containing IoT sensor
data.
o It provides an initial inspection of the data using methods such as head(), tail(),
describe(), and info() to understand the structure and statistics of the data.
o The unique values of the target variable Occupancy are identified to see the
possible classes (i.e., "occupied" and "unoccupied").
2. Data Preprocessing:
o Missing Value Handling: The dataset is checked for missing values, and rows
containing them are removed to clean the data.
o Feature Engineering: New day, month, and year columns are created from
the date column, extracting temporal features that aid classification.
o Count Plot: A count plot visualizes the distribution of the target variable
(Occupancy) to show the balance or imbalance between the classes (occupied
vs. unoccupied).
3. Data Splitting:
o The dataset is split into training and testing sets using an 80-20 ratio, so that
the model is trained on one portion of the data and evaluated on unseen data.
4. Model Training:
o Gradient Boosting Classifier: The Gradient Boosting model is trained on the
training set, evaluated using precision, recall, F1-score, and accuracy, and saved
to a .pkl file.
o Naive Bayes Classifier: The Naive Bayes model is also trained and evaluated
using the same performance metrics, and it is likewise saved to a .pkl file.
o Both classifiers are tested on the test data, and their performance is displayed
through metrics and confusion matrices.
5. Model Evaluation Metrics:
o Precision, recall, F1-score, and accuracy are calculated for each model
(Gradient Boosting and Naive Bayes). These metrics help assess how effectively
the models predict room occupancy status.
o The system also displays confusion matrices to visualize the true positive,
false positive, true negative, and false negative predictions of each model.
6. Prediction on New Data:
o After training and evaluating the models, the system can make predictions on
new, unseen data (test.csv).
o The system outputs the predicted occupancy status for each instance in the
new data.
CHAPTER 8
SOURCE CODE
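# IoT Data-based Room Occupancy Classification Using Non-Intrusive Sensors: ML Design and Data Analysis
## Importing libraries
import os

import joblib
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns
from imblearn.over_sampling import SMOTE
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.metrics import (accuracy_score, classification_report, confusion_matrix,
                             f1_score, precision_score, recall_score)
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

## Data Analysis
# Load the IoT sensor readings
data = pd.read_csv('datatest.txt')
# Replace decimal commas with dots so every sensor column parses as a number
for col in data.columns.drop('date'):
    data[col] = pd.to_numeric(data[col].astype(str).str.replace(',', '.', regex=False))
print(data.head())
print(data.tail())                 # tail must be called, not just referenced
print(data.describe())
data.info()                        # info() prints its summary directly
print(data['Occupancy'].unique())  # 1 = occupied, 0 = unoccupied
print(data.columns)

## Data Preprocessing
print(data.isnull().sum())
print(data.shape)
# Correlate only the numeric columns; the raw 'date' column is text
corr = data.select_dtypes(include='number').corr()
print(corr)

## HeatMap
plt.figure(figsize=(15, 10))
sns.heatmap(corr, annot=True, cmap='coolwarm')
plt.xticks(rotation=80)
plt.yticks(rotation=45)
plt.show()

print(set(data['Occupancy']))
# Class names in the order of the sorted label values: 0 -> unoccupied, 1 -> occupied
labels = ['unoccupied', 'occupied']

## CountPlot (class distribution before SMOTE)
sns.set(style='darkgrid')
plt.figure(figsize=(12, 6))
sns.countplot(x=data['Occupancy'])
plt.title('Count plot')
plt.xlabel('categories')
plt.ylabel('count')
plt.show()

# Split the timestamp into day, month and year features
df = pd.DataFrame(data)
df['date'] = pd.to_datetime(df['date'])
df['day'] = df['date'].dt.day
df['month'] = df['date'].dt.month
df['year'] = df['date'].dt.year
df.drop(columns=['date'], inplace=True)
print(df.head())

# Features (x) and target (y)
x = df.drop(['Occupancy'], axis=1)
y = df['Occupancy']

# Oversample the minority class with SMOTE (the resampled data replaces x and y)
sm = SMOTE(random_state=42)
x, y = sm.fit_resample(x, y)

## CountPlot (class distribution after SMOTE)
sns.set(style='darkgrid')
plt.figure(figsize=(12, 6))
sns.countplot(x=y)
plt.title('Count plot')
plt.xlabel('categories')
plt.ylabel('count')
plt.show()

## Data Splitting (80% training, 20% testing)
x_train, x_test, y_train, y_test = train_test_split(x, y, test_size=0.2, random_state=42)
print(x_train.shape, y_train.shape)
print(x_test.shape, y_test.shape)

## Performance Evaluation
precision = []
recall = []
fscore = []
accuracy = []

# The original definition line was lost; the helper name and macro averaging are assumptions
def performance_metrics(algorithm, predict, testY):
    testY = np.asarray(testY).astype('int')
    predict = np.asarray(predict).astype('int')
    a = accuracy_score(testY, predict) * 100
    p = precision_score(testY, predict, average='macro') * 100
    r = recall_score(testY, predict, average='macro') * 100
    f = f1_score(testY, predict, average='macro') * 100
    accuracy.append(a)
    precision.append(p)
    recall.append(r)
    fscore.append(f)
    print(f'{algorithm} Accuracy: {a:.2f}  Precision: {p:.2f}  Recall: {r:.2f}  F1-score: {f:.2f}')
    # classification_report expects (y_true, y_pred) in that order
    report = classification_report(testY, predict, target_names=labels)
    print(report)
    conf_matrix = confusion_matrix(testY, predict)
    plt.figure(figsize=(6, 6))
    sns.heatmap(conf_matrix, xticklabels=labels, yticklabels=labels,
                annot=True, fmt='d', cmap='viridis')
    plt.title(f'{algorithm} Confusion matrix')
    plt.ylabel('True class')
    plt.xlabel('Predicted class')
    plt.show()

## GradientBoost Algorithm
if os.path.exists('GradientBoost_weights.pkl'):
    # Reuse the model saved by a previous run
    classifier = joblib.load('GradientBoost_weights.pkl')
else:
    # Default hyperparameters (the original constructor call was not shown)
    classifier = GradientBoostingClassifier()
    classifier.fit(x_train, y_train)
    joblib.dump(classifier, 'GradientBoost_weights.pkl')
predict = classifier.predict(x_test)
performance_metrics('GradientBoost', predict, y_test)

## NaiveBayes Algorithm
if os.path.exists('NaiveBayes_weights.pkl'):
    classifier = joblib.load('NaiveBayes_weights.pkl')
else:
    classifier = GaussianNB()
    classifier.fit(x_train, y_train)
    joblib.dump(classifier, 'NaiveBayes_weights.pkl')
predict = classifier.predict(x_test)
performance_metrics('NaiveBayes', predict, y_test)

## Prediction on new data (classifier currently holds the Naive Bayes model)
test = pd.read_csv('test.csv')
print(test)
Date_convert = pd.DataFrame(test)
Date_convert['date'] = pd.to_datetime(Date_convert['date'])
Date_convert['day'] = Date_convert['date'].dt.day
Date_convert['month'] = Date_convert['date'].dt.month
Date_convert['year'] = Date_convert['date'].dt.year
Date_convert.drop(columns=['date'], inplace=True)
print(Date_convert.head())
# Use the training feature columns, in the training order
Date_convert = Date_convert[x_train.columns]
predict = classifier.predict(Date_convert)
print(predict)

A = 'occupied'
B = 'unoccupied'
for i in range(len(predict)):
    if predict[i] == 0:
        print(f'Row {i}: {B}')
    elif predict[i] == 1:
        print(f'Row {i}: {A}')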
1. Library Imports
The project begins by importing essential libraries for data manipulation, visualization,
machine learning, and performance evaluation. These libraries include numpy, pandas,
matplotlib, seaborn, and scikit-learn. Additionally, imblearn is used for handling class
imbalance through the SMOTE technique, and joblib is used for saving and loading trained
models.
2. Data Analysis
The dataset is loaded from a .txt file containing the sensor readings. The initial steps involve
displaying the first few records using head() and analyzing the overall structure and summary
statistics with describe() and info(). This analysis provides insight into the data distribution
and helps identify any potential issues such as missing values.
3. Data Preprocessing
To ensure data quality and consistency, missing values are checked using isnull().sum(), and
the dataset's shape is verified. The correlation between features is visualized through a
heatmap generated by seaborn, providing a clear picture of how the features relate to each
other.
The data is further preprocessed by handling the 'date' column, where it is split into day,
month, and year components to facilitate better analysis. The remaining columns are
converted to numeric types, ensuring the data is ready for machine learning algorithms.
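The two preprocessing steps above (splitting the timestamp and normalizing decimal commas before numeric conversion) can be sketched as a single pandas function. This is a minimal illustration, assuming the timestamp lives in a column named date and that some numeric values may use a comma as the decimal separator; the helper name preprocess and the two sample rows are hypothetical, not taken from the project's dataset.

```python
import pandas as pd


def preprocess(frame: pd.DataFrame) -> pd.DataFrame:
    """Hypothetical helper: split 'date' into day/month/year and coerce columns to numbers."""
    out = frame.copy()
    # Parse the timestamp and derive the three calendar features.
    stamps = pd.to_datetime(out["date"])
    out["day"] = stamps.dt.day
    out["month"] = stamps.dt.month
    out["year"] = stamps.dt.year
    out = out.drop(columns=["date"])
    # Replace decimal commas with dots, then convert every column to a numeric dtype.
    for col in out.columns:
        out[col] = pd.to_numeric(out[col].astype(str).str.replace(",", ".", regex=False))
    return out


# Illustrative rows (not real sensor data), with comma decimals stored as text.
raw = pd.DataFrame({
    "date": ["2015-02-02 14:19:00", "2015-02-03 09:05:00"],
    "Temperature": ["23,7", "21,2"],
    "CO2": ["721,25", "440,5"],
    "Occupancy": ["1", "0"],
})
clean = preprocess(raw)
print(clean)
```

Working on a copy keeps the raw frame intact, which makes it easy to compare the data before and after preprocessing.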
4. Class Imbalance Handling
Given the imbalance in the occupancy classes (occupied vs. unoccupied), the Synthetic
Minority Over-sampling Technique (SMOTE) is applied to generate a balanced dataset. This
step is crucial for preventing the model from being biased toward the majority class.
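The core idea of SMOTE is to create synthetic minority samples by interpolating between a minority sample and one of its nearest minority-class neighbors. The project itself calls SMOTE(random_state=42) from imblearn; the NumPy sketch below is a simplified illustration of that interpolation step, not the library's implementation. The helper names smote and balance and the toy data are hypothetical, and the brute-force distance matrix is only suitable for small inputs.

```python
import numpy as np


def smote(X_min: np.ndarray, n_new: int, k: int = 5, seed: int = 42) -> np.ndarray:
    """Hypothetical helper: generate n_new synthetic minority samples by interpolation."""
    rng = np.random.default_rng(seed)
    # Brute-force pairwise Euclidean distances between minority samples.
    dists = np.linalg.norm(X_min[:, None, :] - X_min[None, :, :], axis=2)
    np.fill_diagonal(dists, np.inf)                # a point is not its own neighbor
    k = min(k, len(X_min) - 1)
    neighbors = np.argsort(dists, axis=1)[:, :k]   # k nearest minority neighbors per sample
    synthetic = np.empty((n_new, X_min.shape[1]))
    for i in range(n_new):
        base = rng.integers(len(X_min))            # pick a random minority sample
        nb = neighbors[base, rng.integers(k)]      # pick one of its neighbors
        gap = rng.random()                         # position along the segment, in [0, 1)
        synthetic[i] = X_min[base] + gap * (X_min[nb] - X_min[base])
    return synthetic


def balance(X: np.ndarray, y: np.ndarray, minority: int = 1, seed: int = 42):
    """Hypothetical helper: oversample the minority class until both classes are equal."""
    X_min = X[y == minority]
    n_new = int((y != minority).sum() - len(X_min))
    X_syn = smote(X_min, n_new, seed=seed)
    X_bal = np.vstack([X, X_syn])
    y_bal = np.concatenate([y, np.full(n_new, minority)])
    return X_bal, y_bal


# Toy data: 8 majority (unoccupied, 0) rows and 3 minority (occupied, 1) rows.
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 1, (8, 2)), rng.normal(5, 1, (3, 2))])
y = np.array([0] * 8 + [1] * 3)
X_bal, y_bal = balance(X, y)
print(np.bincount(y_bal))  # → [8 8]
```

Because every synthetic point lies on a segment between two real minority points, SMOTE adds new but plausible samples rather than exact duplicates.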
5. Data Visualization
The distribution of the target variable, 'Occupancy', is visualized using a count plot to
understand the class imbalance before and after applying SMOTE. This visualization helps in
confirming the effectiveness of the resampling technique.
6. Data Splitting
The preprocessed and balanced data is then split into training and testing sets using an 80-20
split. This split allows the model to be trained on a significant portion of the data while
retaining a test set for evaluating the model's generalization ability.
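The project performs this split with scikit-learn's train_test_split(test_size=0.2). As an illustration of what a shuffled 80-20 hold-out split does, the sketch below reimplements it with NumPy; the helper name split_80_20 is hypothetical and its rounding of the test size is not guaranteed to match scikit-learn exactly.

```python
import numpy as np


def split_80_20(X: np.ndarray, y: np.ndarray, test_size: float = 0.2, seed: int = 42):
    """Hypothetical helper: shuffle rows and hold out test_size of them for evaluation."""
    rng = np.random.default_rng(seed)
    order = rng.permutation(len(X))          # one random permutation keeps X and y aligned
    n_test = int(round(len(X) * test_size))
    test_idx, train_idx = order[:n_test], order[n_test:]
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]


X = np.arange(20).reshape(10, 2)
y = np.arange(10) % 2
X_train, X_test, y_train, y_test = split_80_20(X, y)
print(X_train.shape, X_test.shape)  # → (8, 2) (2, 2)
```

Indexing features and labels with the same permutation is what keeps each row paired with its own label after shuffling.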
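7. Performance Evaluation
A helper function computes accuracy, precision, recall, and F1-score for each model, prints a
classification report, and plots the confusion matrix, so both classifiers are assessed in the
same way.
8. Gradient Boosting Classifier
The Gradient Boosting Classifier, an ensemble method that adds decision trees one at a time,
each fitted to correct the errors of the current ensemble, is used as one of the models. If a
pre-trained model file is available, it is loaded; otherwise, the classifier is trained on the
training set and saved for future use. In either case, its predictions on the test set are
evaluated using the performance metrics defined above.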
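The load-or-train flow described above can be captured in a small reusable function. This sketch uses the standard-library pickle module so that it is self-contained; in the project, joblib.dump and joblib.load play the same role. The names load_or_train and MajorityClass are hypothetical: MajorityClass is a toy stand-in for an estimator with fit/predict methods, and in a real run make_model would be GradientBoostingClassifier or GaussianNB.

```python
import os
import pickle
import tempfile


def load_or_train(path, make_model, X, y):
    """Hypothetical helper: return a cached model from path, or train and cache a new one."""
    if os.path.exists(path):
        with open(path, "rb") as fh:
            return pickle.load(fh), True     # reused a previously saved model
    model = make_model()
    model.fit(X, y)
    with open(path, "wb") as fh:
        pickle.dump(model, fh)
    return model, False                      # freshly trained and saved


class MajorityClass:
    """Hypothetical toy estimator that always predicts the most frequent training label."""

    def fit(self, X, y):
        self.label_ = max(set(y), key=list(y).count)
        return self

    def predict(self, X):
        return [self.label_] * len(X)


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "model_weights.pkl")
    first, reused_first = load_or_train(path, MajorityClass, [[0], [1], [2]], [0, 1, 1])
    # The second call ignores the (empty) data because a cached model already exists.
    second, reused_second = load_or_train(path, MajorityClass, [], [])
    print(reused_first, reused_second, second.predict([[5], [6]]))  # → False True [1, 1]
```

Because the cache is keyed only on the file name, a model trained on old data is silently reused after the dataset changes; deleting the .pkl file forces retraining.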
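9. Naive Bayes Classifier
The Naive Bayes algorithm, a simpler probabilistic classifier that assumes the features are
conditionally independent given the class (the Gaussian variant used here also assumes each
feature is normally distributed within a class), is also implemented. As with the Gradient
Boosting Classifier, the model is either loaded from a file or trained from scratch. Its
performance is assessed using the same metrics, providing a comparison between a simple and
a more complex model.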
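To show what a Gaussian Naive Bayes model actually learns, the sketch below reimplements its fit and predict steps in NumPy: per-class feature means and variances plus class priors, combined under the feature-independence assumption. The project uses scikit-learn's GaussianNB; the class name TinyGaussianNB and the toy temperature and light values are hypothetical, and the small variance floor only mirrors the idea behind GaussianNB's var_smoothing parameter.

```python
import numpy as np


class TinyGaussianNB:
    """Hypothetical Gaussian Naive Bayes: per-class means/variances plus class priors."""

    def fit(self, X, y, var_smoothing=1e-9):
        X, y = np.asarray(X, dtype=float), np.asarray(y)
        self.classes_ = np.unique(y)
        self.mean_ = np.array([X[y == c].mean(axis=0) for c in self.classes_])
        # Variance floor scaled by the largest feature variance avoids division by zero.
        eps = var_smoothing * X.var(axis=0).max()
        self.var_ = np.array([X[y == c].var(axis=0) for c in self.classes_]) + eps
        self.log_prior_ = np.log([np.mean(y == c) for c in self.classes_])
        return self

    def predict(self, X):
        X = np.asarray(X, dtype=float)
        # Log-likelihood of each row under each class, summing independent Gaussian terms.
        diff = X[:, None, :] - self.mean_[None, :, :]
        ll = -0.5 * (np.log(2 * np.pi * self.var_)[None, :, :]
                     + diff ** 2 / self.var_[None, :, :]).sum(axis=2)
        # Add log priors and pick the most probable class for each row.
        return self.classes_[np.argmax(ll + self.log_prior_, axis=1)]


# Toy features: [temperature, light]; 0 = unoccupied, 1 = occupied.
X = [[20.0, 0], [21.0, 5], [20.5, 2], [23.0, 400], [23.5, 450], [24.0, 500]]
y = [0, 0, 0, 1, 1, 1]
model = TinyGaussianNB().fit(X, y)
print(model.predict([[20.8, 3], [23.8, 480]]))  # → [0 1]
```

Working in log space avoids numerical underflow when many small per-feature likelihoods are multiplied together.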
10. Testing on New Data
A separate dataset (test.csv) is loaded and preprocessed in the same way, with the date
column split into day, month, and year, to test the model on unseen data. The model predicts
the occupancy status for each record, and the results are displayed, indicating whether the
room is occupied or unoccupied.
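Two details matter when predicting on new data: the new rows must present the same feature columns, in the same order, as the training data, and the 0/1 outputs must be mapped to the right names (0 = unoccupied, 1 = occupied, as defined in the dataset description). The sketch below illustrates both; the helper names align_features and describe_predictions, the STATUS mapping, and the sample frame with its extra id column are hypothetical.

```python
import pandas as pd

# 0/1 encoding of the Occupancy column used in this project
STATUS = {0: "unoccupied", 1: "occupied"}


def align_features(new_data: pd.DataFrame, feature_columns) -> pd.DataFrame:
    """Hypothetical helper: keep only the training feature columns, in training order."""
    missing = [c for c in feature_columns if c not in new_data.columns]
    if missing:
        raise ValueError(f"new data is missing training features: {missing}")
    return new_data[list(feature_columns)]


def describe_predictions(predictions):
    """Hypothetical helper: map 0/1 model outputs to readable occupancy statuses."""
    return [f"Row {i}: {STATUS[int(p)]}" for i, p in enumerate(predictions)]


# Toy example: the extra 'id' column is dropped and the columns are reordered.
new_data = pd.DataFrame({"id": [7, 8], "Light": [450.0, 0.0], "Temperature": [23.1, 20.4]})
aligned = align_features(new_data, ["Temperature", "Light"])
print(list(aligned.columns))         # → ['Temperature', 'Light']
print(describe_predictions([1, 0]))  # → ['Row 0: occupied', 'Row 1: unoccupied']
```

Failing loudly on a missing column is safer than letting the model receive a differently shaped input.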
9.2 Dataset Description
The dataset used in this project for room occupancy classification is collected from non-
intrusive IoT sensors deployed in a smart building environment. The dataset captures
environmental conditions that influence room occupancy status (occupied/unoccupied), such
as temperature, humidity, light, CO2 levels, and humidity ratios. The key details and
attributes of the dataset are as follows:
1. Attributes:
The dataset consists of the following features, which serve as input for the machine learning
models:
Date: The timestamp at which each reading was recorded. It was later split into separate day,
month, and year columns for analysis.
Temperature (in Celsius): The temperature of the room at the time of the recording.
Humidity (%): The relative humidity of the room.
Light (lux): The light intensity in the room.
CO2 (ppm): The carbon dioxide concentration in the room, in parts per million.
HumidityRatio: A quantity derived from temperature and relative humidity, expressed in kg of
water vapor per kg of air.
Occupancy: The target variable, which indicates whether the room is occupied
(represented as 1) or unoccupied (represented as 0).
2. Dataset Structure:
The dataset consists of several thousand rows, each representing a sensor reading taken at a
specific timestamp.
The target variable (Occupancy) has an imbalanced distribution, with more instances of the
room being unoccupied than occupied. This imbalance is addressed using the SMOTE
technique during data preprocessing.
3. Data Collection Environment:
The data was collected from a controlled indoor environment where IoT sensors were
deployed to monitor the room's conditions in real time. The non-intrusive nature of the
sensors ensured that they did not interfere with human activities while capturing crucial
environmental data.
4. Data Preprocessing:
The 'date' column was split into day, month, and year columns to derive temporal features.
Missing values were checked, and decimal commas in the numeric columns were replaced
with dots so that every value could be parsed in a consistent numeric format.
The dataset was balanced using SMOTE to address the class imbalance problem in the
Occupancy column.
5. Dataset Statistics:
Columns: 7 in the raw file (date, temperature, humidity, light, CO2, humidity ratio, and the
target label Occupancy). After the date is split into day, month, and year, the models receive
8 input features.
Correlated Features: Environmental factors like temperature, humidity, and CO2 are
often correlated. A heatmap was generated to visualize these correlations and guide the
selection of the most relevant features for the model.
9.3 Results Description
The results obtained from applying the Gradient Boosting Classifier and Naive Bayes
Classifier to the room occupancy dataset for IoT-based room occupancy detection are
summarized below. These results illustrate the models' performance in classifying room
occupancy (occupied or unoccupied), based on various performance metrics including
accuracy, precision, recall, and F1-score. Additionally, visualizations such as confusion
matrices and the class distribution before and after SMOTE (Synthetic Minority Over-
sampling Technique) highlight critical insights into both model performance and dataset
characteristics.
This figure displays the IoT-based room occupancy dataset, showcasing the initial few rows.
It presents the structure and organization of environmental data collected from sensors,
including features such as temperature, humidity, light, CO2 levels, and humidity ratio. The
target variable Occupancy categorizes whether a room is occupied (1) or unoccupied (0).
Figure 9.3B: Dataset Correlation Heatmap
The correlation heatmap visualizes the relationships between various features in the dataset.
Each cell in the heatmap displays the correlation value between two features, with values
ranging from -1 to 1. A positive value indicates a positive correlation, while a negative value
indicates an inverse relationship. This heatmap highlights significant feature relationships,
particularly between temperature, humidity, and CO2 levels, which play a key role in
predicting room occupancy. These correlations offer insight into the environmental factors
associated with occupancy and can guide feature selection for the machine learning models.
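The heatmap values come from DataFrame.corr(), which computes the Pearson correlation coefficient when no method is given. For two features X and Y measured over n readings, with sample means \bar{x} and \bar{y}, each cell holds:

```latex
r_{XY} = \frac{\sum_{i=1}^{n} (x_i - \bar{x})(y_i - \bar{y})}
              {\sqrt{\sum_{i=1}^{n} (x_i - \bar{x})^2}\,\sqrt{\sum_{i=1}^{n} (y_i - \bar{y})^2}},
\qquad -1 \le r_{XY} \le 1
```

Pearson's r measures only linear association, so a value near 0 does not rule out a non-linear relationship between two sensor readings.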
Figure 9.3C: Class Distribution of Occupancy (Before and After SMOTE)
This figure illustrates the class imbalance within the Occupancy variable before and after
applying SMOTE.
Pre-SMOTE: The count plot before SMOTE reveals an imbalance in the dataset, with more
instances of unoccupied rooms compared to occupied ones. This imbalance could skew the
model's predictions toward the majority class, leading to poor performance in detecting
occupied rooms.
Post-SMOTE: After applying SMOTE, the class distribution is balanced, ensuring that the
machine learning models see equal representation of both occupancy classes during
training. This step is intended to reduce bias toward the majority class and improve the
classifiers' ability to detect occupied rooms, the minority class.
The confusion matrix and classification report for the Gradient Boosting Classifier
summarize the model's performance on the test dataset. The confusion matrix contains the
following key elements:
True Positives (TP): Instances correctly classified as occupied.
True Negatives (TN): Instances correctly classified as unoccupied.
False Positives (FP): Instances incorrectly classified as occupied when they were
actually unoccupied.
False Negatives (FN): Instances incorrectly classified as unoccupied when they were
actually occupied.
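The reported metrics follow directly from these four counts: accuracy = (TP + TN) / total, precision = TP / (TP + FP), recall = TP / (TP + FN), and F1 is the harmonic mean of precision and recall. The plain-Python sketch below computes them for the occupied class (label 1) treated as positive; the helper names and the eight example labels are hypothetical, and the project's classification_report additionally reports these values per class and as averages.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Hypothetical helper: count TP, FP, TN, FN with `positive` (occupied = 1) as positive."""
    tp = fp = tn = fn = 0
    for t, p in zip(y_true, y_pred):
        if p == positive:
            if t == positive:
                tp += 1
            else:
                fp += 1
        elif t == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, tn, fn


def metrics(y_true, y_pred, positive=1):
    """Hypothetical helper: accuracy, precision, recall and F1 from the confusion counts."""
    tp, fp, tn, fn = confusion_counts(y_true, y_pred, positive)
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
    return {"accuracy": accuracy, "precision": precision, "recall": recall, "f1": f1}


y_true = [1, 1, 1, 0, 0, 0, 0, 1]
y_pred = [1, 1, 0, 0, 0, 1, 0, 1]
print(confusion_counts(y_true, y_pred))  # → (3, 1, 3, 1)
print(metrics(y_true, y_pred))
```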
The Gradient Boosting Classifier shows high performance, with a strong balance of accuracy,
precision, recall, and F1-score. These metrics indicate that the classifier is effective at
distinguishing between occupied and unoccupied rooms, with minimal errors in
classification.
Similar to the Gradient Boosting Classifier, this figure displays the confusion matrix and
classification report for the Naive Bayes Classifier. By comparing the True Positives, True
Negatives, False Positives, and False Negatives, the confusion matrix highlights the model's
prediction strengths and weaknesses.
The Naive Bayes Classifier demonstrates competitive performance in terms of accuracy,
precision, and recall, particularly in recognizing unoccupied rooms. The classification report
breaks down the performance by providing additional metrics, such as the F1-score, for both
occupied and unoccupied room states. While the Naive Bayes Classifier performs well
overall, it is slightly outperformed by the Gradient Boosting Classifier in terms of precision
and recall.
This figure illustrates the predictions made by both classifiers on unseen test data. The
predicted room occupancy statuses are compared with the actual labels, providing an
assessment of the models' ability to generalize to new data. By analyzing the differences
between predicted and actual values, it is evident that both classifiers provide strong results,
indicating their effectiveness in real-world scenarios for predicting room occupancy based on
IoT sensor data.
Both models demonstrate high accuracy and reliability in classifying room occupancy, with
Gradient Boosting providing marginally better performance compared to Naive Bayes. The
models’ predictions are consistent with the dataset characteristics, suggesting their
applicability for automated room occupancy detection systems in smart environments.
CHAPTER 10
10.1 Conclusion
The data preprocessing phase, which involved data cleaning, encoding, and balancing
through SMOTE, ensured that the input data was of high quality, enhancing the model's
performance. The comparative analysis between the existing Naive Bayes algorithm and the
proposed Gradient Boosting classifier highlighted the superiority of the latter in terms of
accuracy, precision, recall, and F1-score. The Gradient Boosting classifier, with its ability to
model non-linear relationships and interactions between the sensor features, provided more
reliable and consistent predictions.
The results of this system demonstrate the potential of integrating IoT with
machine learning to create smart, responsive environments that can optimize resource usage,
improve energy efficiency, and enhance user experiences in auditoriums and similar settings.
While the current implementation provides a robust foundation for room occupancy
classification, several avenues can be explored to further enhance and extend the system:
Scalability and Real-Time Processing: As the system scales to cover larger areas or
multiple rooms, the need for real-time data processing becomes crucial. Implementing
distributed computing techniques and edge computing can reduce latency and improve the
system's responsiveness.
Incorporation of Advanced Sensors: The integration of additional sensors, such as
infrared or ultrasonic sensors, could improve the accuracy of occupancy detection.
These sensors can capture different aspects of human presence, further refining the
classification results.
Security and Privacy Concerns: As IoT devices become more pervasive, addressing
security and privacy concerns is paramount. Implementing secure communication protocols
and anonymizing data could protect user privacy while maintaining the system's functionality.