Smart Room Occupancy Analysis: Employing IOT Data and Predictive Insights

The document presents a research project on Smart Room Occupancy Analysis using IoT data and machine learning techniques, aimed at enhancing energy efficiency in smart buildings. It outlines the need for accurate and non-intrusive occupancy detection methods to optimize resource management and reduce operational costs. The project includes a comprehensive literature survey, proposed system design, and evaluation of various machine learning algorithms for real-time occupancy classification.

A Real Time Research Project on

Smart Room Occupancy Analysis: Employing IOT Data and Predictive Insights
Submitted in partial fulfilment of the requirements
for the award of the degree of

BACHELOR OF TECHNOLOGY
IN
COMPUTER SCIENCE AND ENGINEERING
(ARTIFICIAL INTELLIGENCE & MACHINE LEARNING)
By

Under the esteemed guidance of


Affiliated to

JAWAHARLAL NEHRU TECHNOLOGICAL UNIVERSITY


HYDERABAD KUKATPALLY, HYDERABAD-85.

This is to certify that the Real Time Research Project report entitled
“Smart Room Occupancy Analysis: Employing IOT Data and Predictive
Insights” is the bona fide work carried out and submitted by

ACKNOWLEDGEMENT
The satisfaction and euphoria that accompany the successful completion of any task
would be incomplete without the mention of the people who made it possible, whose constant
guidance and encouragement crowned my efforts with success. It is a pleasure to
now have the opportunity to express my gratitude to all of them.
TABLE OF CONTENTS

ABSTRACT
LIST OF FIGURES
LIST OF GRAPHS
ABBREVIATIONS
CH NO DESCRIPTION
CHAPTER 1 INTRODUCTION
1.1 OVERVIEW
1.2 NEED
1.3 RESEARCH MOTIVATION
1.4 PROBLEM STATEMENT
1.5 OBJECTIVE
1.6 ADVANTAGES
1.7 APPLICATIONS
CHAPTER 2 LITERATURE SURVEY
CHAPTER 3 EXISTING SYSTEM
3.1 OVERVIEW
3.2 CHALLENGES
3.3 LIMITATIONS
CHAPTER 4 PROPOSED SYSTEM
4.1 OVERVIEW
4.2 PREPROCESSING AND DATA SPLITTING
4.3 ML MODEL BUILDING
4.4 ADVANTAGES
CHAPTER 5 UML DIAGRAMS
CHAPTER 6 SOFTWARE ENVIRONMENT
6.1 SOFTWARE REQUIREMENT
6.2 HARDWARE REQUIREMENTS
CHAPTER 7 FUNCTIONAL REQUIREMENTS
CHAPTER 8 SOURCE CODE
CHAPTER 9 RESULTS AND DISCUSSION
9.1 IMPLEMENTATION DESCRIPTION
9.2 DATASET DESCRIPTION
9.3 RESULTS DESCRIPTION
CHAPTER 10 CONCLUSION AND FUTURE SCOPE
10.1 CONCLUSION
10.2 FUTURE SCOPE
REFERENCES
LIST OF FIGURES

Fig 4.1 Architecture Design of Proposed System
Fig 5.1 Class Diagram
Fig 5.2 Dataflow Diagram
Fig 5.3 Sequence Diagram
Fig 5.4 Activity Diagram
Fig 5.5 Deployment Diagram
Fig 5.6 Use Case Diagram
Fig 5.7 Component Diagram
LIST OF GRAPHS

Fig 9.1 Dataset Overview
Fig 9.2 Dataset Correlation Heatmap
Fig 9.3 Class Distribution of Occupancy
Fig 9.4 Performance Metrics of Gradient Boosting
Fig 9.5 Performance Metrics for Naive Bayes
Fig 9.6 Prediction on Test Data
ABBREVIATIONS

Abbreviation Full Form


AI Artificial Intelligence
ML Machine Learning
IoT Internet of Things
SMOTE Synthetic Minority Over-sampling Technique
HVAC Heating, Ventilation, and Air Conditioning
SODS Smart Occupancy Detection System
CSV Comma Separated Values
UML Unified Modeling Language
DFD Data Flow Diagram
PIR Passive Infrared
CO₂ Carbon Dioxide
GUI Graphical User Interface
CPU Central Processing Unit
RAM Random Access Memory
API Application Programming Interface
JSON JavaScript Object Notation
RNN Recurrent Neural Network
DNN Deep Neural Network
TP True Positive
TN True Negative
FP False Positive
FN False Negative
NB Naive Bayes
GB Gradient Boosting
ABSTRACT

This project addresses the issue of steam turbine efficiency by discussing the overall
design of high-pressure steam turbine blades. It focuses specifically on blade profile, the
materials used in the production of steam turbine blades, and the factors that cause turbine
blade failure and therefore the failure of the turbine itself. The project enumerates and
describes the currently available technologies that enhance the overall efficiency of the
generator and prevent turbine failure due to blade erosion and blade cracking. In particular,
it evaluates the effectiveness of certain titanium alloys and steels in resisting creep and
fracture in turbine blades. The effectiveness of chemical and thermal coatings in protecting
the blade substrate from corrosion when exposed to wet steam is also addressed.

The stresses developed in the blade as a result of steam pressure, steam temperature,
and the centrifugal forces due to rotational movement are delineated, and current designs
intended to counter the fatigue caused by these stresses are presented. The aerodynamic
designs of impulse and reaction turbine blades are compared, and the effect that these
designs have on turbine efficiency is discussed.

The efficiency of the steam turbine is a key factor in both the economics and the
environmental impact of any coal-fired power station. For example, increasing the efficiency
of a typical 600 MW turbine by 1% reduces emissions of CO2 from the station by
approximately 50,000 tons per year, with corresponding reductions in SOx and NOx.
Typically, efficiency uprates are economically evaluated at about 700 per kilowatt, so the 1%
increase of a 600 MW machine is worth about 4.2 million. Hence steam turbine blade
performance is frequently the single most important criterion for retrofitting coal-fired
power plants.

Based on the research presented, this project provides a detailed summary of the
modifications that can be made to existing high-pressure steam turbine blades to increase
turbine efficiency.
CHAPTER 1

INTRODUCTION

1.1 Overview

The advent of the Internet of Things (IoT) has revolutionized various sectors by enabling
seamless integration of devices, data collection, and real-time monitoring. One of the
significant applications of IoT technology is in the realm of smart buildings, where it is used
to optimize energy consumption, enhance security, and improve overall user comfort. Among
these applications, room occupancy classification has emerged as a critical component,
particularly for energy-efficient building management systems. Accurately determining
whether a room is occupied can lead to significant energy savings by dynamically controlling
lighting, heating, ventilation, and air conditioning (HVAC) systems. Traditional methods of
occupancy detection often rely on intrusive techniques such as cameras or motion detectors,
which raise privacy concerns and may not always provide reliable data.

To address these challenges, non-intrusive sensors such as temperature,
humidity, CO2, and sound sensors are increasingly being employed to infer room occupancy
indirectly. These sensors offer a privacy-preserving alternative to direct observation methods,
while still providing valuable data for occupancy classification. The data collected from these
sensors can be leveraged to train machine learning (ML) models, which can learn complex
patterns and make accurate predictions regarding room occupancy status.

The integration of IoT devices with ML algorithms presents a powerful
approach to room occupancy classification. By utilizing data-driven models, it becomes
possible to automate the process of occupancy detection with high accuracy and minimal
human intervention. This not only enhances the efficiency of building management systems
but also contributes to sustainable energy practices by reducing unnecessary power
consumption.

This study focuses on the design and analysis of an IoT-based room
occupancy classification system using non-intrusive sensors. The proposed system employs
various machine learning algorithms to analyze sensor data and classify the occupancy status
of rooms in real-time. Through comprehensive data analysis and model evaluation, the study
aims to demonstrate the effectiveness of the proposed approach in achieving reliable and
accurate room occupancy classification. Additionally, the research explores the impact of
different sensor types, data preprocessing techniques, and ML models on the overall
performance of the system. The insights gained from this study can serve as a foundation for
further advancements in smart building technologies, contributing to the development of
more intelligent and energy-efficient infrastructures.

1.2 Need
The rapid advancement in technology, particularly in the Internet of Things (IoT), has led to
the proliferation of smart environments, where real-time data-driven decisions are essential.
In settings like auditoriums, classrooms, offices, and other large spaces, knowing the
occupancy status is critical for optimizing energy usage, enhancing security, and improving
overall resource management. Traditional methods of monitoring room occupancy, such as
manual inspections or simple motion sensors, are often inadequate due to their limitations in
accuracy and scalability. There is a growing need for more sophisticated systems that can
automatically and accurately classify room occupancy, enabling smarter and more efficient
building management.

1.3 Research Motivation

The motivation for this research stems from the increasing demand for smart building
solutions that can effectively manage resources and reduce operational costs. With the global
market for smart auditoriums expected to see significant growth, there is a pressing need to
develop systems that can seamlessly integrate with existing infrastructure and provide
reliable occupancy data. Moreover, the inefficiencies and inaccuracies of traditional methods
highlight the need for innovative approaches that leverage the power of IoT and machine
learning. This research aims to address these challenges by developing a robust, data-driven
room occupancy classification system that can transform how spaces are managed in modern
buildings.

1.4 Problem Statement

Traditional room occupancy detection methods are often labor-intensive, prone to errors, and
lack real-time processing capabilities. These methods, which include periodic inspections or
basic motion sensors, fail to provide the necessary accuracy and scalability required in
today's complex smart environments. The sheer volume of data generated by non-intrusive
IoT sensors in large auditoriums and other similar spaces poses a significant challenge for
effective room occupancy classification. Therefore, there is a need for an advanced system
that can analyze this data in real-time, accurately classify room occupancy, and overcome the
limitations of traditional approaches.

1.5 Objective

The primary objective of this research is to design and implement an IoT-based room
occupancy classification system using non-intrusive sensors and machine learning
algorithms. Specifically, the system aims to:

Leverage IoT sensor data to accurately classify room occupancy in real-time.

Compare the performance of traditional algorithms like Naive Bayes with more advanced
techniques such as Gradient Boosting to determine the most effective approach.

Improve the accuracy, efficiency, and scalability of room occupancy detection in smart
auditoriums and similar environments.

Provide a framework that can be integrated into existing building management systems to
optimize energy use, enhance security, and improve overall resource management.

1.6 Advantages

The proposed IoT-based room occupancy classification system offers several key advantages
over traditional methods:

Accuracy: By utilizing machine learning algorithms, the system can analyze complex data
patterns, leading to more accurate occupancy detection.

Real-time Processing: The system is capable of processing data in real-time, providing
immediate insights and enabling quick decision-making.

Scalability: The system can easily be scaled to monitor multiple rooms or larger areas
without significant changes to the underlying infrastructure.

Efficiency: Automated occupancy detection reduces the need for manual inspections, saving
time and labor while minimizing the risk of human error.

Energy Optimization: By accurately determining room occupancy, the system can optimize
the use of heating, ventilation, air conditioning (HVAC), and lighting, leading to significant
energy savings.
1.7 Applications

The IoT-based room occupancy classification system has a wide range of applications across
various domains:

Smart Auditoriums: In educational institutions and conference centers, the system can
manage seating arrangements, control lighting, and optimize HVAC systems based on real-
time occupancy data.

Office Buildings: The system can help facility managers allocate resources more efficiently,
reduce energy consumption, and improve workplace comfort.

Residential Buildings: Home automation systems can benefit from accurate room occupancy
data to enhance security and optimize energy use.

Healthcare Facilities: The system can monitor patient rooms, ensuring that they are properly
staffed and that resources are allocated where needed most.

Hospitality Industry: Hotels can use the system to improve guest experiences by
automatically adjusting room conditions based on occupancy status.
CHAPTER 2

LITERATURE SURVEY
1. Conte, G.; Marchi, M.D.; Nacci, A.A.; Rana, V.; Sciuto, D. (2014). "BlueSentinel: A first
approach using iBeacon for an energy efficient occupancy detection system." In Proceedings
of the ACM, Memphis, TN, USA, 3–6 November 2014. This is a pioneering work that explores
the use of Apple's then-recently introduced iBeacon technology for smart building
applications, specifically for energy-efficient occupancy detection.

2. Corna, A.; Fontana, L.; Nacci, A.A.; Sciuto, D. (2015). "Occupancy Detection via
iBeacon on Android Devices for Smart Building Management." In Proceedings of the Design,
Automation and Test in Europe Conference and Exhibition, Grenoble, France, 9–13 March
2015. This is a direct follow-up and extension of their previous work on BlueSentinel (Conte
et al., 2014), specifically addressing the implementation and performance of iBeacon-based
occupancy detection on Android devices. This is significant because Android represents a
dominant market share in smartphones, making the technology's broader applicability
much more impactful.

3. Filippoupolitis, A.; Oliff, W.; Loukas, G. (2016). "Bluetooth Low Energy Based
Occupancy Detection for Emergency Management." In Proceedings of the 2016 15th
International Conference on Ubiquitous Computing and Communications and 2016
International Symposium on Cyberspace and Security (IUCC-CSS), Granada, Spain, 14–16
December 2016. This work takes a unique approach to occupancy detection, shifting the primary
application focus from energy efficiency to emergency management. This highlights the
versatility and critical importance of accurate occupancy information in a broader range of
smart building scenarios.

4. Mohottige, I.P.; Gharakheili, H.H.; Sivaraman, V.; Moors, T. (2022). "Modeling
Classroom Occupancy using Data of WiFi Infrastructure in a University Campus." IEEE
Sensors Journal, 22, 9981–9996. This is a significant contribution to the field of smart building
management and ubiquitous sensing. It specifically focuses on a cost-effective and non-
intrusive method for occupancy estimation in university classrooms, leveraging existing
WiFi infrastructure.
5. Choi, H.; Fujimoto, M.; Matsui, T.; Misaki, S.; Yasumoto, K. (2022). "Wi-CaL: WiFi
sensing and machine learning based device-free crowd counting and localization." IEEE
Access, 10, 24395–24410. This work represents a state-of-the-art approach in the field of device-free
WiFi sensing for advanced occupancy monitoring, specifically focusing on both crowd
counting and localization without requiring individuals to carry any specific devices. This
marks a significant leap in privacy-preserving and pervasive sensing capabilities.

6. Lu, X.; Wen, H.; Han, Z.; Hao, J.; Xie, L.; Trigoni, N. (2016). "Robust occupancy
inference with commodity WiFi." In Proceedings of the IEEE International Conference on
Wireless & Mobile Computing, New York, NY, USA, 17–19 October 2016. This work focuses on
making WiFi-based occupancy detection highly practical and robust for real-world smart
building applications. It aims to overcome limitations of earlier approaches that might have
required specialized hardware, intensive calibration,

7. Wang, W.; Chen, J.; Hong, T.; Zhu, N. (2018). "Occupancy prediction through Markov
based feedback recurrent neural network (M-FRNN) algorithm with WiFi probe technology."
Building and Environment, 138, 160–170. This work represents a significant advancement in
leveraging existing WiFi infrastructure for occupancy prediction, particularly by integrating
advanced machine learning techniques to address the inherent challenges of WiFi data.

8. Zou, H.; Jiang, H.; Yang, J.; Xie, L.; Spanos, C. (2017). "Non-intrusive occupancy sensing in
commercial buildings." Energy and Buildings, 154, 633–643. This is a significant contribution to
the field of smart building management, specifically focusing on developing and validating
highly practical and privacy-preserving methods for sensing occupancy in commercial
environments. The emphasis on "non-intrusive" is key, differentiating it from more privacy-
invasive techniques while still aiming for high accuracy.

9. Mohottige, I.P.; Moors, T. (2018). "Estimating Room Occupancy in a Smart Campus
using WiFi Soft Sensors." In Proceedings of the 2018 IEEE 43rd Conference on Local
Computer Networks (LCN), Chicago, IL, USA, 1–4 October 2018. This work delves into the practical
application of WiFi infrastructure for occupancy estimation, particularly in the challenging
and dynamic environment of a university campus. This work builds upon the growing interest
in leveraging existing wireless networks as "soft sensors" for smart building applications.

10. Wang, W.; Chen, J.; Hong, T. (2018). "Occupancy prediction through machine learning
and data fusion of environmental sensing and Wi-Fi sensing in buildings." Automation in
Construction, 94, 233–243. This is a highly impactful study that emphasizes the power of data
fusion and machine learning for accurate occupancy prediction, specifically by combining
disparate data sources: environmental sensors and WiFi sensing. This approach addresses the
limitations of relying on any single sensing modality.

11. The Marmaroli, Allado, and Boulandet (2023) paper significantly advances the field by
demonstrating a novel, privacy-preserving approach to indoor event detection. It bridges the
gap between the need for detailed smart building information and the public's growing
privacy concerns, drawing upon the robust capabilities of deep learning (CNNs) to extract
actionable intelligence from the subtle acoustic impedance changes of a common household
device. This work contributes to the broader trend of developing multi-functional,
unobtrusive, and intelligent sensing solutions for future smart environments.

12. Singh, Jain, Chaudhari, Kraemer, and Garg (2018), "Machine Learning-Based
Occupancy Estimation Using Multivariate Sensor Nodes," is a significant contribution to the
field of smart building management and energy efficiency. It addresses the critical need for
accurate occupancy information to optimize building systems, particularly HVAC (Heating,
Ventilation, and Air Conditioning) and lighting.

13. Padmanabh, K.; Malikarjuna, V.A.; Sen, S.; Katru, S.P.; Paul, S. (2009). "iSense: A
wireless sensor network based conference room management system." In Proceedings of the
ACM, Berkeley, CA, USA, 3 November 2009. This is an early and foundational work
demonstrating the practical application of wireless sensor networks (WSNs) for smart
building management, specifically within the context of conference rooms.

14. Zheng, Y.; Becerik-Gerber, B. (2015), "Cross-Space Building Occupancy Modeling by
Contextual Information Based Learning," is a significant contribution to the field of smart
building management, particularly in addressing the challenge of generalizing occupancy
models across different spaces.

15. Huang, Q.; Ge, Z.; Lu, C. (2016). "Occupancy estimation in smart buildings using audio-
processing techniques." arXiv 2016, arXiv:1602.08507. This is a significant paper that explores
the potential of using sound to infer human presence and even the number of occupants in
smart building environments. This is a crucial area of research, as it offers an alternative to
more intrusive sensing methods like cameras, while still providing valuable data for
building optimization.
16. Dino, I.G.; Kalfaoglu, E.; Iseri, O.K.; Erdogan, B.; Kalkan, S.; Alatan, A.A. (2022).
"Vision-based estimation of the number of occupants using video cameras." Adv. Eng.
Inform. 2022, 53, 101662. This work represents a significant advancement in the application of
computer vision and deep learning for precise occupancy estimation in smart buildings. It
directly tackles the need for granular occupant data, especially in complex and
crowded environments.

17. Sun, K.; Liu, P.; Xing, T.; Zhao, Q.; Wang, X. (2022). "A fusion framework for vision-
based indoor occupancy estimation." Building and Environment, 225, 109631. This is a significant
contribution to the field of smart building management, specifically focusing on enhancing
the accuracy and robustness of vision-based occupancy estimation through a sophisticated
fusion framework.

18. Fleuret, F.; Berclaz, J.; Lengagne, R.; Fua, P. (2008). "Multicamera people tracking
with a probabilistic occupancy map." IEEE Transactions on Pattern Analysis and Machine
Intelligence, 30, 267. This is a seminal work in the field of multi-camera surveillance and tracking,
particularly known for its robust approach to handling occlusions and providing accurate
occupancy information in complex environments.

19. Zou, J.; Zhao, Q.; Yang, W.; Wang, F. (2017). "Occupancy detection in the office by
analyzing surveillance videos and its application to building energy conservation." Energy
and Buildings, 152, 385–398. This is a notable work that directly connects the use of existing
surveillance infrastructure to practical energy savings in office environments. It highlights the
dual benefit of security cameras as not just monitoring devices but also as valuable sensors
for smart building applications.

20. Petersen, S.; Pedersen, T.H.; Nielsen, K.U.; Knudsen, M.D. (2016). "Establishing an
image-based ground truth for validation of sensor data-based room occupancy detection."
Energy and Buildings, 130, 787–793. This is a crucial methodological contribution in the field of
smart building occupancy detection. It addresses a fundamental challenge: how to reliably
assess the accuracy of various occupancy sensing technologies.
CHAPTER 3

EXISTING SYSTEM

3.1 Overview

The Smart Occupancy Detection System (SODS) is an advanced solution designed for real-
time room occupancy classification using non-intrusive IoT sensors. This system integrates
various sensor technologies, such as passive infrared (PIR) sensors, ultrasonic sensors, and
acoustic sensors, to monitor and analyze occupancy patterns within smart auditoriums. SODS
leverages machine learning (ML) algorithms to process sensor data and provide accurate
occupancy classifications.

Key Features:

Real-Time Data Collection: Utilizes a network of non-intrusive sensors to continuously
gather data on room occupancy.

Machine Learning Integration: Employs ML algorithms to analyze sensor data, identify
occupancy patterns, and make real-time predictions.

Scalability: Designed to handle large volumes of data from multiple sensors, making it
suitable for complex and high-traffic environments.

Automated Reporting: Provides automated insights and alerts on room occupancy, reducing
the need for manual monitoring.

Technologies Used:

Sensors: Passive Infrared (PIR), Ultrasonic, and Acoustic sensors.

Data Processing: Cloud-based data storage and processing.

ML Algorithms: Decision Trees, Random Forest, and Neural Networks for occupancy
classification.
3.2 Challenges in the Traditional Occupancy Monitoring Process

1. Manual Inspection:

Labor-Intensive: Periodic manual inspections require significant human effort and are not
feasible for large or frequently used auditoriums.

Inconsistent Data: Manual logging can lead to inconsistent and incomplete data collection,
impacting the accuracy of occupancy analysis.

2. Limited Real-Time Insights:

Delayed Feedback: Traditional methods do not provide real-time data, leading to delays in
understanding current occupancy and making timely adjustments.

3. Human Error:

Accuracy Issues: Human errors during data collection and logging can result in inaccuracies,
affecting the reliability of occupancy assessments.

4. Scalability Challenges:

Difficulty in Scaling: As the complexity of auditorium systems grows, manual methods
struggle to keep pace with the increased volume of data and the need for detailed analysis.

3.3 Limitations of Traditional Approaches

1. Inefficiency in Data Collection:

High Resource Consumption: Traditional methods require significant resources for manual
data collection, including labor and time.

Data Gaps: Manual approaches are prone to missing data, leading to incomplete occupancy
records and potential inaccuracies.

2. Inaccuracies in Occupancy Detection:

Limited Precision: Manual methods often rely on periodic checks, which may not capture
transient or fluctuating occupancy levels accurately.

Subjective Interpretation: The reliance on human judgment can introduce biases and errors
in occupancy assessments.
3. Lack of Real-Time Monitoring:

No Immediate Feedback: Traditional approaches do not offer real-time insights, making it
challenging to respond quickly to changes in room occupancy.

Delayed Reactions: The lack of real-time data limits the ability to optimize resource
allocation and improve operational efficiency.

4. Inability to Scale:

Complexity Management: As auditorium systems become more complex, manual methods
become less effective at managing and analyzing large volumes of data.

Limited Automation: Traditional approaches lack automation, resulting in slower and less
efficient processes.
CHAPTER 4

PROPOSED SYSTEM

4.1 Overview

Step 1: Dataset Collection

The first step in the research process involves gathering the necessary data for room
occupancy classification. The dataset consists of various parameters collected from non-
intrusive IoT sensors installed in a smart auditorium environment. These sensors measure
environmental factors such as temperature, humidity, light levels, and carbon dioxide
concentration, which are indicative of room occupancy. The dataset is collected over time and
stored in a CSV file, ready for further analysis.

Step 2: Dataset Preprocessing

Data preprocessing is a critical step that involves preparing the raw data for analysis. This
process includes several key tasks:

Null Value Removal: Missing data can lead to inaccurate analysis, so any null values in the
dataset are identified and removed. This ensures that the dataset is complete and ready for
analysis.

Label Encoding: Since the target variable, "Occupancy," is categorical (e.g., 'occupied' or
'unoccupied'), it needs to be converted into numerical values using label encoding. This
allows machine learning algorithms to process the data more effectively.
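The two preprocessing tasks above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not the project's actual code: the column names, the inline sample data, and the helper names `load_clean_rows` and `label_encode` are all hypothetical.

```python
import csv
import io

# Hypothetical sample in the CSV layout described above; column names are assumptions.
RAW_CSV = """Temperature,Humidity,Light,CO2,Occupancy
23.1,27.2,426.0,721.0,occupied
23.0,,419.0,714.0,occupied
21.5,25.9,0.0,440.0,unoccupied
21.4,25.8,0.0,,unoccupied
22.8,27.0,433.0,708.0,occupied
"""

FEATURES = ["Temperature", "Humidity", "Light", "CO2"]


def load_clean_rows(text):
    """Parse CSV text and drop every row that has an empty (null) field."""
    rows = list(csv.DictReader(io.StringIO(text)))
    required = FEATURES + ["Occupancy"]
    return [r for r in rows if all((r[c] or "").strip() for c in required)]


def label_encode(values):
    """Map each distinct label to an integer, assigned in sorted label order."""
    mapping = {label: idx for idx, label in enumerate(sorted(set(values)))}
    return [mapping[v] for v in values], mapping


rows = load_clean_rows(RAW_CSV)
X = [[float(r[c]) for c in FEATURES] for r in rows]
y, mapping = label_encode([r["Occupancy"] for r in rows])

print(len(rows), "rows kept; label mapping:", mapping)
```

The integer assigned to each label is arbitrary (here it follows alphabetical order), so the mapping should be kept alongside the encoded data to interpret predictions later.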

Step 3: Data Resampling using SMOTE

To address the issue of class imbalance, the Synthetic Minority Over-sampling Technique
(SMOTE) is employed. SMOTE generates synthetic examples of the minority class to
balance the dataset. This step is crucial for improving the performance of machine learning
models, particularly in cases where the dataset is heavily skewed towards one class.
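The interpolation idea behind SMOTE can be sketched in plain Python. This is a simplified illustration rather than the project's code: the function name `smote`, the neighbour count `k`, and the sample readings are assumptions, and a maintained library implementation would normally be preferred in practice.

```python
import math
import random


def smote(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points, each lying on the line segment between
    a minority sample and one of its k nearest minority-class neighbours."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # k nearest neighbours of `base` among the other minority samples
        others = [p for p in minority if p is not base]
        neighbours = sorted(others, key=lambda p: math.dist(base, p))[:k]
        neighbour = rng.choice(neighbours)
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([b + gap * (n - b) for b, n in zip(base, neighbour)])
    return synthetic


# Hypothetical (temperature, CO2) readings: 6 unoccupied vs. 3 occupied.
majority = [[21.0, 440.0], [21.2, 450.0], [20.9, 435.0],
            [21.1, 445.0], [21.3, 455.0], [21.0, 442.0]]
minority = [[23.0, 700.0], [23.4, 760.0], [22.8, 690.0]]

new_points = smote(minority, n_new=len(majority) - len(minority))
balanced_minority = minority + new_points

print(len(majority), len(balanced_minority))
```

Because every synthetic point is an interpolation of two real minority samples, the new points stay inside the region already covered by the minority class instead of being exact duplicates.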

Step 4: Existing Algorithm (Naive Bayes)

The existing system for room occupancy classification utilizes the Naive Bayes algorithm, a
simple yet effective probabilistic classifier based on Bayes' theorem. This algorithm is trained
on the preprocessed and resampled dataset to classify room occupancy as either 'occupied' or
'unoccupied.'

Step 5: Proposed Algorithm (Gradient Boosting)

To enhance the classification performance, a Gradient Boosting algorithm is proposed as an
alternative to Naive Bayes. Gradient Boosting is a powerful ensemble learning technique that
builds multiple weak learners (usually decision trees) and combines them to form a strong
classifier. The algorithm is trained on the same dataset to compare its performance against the
existing Naive Bayes model.

Step 6: Performance Comparison

The performance of both the Naive Bayes and Gradient Boosting algorithms is evaluated
using key metrics such as accuracy, precision, recall, and F1-score. Confusion matrices are
generated to visualize the classification results, and a detailed comparison is made to
determine which algorithm provides better results in terms of classification accuracy and
reliability.
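The four metrics named above all derive from the confusion-matrix counts (TP, TN, FP, FN). The sketch below shows the arithmetic on made-up labels; the function names and the sample predictions are illustrative assumptions, not results from this project.

```python
def confusion_counts(y_true, y_pred, positive=1):
    """Return (TP, TN, FP, FN) for a binary classification task."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    tn = sum(1 for t, p in pairs if t != positive and p != positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall and F1-score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if precision + recall else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Made-up labels: 1 = occupied, 0 = unoccupied.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 0, 0, 0, 0, 1, 0]

counts = confusion_counts(y_true, y_pred)
metrics = classification_metrics(*counts)
print(counts, metrics)
```

Accuracy alone can be misleading on imbalanced occupancy data, which is why precision, recall, and F1-score are reported alongside it.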

Step 7: Prediction of Output from Test Data

Finally, the trained Gradient Boosting model is used to predict the room occupancy status on
a separate test dataset. The predicted results are analyzed, and the model's performance is
assessed to ensure that it generalizes well to new, unseen data.
Figure 4.1: Architecture Diagram of Proposed system

4.2 Preprocessing and Data Splitting

Preprocessing is the foundation of any data-driven research, ensuring that the raw data is
transformed into a format suitable for analysis. In this project, preprocessing involves several
key steps:

Handling Missing Data: The dataset is checked for missing values, which are removed to
avoid biases in the analysis. Null values can significantly impact the accuracy of machine
learning models, so their removal ensures that the dataset is complete and reliable.

Label Encoding: The categorical variable 'Occupancy' is converted into numerical form
using label encoding. This step is crucial as most machine learning algorithms require
numerical input to process and learn from the data.

Resampling with SMOTE: Given the potential class imbalance in the dataset, SMOTE is
applied to generate synthetic examples of the minority class. This resampling technique helps
in balancing the dataset, which is essential for improving the classifier's performance.

Data Splitting: The dataset is then split into training and testing sets. Typically, 80% of the
data is used for training the models, while the remaining 20% is reserved for testing. This
split allows for an unbiased evaluation of the model's performance on unseen data.
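An 80/20 split can be sketched as a seeded shuffle of row indices. The helper name `train_test_split`, its parameters, and the toy data below are assumptions made for illustration only.

```python
import random


def train_test_split(X, y, test_fraction=0.2, seed=42):
    """Shuffle row indices reproducibly, then hold out test_fraction of them."""
    indices = list(range(len(X)))
    random.Random(seed).shuffle(indices)
    n_test = max(1, round(len(X) * test_fraction))
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(X, train_idx), pick(X, test_idx),
            pick(y, train_idx), pick(y, test_idx))


# Toy data: 10 rows, with a label that depends on the first feature.
X = [[float(i), float(i * 2)] for i in range(10)]
y = [i % 2 for i in range(10)]

X_train, X_test, y_train, y_test = train_test_split(X, y)
print(len(X_train), "training rows,", len(X_test), "test rows")
```

One design point worth weighing: when oversampling is done before the split, as in the order described above, synthetic points interpolated from rows that later land in the test set can end up in the training set. A common precaution is to split first and resample only the training portion, so that the test set contains only original, unseen rows.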

4.3 ML Model Building

4.3.1 Naive Bayes Classifier


Naive Bayes is a family of probabilistic classifiers based on applying Bayes' theorem with
strong (naive) independence assumptions between the features. It is particularly effective for
large datasets and performs well in text classification tasks, such as spam detection and
sentiment analysis. The model calculates the probability of each class given the features,
assuming that the presence of a feature in a class is independent of the presence of any other
feature.
Key Characteristics:
• Simplicity: Easy to implement and interpret.
• Efficiency: Requires a small amount of training data to estimate the parameters.
• Scalability: Performs well with high-dimensional data.
Limitations:
• The strong independence assumption rarely holds in real-world data, which can lead
to suboptimal performance.
• Poor performance when features are correlated.
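
To make the independence assumption concrete, the following is a minimal sketch of a Gaussian Naive Bayes classifier written from scratch. It is an illustration only, not the scikit-learn GaussianNB used in the project; the function names and the sensor readings are hypothetical.

```python
import math

def fit_gaussian_nb(X, y):
    """Estimate the prior, per-feature means and per-feature variances of each class."""
    model = {}
    for label in set(y):
        rows = [x for x, t in zip(X, y) if t == label]
        n = len(rows)
        means = [sum(col) / n for col in zip(*rows)]
        # Small constant keeps the variance strictly positive
        variances = [sum((v - m) ** 2 for v in col) / n + 1e-9
                     for col, m in zip(zip(*rows), means)]
        model[label] = (n / len(X), means, variances)
    return model

def predict_gaussian_nb(model, x):
    """Return the class with the highest log-posterior.
    Features are treated as independent, so their log-likelihoods are summed."""
    best_label, best_score = None, -math.inf
    for label, (prior, means, variances) in model.items():
        score = math.log(prior)
        for v, m, var in zip(x, means, variances):
            score += -0.5 * math.log(2 * math.pi * var) - (v - m) ** 2 / (2 * var)
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Hypothetical readings: [light in lux, CO2 in ppm]; 1 = occupied, 0 = unoccupied
X = [[450, 900], [480, 950], [430, 870], [0, 420], [5, 440], [10, 430]]
y = [1, 1, 1, 0, 0, 0]
model = fit_gaussian_nb(X, y)
print(predict_gaussian_nb(model, [460, 910]))  # a bright, high-CO2 reading
print(predict_gaussian_nb(model, [3, 425]))    # a dark, low-CO2 reading
```

Because each feature contributes its own term to the sum, any dependence between features (for example, light and CO2 rising together) is ignored, which is the limitation listed above.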
4.3.2 Gradient Boosting Classifier
Gradient Boosting is an ensemble learning technique that builds models in a stage-wise
fashion. It combines multiple weak learners (usually decision trees) to create a robust
predictive model. Each new model attempts to correct the errors made by the previous
models, minimizing the loss function.
Key Characteristics:
• Flexibility: Can be optimized for different loss functions and provides a variety of
hyperparameters for tuning.
• Performance: Often outperforms simpler models, especially on complex datasets.
• Robustness: Handles various types of data; some implementations can also handle
missing values natively, whereas others require them to be imputed or removed first.
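
The stage-wise idea can be sketched in a few lines. The example below is a simplified illustration that assumes squared-error loss, for which the negative gradient is simply the residual; the scikit-learn classifier used in the project optimizes a classification loss instead. The helper names fit_stump and boost and the toy data are assumptions made for the example.

```python
import numpy as np

def fit_stump(x, residual):
    """Find the threshold on a 1-D feature that minimises the squared error
    of a two-leaf predictor (a decision stump)."""
    best = None
    for threshold in np.unique(x)[:-1]:       # skip the maximum so both leaves are non-empty
        left = residual[x <= threshold]
        right = residual[x > threshold]
        error = ((left - left.mean()) ** 2).sum() + ((right - right.mean()) ** 2).sum()
        if best is None or error < best[0]:
            best = (error, threshold, left.mean(), right.mean())
    _, threshold, left_value, right_value = best
    return lambda q: np.where(q <= threshold, left_value, right_value)

def boost(x, y, n_stages=20, learning_rate=0.1):
    """Add one stump per stage, each fitted to the current residuals."""
    prediction = np.full(len(y), y.mean())    # start from a constant model
    losses = []
    for _ in range(n_stages):
        residual = y - prediction             # negative gradient of 0.5 * (y - F)^2
        stump = fit_stump(x, residual)
        prediction = prediction + learning_rate * stump(x)
        losses.append(float(((y - prediction) ** 2).mean()))
    return prediction, losses

x = np.linspace(0.0, 1.0, 50)
y = (x > 0.6).astype(float)                   # toy target: a step function
_, losses = boost(x, y)
print(losses[0], losses[-1])
```

Each stage corrects part of the error left by the previous stages, so the training loss shrinks as stages are added; the learning rate controls how much of each correction is applied.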
4.3.3 Performance Comparison
Gradient Boosting tends to outperform Naive Bayes in the following situations:
1. Complex Feature Interactions: When the relationships between features are not
independent, Gradient Boosting can capture these interactions effectively, leading to
improved accuracy.
2. High-Dimensional Data: Gradient Boosting can handle high-dimensional spaces
better by focusing on the most informative features through its iterative process.
3. Structured Data: In structured datasets, such as those found in finance or healthcare,
Gradient Boosting's ability to model complex patterns can significantly enhance
performance.
4. Flexibility in Loss Function: Gradient Boosting allows for the optimization of
custom loss functions, making it versatile for different types of problems, while Naive
Bayes is limited to a fixed probabilistic framework.
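
The first point above can be illustrated with synthetic data in which the label depends only on an interaction between two features. This is a hedged sketch, not the project's experiment: the data are randomly generated, and the hyperparameters (including max_depth=3 and the random seeds) are chosen purely for illustration.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import GaussianNB

# Synthetic data: the label is 1 when both features have the same sign,
# so neither feature is informative on its own.
rng = np.random.default_rng(0)
X = rng.uniform(-1.0, 1.0, size=(2000, 2))
y = (X[:, 0] * X[:, 1] > 0).astype(int)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0)

nb_acc = GaussianNB().fit(X_train, y_train).score(X_test, y_test)
gb_acc = GradientBoostingClassifier(
    n_estimators=100, learning_rate=0.1, max_depth=3, random_state=0
).fit(X_train, y_train).score(X_test, y_test)

print("Naive Bayes accuracy:", nb_acc)
print("Gradient Boosting accuracy:", gb_acc)
```

On data of this kind the class-conditional mean and variance of each feature are nearly identical for both classes, so Naive Bayes has little to work with, while trees of depth greater than one can split on both features in sequence.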

4.4 Advantages
1. Non-Intrusive Monitoring:
o Utilizes sensors that do not require physical interaction, ensuring privacy for
occupants.
2. Real-Time Data Analysis:
o Provides immediate insights into room occupancy, allowing for dynamic
adjustments in resource allocation and energy management.
3. Energy Efficiency:
o Optimizes heating, ventilation, and air conditioning (HVAC) systems based on
actual occupancy, leading to reduced energy consumption.
4. Improved Space Utilization:
o Analyzes occupancy patterns to identify underutilized spaces, enabling better
management of facilities.
5. Enhanced Comfort:
o Maintains optimal environmental conditions by adjusting settings based on
real-time occupancy data.
6. Scalability:
o The system can be easily scaled to accommodate additional sensors or
integrated with other smart building technologies.
7. Data-Driven Insights:
o Provides valuable data for decision-making regarding space planning and
operational efficiency.
CHAPTER 5

UML DIAGRAM
UML stands for Unified Modeling Language. UML is a standardized general-purpose
modeling language in the field of object-oriented software engineering. The standard is
managed by the Object Management Group (OMG). The goal is for UML to
become a common language for creating models of object-oriented computer software. In its
current form, UML comprises two major components: a meta-model and a notation. In
the future, some form of method or process may also be added to, or associated with, UML.

The Unified Modeling Language is a standard language for specifying, visualizing,
constructing, and documenting the artifacts of a software system, as well as for business
modeling and other non-software systems. The UML represents a collection of best
engineering practices that have proven successful in the modeling of large and complex
systems. The UML is a very important part of developing object-oriented software and of the
software development process. The UML uses mostly graphical notations to express the
design of software projects.

Goals: The primary goals in the design of the UML are as follows:

• Provide users with a ready-to-use, expressive visual modeling language so that they can
develop and exchange meaningful models.
• Provide extensibility and specialization mechanisms to extend the core concepts.
• Be independent of particular programming languages and development processes.
• Provide a formal basis for understanding the modeling language.
• Encourage the growth of the OO tools market.
• Support higher-level development concepts such as collaborations, frameworks,
patterns, and components.
• Integrate best practices.

Class Diagram
The class diagram is used to refine the use case diagram and define a detailed design of the
system. The class diagram classifies the actors defined in the use case diagram into a set of
interrelated classes. The relationship or association between the classes can be either an “is-a”
or “has-a” relationship. Each class in the class diagram may be capable of providing certain
functionalities. These functionalities provided by the class are termed “methods” of the class.
Apart from this, each class may have certain “attributes” that uniquely identify the class.

Data flow diagram


A Data Flow Diagram (DFD) is a visual representation of the flow of data within a system or
process. It is a structured technique that focuses on how data moves through different
processes and data stores within an organization or a system. DFDs are commonly used in
system analysis and design to understand, document, and communicate data flow and processing.

Sequence Diagram

A sequence diagram in Unified Modeling Language (UML) is a kind of interaction diagram
that shows how processes operate with one another and in what order. It is a construct of a
Message Sequence Chart. A sequence diagram shows, as parallel vertical lines (“lifelines”),
different processes or objects that live simultaneously, and as horizontal arrows, the messages
exchanged between them, in the order in which they occur. This allows the specification of
simple runtime scenarios in a graphical manner.

Activity diagram

The activity diagram is another important UML diagram; it describes the dynamic aspects of
the system.

Deployment diagram: The deployment diagram visualizes the physical hardware on which
the software will be deployed.

Use case diagram: The purpose of the use case diagram is to capture the dynamic aspects of a
system.

Component diagram: The component diagram describes the organization and wiring of the
physical components in a system.
CHAPTER 6

SOFTWARE ENVIRONMENT

6.1 Software Requirements

Python is a high-level, interpreted programming language known for its simplicity and
readability, which makes it a popular choice for beginners as well as experienced developers.
Key features of Python include its dynamic typing, automatic memory management, and a
rich standard library that supports a wide range of applications from web development to data
science and machine learning. Its object-oriented approach and support for multiple
programming paradigms allow developers to write clear, maintainable code. Python's
extensive ecosystem of third-party packages further enhances its capabilities, enabling rapid
development and prototyping across diverse fields.

Installation

First, download the appropriate installer from the official Python website
(https://www.python.org/downloads/release/python-376/). Windows users should run the
executable installer and check the "Add Python to PATH" option during installation;
on macOS and Linux, follow the respective package installation commands or use a package
manager such as Homebrew or apt-get. After installation, verify the setup by running
python --version or python3 --version in a terminal or command prompt, which should display
"Python 3.7.6". This version-specific installation supports the functionality and
libraries compatible with Python 3.7.6, making it a suitable foundation for developing
applications in areas such as data analysis, machine learning, and GUI development.

6.1.1 Python Packages

The project requires a set of software libraries and tools that work together to build an
integrated system for room occupancy classification. The key software requirements and the
packages used are described below:

• Python: The project is implemented in Python, which is chosen for its extensive
ecosystem of libraries and its strong support for data analysis, machine learning, and
GUI development.

• Tkinter: Used to build the graphical user interface (GUI) of the application. It
handles tasks such as user authentication, data upload, and displaying results, making
the system accessible to both admins and end-users.

• PIL (Pillow): Utilized for image processing, particularly for handling background
images and other graphical elements within the GUI, thereby enhancing the visual
appeal of the application.

• Matplotlib & Seaborn: These libraries are employed for data visualization.
Matplotlib is used for creating standard plots, while Seaborn adds statistical
visualizations such as bar plots, violin plots, histograms, scatter plots, strip plots,
and correlation heat maps.

• Pandas & NumPy: Essential for data manipulation and analysis. Pandas is used to
load, preprocess, and analyze the CSV dataset, while NumPy supports numerical
operations and data handling, which are crucial for processing large volumes of IoT
data.

• Scikit-learn (sklearn): Provides the machine learning framework used in the project.
It includes tools for model training, evaluation, train-test splitting, and data
preprocessing (such as label encoding). Models such as Gaussian Naive Bayes and the
Gradient Boosting Classifier are implemented using scikit-learn.

• Imbalanced-learn (imblearn): Specifically used for implementing the SMOTE
(Synthetic Minority Over-sampling Technique) algorithm, which helps in addressing
class imbalance in the dataset by generating synthetic samples for under-represented
classes.

• Joblib: Utilized for saving and loading trained machine learning models. This ensures
that once a model is trained, it can be stored and reused without retraining, thereby
improving efficiency.

• PyMySQL: This package provides a means to connect to a MySQL database for
handling user authentication. It facilitates operations such as user signup, login, and
data storage, ensuring secure and persistent management of user credentials.

Each of these packages plays a role in keeping the system robust, scalable, and efficient,
from data ingestion and preprocessing to model training, visualization, and deployment.
The combination of these tools enables the creation of an integrated, user-friendly
application for real-time room occupancy classification.

6.2 Hardware Requirements

Python 3.7.6 can run efficiently on most modern systems with minimal hardware
requirements. However, meeting the recommended specifications ensures better performance,
especially for developers handling large-scale applications or computationally intensive tasks.
By ensuring compatibility with the hardware and operating system, developers can leverage the
full potential of Python 3.7.6.

Processor (CPU) Requirements: Python 3.7.6 is a lightweight programming language
that can run on various processors, making it highly versatile. However, for optimal
performance, the following processor specifications are recommended:

• Minimum Requirement: 1 GHz single-core processor.

• Recommended: Dual-core or quad-core processors with a clock speed of 2 GHz or
higher. Using a multi-core processor allows Python applications, particularly those
involving multithreading or multiprocessing, to execute more efficiently.

Memory (RAM) Requirements: Python 3.7.6 does not demand excessive memory but
requires adequate RAM for smooth performance, particularly for running resource-intensive
applications such as data processing, machine learning, or web development.

• Minimum Requirement: 512 MB of RAM.

• Recommended: 4 GB or higher for general usage. For data-intensive operations, 8
GB or more is advisable.

Insufficient RAM can cause delays or crashes when handling large datasets or executing
computationally heavy programs.

Storage Requirements: Python 3.7.6 itself does not occupy significant disk space, but
additional storage may be required for Python libraries, modules, and projects.

• Minimum Requirement: 200 MB of free disk space for installation.

• Recommended: At least 1 GB of free disk space to accommodate libraries and
dependencies.

Developers using Python for large-scale projects or data science should allocate more storage
to manage virtual environments, datasets, and frameworks like TensorFlow or PyTorch.

Compatibility with Operating Systems: Python 3.7.6 is compatible with most
operating systems but requires hardware that supports the respective OS. Below are general
requirements for supported operating systems:

• Windows: 32-bit and 64-bit systems, Windows 7 or later.

• macOS: macOS 10.9 or later.

• Linux: Supports a wide range of distributions, including Ubuntu, CentOS, and
Fedora.

The hardware specifications for the OS directly impact Python’s performance, particularly for
modern software development.
CHAPTER 7

FUNCTIONAL REQUIREMENTS
The "Smart Room Occupancy Analysis" project leverages IoT sensor data to predict room
occupancy status (i.e., whether a room is occupied or unoccupied). The system employs
various machine learning algorithms, including Gradient Boosting and Naive Bayes, to
classify room occupancy based on sensor readings. It performs data analysis, preprocessing,
model training, evaluation, and prediction, making it a robust solution for smart building
applications, energy management, and automation systems.

Key Functionalities:

1. Data Loading and Analysis:

o The system begins by loading the dataset (datatest.txt) containing IoT sensor
data.

o It provides an initial inspection of the data using methods like head(), tail(),
describe(), and info() to understand the structure and statistics of the data.

o The unique values in the target variable Occupancy are identified to see the
possible classifications (i.e., "occupied" and "unoccupied").

2. Data Preprocessing:

o Missing Value Handling: The dataset is checked for missing values, and
necessary steps are taken to clean the data.

o Feature Conversion: The data columns are processed to convert non-numeric
columns (such as date) into appropriate numerical or datetime formats.

o Feature Engineering: New day, month, and year columns are created from
the date column, extracting temporal features to aid the classification.

o Data Transformation: The numeric columns are converted to appropriate
data types (e.g., float), and commas are replaced with dots in numerical values
if required.
3. Data Visualization:

o Correlation Heatmap: A heatmap is generated to visualize the correlation
matrix of features, helping to identify relationships between variables and
potential predictors.

o Count Plot: A count plot visualizes the distribution of the target variable
(Occupancy) to show the balance or imbalance between the classes (occupied
vs. unoccupied).

4. Handling Class Imbalance (SMOTE):

o The Synthetic Minority Over-sampling Technique (SMOTE) is applied to
balance the dataset by generating synthetic samples for the minority class
(occupied).

o A resampled dataset is created, and a count plot is generated again to visualize
the balanced distribution of the target variable.

5. Data Splitting:

o The dataset is split into training and testing sets using an 80-20 ratio, allowing
the model to be trained on a portion of the data and evaluated on unseen data.

6. Model Training and Performance Evaluation:

o Gradient Boosting Classifier: This ensemble learning method is trained on
the training data, and its performance is evaluated using metrics like accuracy,
precision, recall, and F1-score. The model is saved to a .pkl file for future use.

o Naive Bayes Classifier: The Naive Bayes model is also trained and
evaluated using the same performance metrics. The model is saved to a .pkl
file as well.

o Both classifiers are tested on the test data, and their performance is displayed
through metrics and confusion matrices.
7. Model Evaluation Metrics:

o Precision, recall, F1-score, and accuracy are calculated for each model
(Gradient Boosting and Naive Bayes). These metrics help assess the
effectiveness of the models in predicting room occupancy status.

o The system also displays confusion matrices to help visualize the true positive,
false positive, true negative, and false negative predictions for each model.

8. Predictions on New Data:

o After training and evaluating the models, the system can make predictions on
new, unseen data (test.csv).

o The new data undergoes similar preprocessing steps, including replacing
commas, converting numerical columns, and extracting date-based features.

o The model predicts the occupancy status (either "occupied" or "unoccupied")
for each row of the new data.

o The system outputs the predicted occupancy status for each instance in the
new data.
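
The preprocessing applied to new data (comma replacement, numeric conversion, and date-based features) can be sketched as a single reusable function. This is a minimal sketch under stated assumptions: the helper name prepare_features, the semicolon-separated sample, and the column names are hypothetical and may differ from the project's files.

```python
import io
import pandas as pd

def prepare_features(frame):
    """Convert comma decimals to floats and split 'date' into day, month, year."""
    out = frame.copy()
    numeric_columns = [c for c in out.columns if c != "date"]
    for column in numeric_columns:
        # Replace decimal commas with dots, then convert to numbers
        as_text = out[column].astype(str).str.replace(",", ".", regex=False)
        out[column] = pd.to_numeric(as_text, errors="coerce")
    timestamps = pd.to_datetime(out["date"])
    out["day"] = timestamps.dt.day
    out["month"] = timestamps.dt.month
    out["year"] = timestamps.dt.year
    return out.drop(columns=["date"])

# Hypothetical sample with decimal commas (column names are assumptions)
raw = io.StringIO(
    "date;Temperature;Humidity;Light;CO2;HumidityRatio\n"
    "2015-02-02 14:19:00;23,7;26,27;585,2;749,2;0,00476\n"
    "2015-02-02 14:20:00;23,72;26,29;578,4;760,4;0,00477\n"
)
frame = pd.read_csv(raw, sep=";")
features = prepare_features(frame)
print(features.dtypes)
```

Applying the same function to both the training data and the new data keeps the column order and data types identical, which the trained model relies on at prediction time.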
CHAPTER 8

SOURCE CODE
# IoT Data based Room Occupancy Classification Using Non Intrusive Sensors: ML Design and Data Analysis

## Importing libraries

import numpy as np

import pandas as pd

import matplotlib.pyplot as plt

import seaborn as sns

from sklearn.preprocessing import LabelEncoder

from imblearn.over_sampling import SMOTE

from sklearn.model_selection import train_test_split

from sklearn.metrics import precision_score

from sklearn.metrics import f1_score

from sklearn.metrics import recall_score

from sklearn.metrics import accuracy_score, confusion_matrix, classification_report

import joblib

import os

from sklearn.ensemble import GradientBoostingClassifier

from sklearn.naive_bayes import GaussianNB

## Data Analysis

data = pd.read_csv(r'C:\Users\USER\Desktop\saint martins\Room Occupancy\datatest.txt')

data

data.head()
data.tail()

data.describe()

data.info()

data['Occupancy'].unique()

data.columns

## Data Preprocessing

data.isnull().sum()

data.shape

data.corr()

## HeatMap

plt.figure(figsize=(15,10))

sns.heatmap(data.corr(),cmap = 'Blues',annot = True)

plt.xticks(rotation = 80)

plt.yticks(rotation = 45)

plt.show()

labels = set(data['Occupancy'])

labels

# Occupancy is encoded as 0 = unoccupied, 1 = occupied, so the names follow that order
labels = ['unoccupied', 'occupied']

labels

## CountPlot

sns.set(style = 'darkgrid')

plt.figure(figsize = (12,6))

ax = sns.countplot(x = data['Occupancy'], palette = 'Set2')

plt.title('Count plot')
plt.xlabel('categories')

plt.ylabel('count')

plt.show()

df = pd.DataFrame(data)

# Replace commas with dots

df = df.replace(',', '.', regex=True)

# Convert columns to numeric types (ignore 'date' for now)

numeric_columns = df.columns[1:] # exclude 'date' column

df[numeric_columns] = df[numeric_columns].apply(pd.to_numeric, errors='coerce')

# Convert 'date' column to datetime type

df['date'] = pd.to_datetime(df['date'])

df['day'] = df['date'].dt.day

df['month'] = df['date'].dt.month

df['year'] = df['date'].dt.year

df.drop(columns=['date'], inplace=True)

print(df.dtypes) # Check the data types of the columns

print(df.head())

df

x = df.drop(['Occupancy'],axis = 1)
x

y = df['Occupancy']

# Use a separate name for the sampler so the DataFrame 'df' is not overwritten
smote = SMOTE(random_state=42)

x_resampled, y_resampled = smote.fit_resample(x, y)

smote

sns.set(style = 'darkgrid')

plt.figure(figsize = (12,6))

ax = sns.countplot(x = y_resampled, palette = 'Set2')

plt.title('Count plot')

plt.xlabel('categories')

plt.ylabel('count')

plt.show()

## Data Splitting

x_train, x_test, y_train, y_test = train_test_split(x_resampled, y_resampled, test_size = 0.20,
                                                    random_state = 42)

x_train

y_train

x_test

y_test

x_train.shape

y_train.shape

## Performance Evaluation

precision = []

recall = []
fscore = []

accuracy = []

def performance_metrics(algorithm, predict, testY):
    # Cast labels and predictions to integers before scoring
    testY = testY.astype('int')
    predict = predict.astype('int')

    # Macro-averaged scores, expressed as percentages
    p = precision_score(testY, predict, average='macro') * 100
    r = recall_score(testY, predict, average='macro') * 100
    f = f1_score(testY, predict, average='macro') * 100
    a = accuracy_score(testY, predict) * 100

    # Keep the scores so the models can be compared later
    accuracy.append(a)
    precision.append(p)
    recall.append(r)
    fscore.append(f)

    print(algorithm+' Accuracy : '+str(a))
    print(algorithm+' Precision : '+str(p))
    print(algorithm+' Recall : '+str(r))
    print(algorithm+' FSCORE : '+str(f))

    # classification_report expects the true labels first, then the predictions
    report = classification_report(testY, predict, target_names=labels)
    print('\n', algorithm+" classification report\n", report)

    # Plot the confusion matrix as a heatmap
    conf_matrix = confusion_matrix(testY, predict)
    plt.figure(figsize=(5, 5))
    ax = sns.heatmap(conf_matrix, xticklabels=labels, yticklabels=labels, annot=True,
                     cmap="Blues", fmt="g")
    ax.set_ylim([0, len(labels)])
    plt.title(algorithm+" Confusion matrix")
    plt.ylabel('True class')
    plt.xlabel('Predicted class')
    plt.show()

## GradientBoost Algorithm

if os.path.exists('GradientBoost_weights.pkl'):
    # Load the model from the pkl file
    classifier = joblib.load('GradientBoost_weights.pkl')
    predict = classifier.predict(x_test)
    performance_metrics("Gradient Boosting Classifier", predict, y_test)
else:
    # Train the classifier on the training data
    classifier = GradientBoostingClassifier(n_estimators=100, learning_rate=0.1,
                                            max_depth=2, random_state=43)
    classifier.fit(x_train, y_train)
    # Make predictions on the test data
    predict = classifier.predict(x_test)
    # Save the model weights to a pkl file
    joblib.dump(classifier, 'GradientBoost_weights.pkl')
    print("Gradient Boosting classifier model trained and model weights saved.")
    performance_metrics("Gradient Boosting Classifier", predict, y_test)

## NaiveBayes Algorithm

if os.path.exists('NaiveBayes_weights.pkl'):
    # Load the model from the pkl file
    classifier = joblib.load('NaiveBayes_weights.pkl')
else:
    # Train the classifier on the training data
    classifier = GaussianNB()
    classifier.fit(x_train, y_train)
    # Save the model weights to a pkl file
    joblib.dump(classifier, 'NaiveBayes_weights.pkl')
    print("Naive Bayes classifier model trained and model weights saved.")

# Evaluate the loaded or newly trained model on the test data
predict = classifier.predict(x_test)

performance_metrics("Naive Bayes Classifier", predict, y_test)

test = pd.read_csv(r"test.csv")

test

Date_convert = pd.DataFrame(test)

# Replace commas with dots

Date_convert = Date_convert.replace(',', '.', regex=True)

# Convert columns to numeric types (ignore 'date' for now)

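numeric_columns = Date_convert.columns[1:] # exclude 'date' column

# Convert the comma-corrected copy (Date_convert), not the raw 'test' frame
Date_convert[numeric_columns] = Date_convert[numeric_columns].apply(pd.to_numeric, errors='coerce')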


# Convert 'date' column to datetime type

Date_convert['date'] = pd.to_datetime(Date_convert['date'])

Date_convert['day'] = Date_convert['date'].dt.day

Date_convert['month'] = Date_convert['date'].dt.month

Date_convert['year'] = Date_convert['date'].dt.year

Date_convert.drop(columns=['date'], inplace=True)

print(Date_convert.dtypes) # Check the data types of the columns

print(Date_convert.head())

Date_convert

predict = classifier.predict(Date_convert)

predict

# Occupancy is encoded as 0 = unoccupied, 1 = occupied (see Section 9.2)
A = 'unoccupied'

B = 'occupied'

# Note: 'classifier' is the model assigned last above (the Naive Bayes classifier)
for i in range(len(predict)):
    if predict[i] == 0:
        print("{} :{} ".format(Date_convert.iloc[i, :], A))
    elif predict[i] == 1:
        print("{} :{} ".format(Date_convert.iloc[i, :], B))


CHAPTER 9
RESULTS AND DISCUSSION
9.1 Implementation Description

This project implements a machine learning-based approach for room occupancy
classification using IoT data collected from non-intrusive sensors. The primary objective is to
accurately predict whether a room is occupied or unoccupied by analyzing various
environmental parameters like temperature, humidity, light, and CO2 levels.

1. Library Imports

The project begins by importing essential libraries for data manipulation, visualization,
machine learning, and performance evaluation. These libraries include numpy, pandas,
matplotlib, seaborn, and scikit-learn. Additionally, imblearn is used for handling class
imbalance through the SMOTE technique, and joblib is used for saving and loading trained
models.

2. Data Analysis

The dataset is loaded from a .txt file containing the sensor readings. The initial steps involve
displaying the first few records using head() and analyzing the overall structure and summary
statistics with describe() and info(). This analysis provides insight into the data distribution
and helps identify any potential issues such as missing values.

3. Data Preprocessing

To ensure data quality and consistency, missing values are checked using isnull().sum(), and
the dataset's shape is verified. The correlation between features is visualized through a
heatmap generated by seaborn, providing a clear picture of how the features relate to each
other.

The data is further preprocessed by handling the 'date' column, where it is split into day,
month, and year components to facilitate better analysis. The remaining columns are
converted to numeric types, ensuring the data is ready for machine learning algorithms.
4. Class Imbalance Handling

Given the imbalance in the occupancy classes (occupied vs. unoccupied), the Synthetic
Minority Over-sampling Technique (SMOTE) is applied to generate a balanced dataset. This
step is crucial for preventing the model from being biased toward the majority class.

5. Data Visualization

The distribution of the target variable, 'Occupancy', is visualized using a count plot to
understand the class imbalance before and after applying SMOTE. This visualization helps in
confirming the effectiveness of the resampling technique.

6. Data Splitting

The preprocessed and balanced data is then split into training and testing sets using an 80-20
split. This split allows the model to be trained on a significant portion of the data while
retaining a test set for evaluating the model's generalization ability.

7. Performance Evaluation Metrics

A custom function, performance_metrics, is defined to calculate and display key performance
metrics, including precision, recall, F1-score, and accuracy. The function also generates a
classification report and a confusion matrix to provide a detailed evaluation of the model's
predictions.

8. Gradient Boosting Classifier

The Gradient Boosting Classifier, a powerful ensemble learning algorithm, is used as one of
the models. If a pre-trained model is available, it is loaded; otherwise, the classifier is trained
on the training set. The model's predictions on the test set are then evaluated using the
defined performance metrics, and the model is saved for future use.

9. Naive Bayes Classifier

The Naive Bayes algorithm, a simpler yet effective classification technique, is also
implemented. Similar to the Gradient Boosting Classifier, the model is either loaded from a
file or trained from scratch. Its performance is assessed using the same metrics, providing a
comparison between a simple and a more complex model.
10. Testing on New Data

A separate dataset is loaded and preprocessed to test the model's performance on unseen data.
The model predicts the occupancy status for each record, and the results are displayed,
indicating whether the room is occupied or unoccupied.

9.2 Dataset Description

The dataset used in this project for room occupancy classification is collected from non-
intrusive IoT sensors deployed in a smart building environment. The dataset captures
environmental conditions that influence room occupancy status (occupied/unoccupied), such
as temperature, humidity, light, CO2 levels, and humidity ratios. The key details and
attributes of the dataset are as follows:

1. Attributes:

The dataset consists of the following features, which serve as input for the machine learning
models:

Date: (Initially present, later split into day, month, year) - This column contains the
timestamp of when the data was recorded. It was later split into separate day, month, and year
columns for better analysis.

Temperature (in Celsius): Reflects the current temperature of the room at the time of the data
recording.

Humidity (%): The percentage of relative humidity in the room.

Light (Lux): Measures the light intensity in the room (in lux units).

CO2 (ppm): The carbon dioxide concentration (in parts per million) in the room.

Humidity Ratio: The calculated humidity ratio based on room conditions.

Occupancy: The target variable, which indicates whether the room is occupied
(represented as 1) or unoccupied (represented as 0).

2. Dataset Structure:

The dataset consists of several thousand rows, each representing a sensor reading taken at a
specific timestamp.

The target variable (Occupancy) has an imbalanced distribution, with more instances of the
room being unoccupied than occupied. This imbalance is addressed using the SMOTE
technique during data preprocessing.
3. Data Collection Environment:

The data was collected from a controlled indoor environment where IoT sensors were
deployed to monitor the room's conditions in real time. The non-intrusive nature of the
sensors ensured that they did not interfere with human activities while capturing crucial
environmental data.

4. Data Preprocessing:

The dataset underwent several preprocessing steps to ensure its quality:

The 'date' column was split into day, month, and year columns to derive temporal features.

Missing values were checked, and the numeric columns were validated by replacing commas
with dots to maintain a consistent data format.

The dataset was balanced using SMOTE to address the class imbalance problem in the
Occupancy column.

5. Dataset Statistics:

Total Records: Thousands of rows of sensor data.

Features: 7 columns in the raw data (date, temperature, humidity, light, CO2, humidity ratio,
and the target label Occupancy). After processing, the date column is replaced by separate
day, month, and year columns.

Occupancy Distribution: Before applying SMOTE, the dataset exhibited an imbalance,
with more records showing unoccupied rooms than occupied ones.

6. Challenges in the Dataset:

Class Imbalance: As mentioned, the dataset had a disproportionate number of instances
where the room was unoccupied. This challenge was addressed using SMOTE to generate
synthetic samples of the minority class.

Correlated Features: Environmental factors like temperature, humidity, and CO2 are
often correlated. A heatmap was generated to visualize these correlations and guide the
selection of the most relevant features for the model.
9.3 Results Description

The results obtained from applying the Gradient Boosting Classifier and Naive Bayes
Classifier to the room occupancy dataset for IoT-based room occupancy detection are
summarized below. These results illustrate the models' performance in classifying room
occupancy (occupied or unoccupied), based on various performance metrics including
accuracy, precision, recall, and F1-score. Additionally, visualizations such as confusion
matrices and the class distribution before and after SMOTE (Synthetic Minority Over-
sampling Technique) highlight critical insights into both model performance and dataset
characteristics.

Figure 9.3A: Dataset Overview

This figure displays the IoT-based room occupancy dataset, showcasing the initial few rows.
It presents the structure and organization of environmental data collected from sensors,
including features such as temperature, humidity, light, CO2 levels, and humidity ratio. The
target variable Occupancy categorizes whether a room is occupied (1) or unoccupied (0).
Figure 9.3B: Dataset Correlation Heatmap

The correlation heatmap visualizes the relationships between the features in the dataset.
Each cell in the heatmap displays the correlation value between two features, with values
ranging from -1 to 1. A value near 1 indicates that two features rise and fall together, a
value near -1 indicates an inverse relationship, and a value near 0 indicates little linear
relationship. This heatmap highlights significant feature relationships, particularly
between temperature, humidity, and CO2 levels, which play a key role in predicting room
occupancy. Because strongly correlated features carry overlapping information, the heatmap
also helps guide the selection of the most relevant features for the machine learning
models.
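
The values shown in each heatmap cell can be illustrated with a small, library-free sketch of the Pearson correlation coefficient. The readings below are hypothetical and are not taken from the project dataset, and the function names are chosen for illustration only; the project itself may have used a data-analysis library to produce the figure.

```python
import math


def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)


def correlation_matrix(columns):
    """Pairwise correlations for a dict mapping feature name -> list of values."""
    names = list(columns)
    return {a: {b: pearson(columns[a], columns[b]) for b in names} for a in names}


# Hypothetical sensor readings, for illustration only.
readings = {
    "Temperature": [21.0, 21.5, 22.0, 22.5, 23.0],
    "CO2": [420.0, 450.0, 500.0, 560.0, 640.0],
    "Humidity": [30.0, 29.0, 28.5, 27.0, 26.0],
}

matrix = correlation_matrix(readings)
for name, row in matrix.items():
    print(name, {other: round(value, 2) for other, value in row.items()})
```

Each feature correlates perfectly with itself, so the diagonal of the matrix is 1; the off-diagonal cells are the values a heatmap would color.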
Figure 9.3C: Class Distribution of Occupancy (Before and After SMOTE)

This figure illustrates the class imbalance within the Occupancy variable before and after
applying SMOTE.

Pre-SMOTE: The count plot before SMOTE reveals an imbalance in the dataset, with more
instances of unoccupied rooms compared to occupied ones. This imbalance could skew the
model's predictions toward the majority class, leading to poor performance in detecting
occupied rooms.
Post-SMOTE: After applying SMOTE, the class distribution is balanced, ensuring that the
machine learning models have equal representation of both occupancy classes during
training. This step is intended to help the classifiers detect both occupied and
unoccupied rooms fairly, rather than favoring the majority class.
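
The idea behind SMOTE can be sketched in a few lines: a synthetic minority sample is placed at a random point on the line segment between an existing minority sample and one of its nearest minority neighbours. The code below is a simplified illustration under that assumption, not the implementation used in the project; the function name, parameters, and sample values are hypothetical.

```python
import random


def smote_sketch(minority, n_new, k=2, seed=0):
    """Create n_new synthetic points by interpolating between a minority sample
    and one of its k nearest minority neighbours (simplified SMOTE idea).
    Requires at least two minority samples."""
    rng = random.Random(seed)
    synthetic = []
    for _ in range(n_new):
        base = rng.choice(minority)
        # Rank the other minority samples by squared Euclidean distance to `base`.
        others = [p for p in minority if p is not base]
        others.sort(key=lambda p: sum((a - b) ** 2 for a, b in zip(base, p)))
        neighbour = rng.choice(others[:k])
        gap = rng.random()  # position along the segment base -> neighbour
        synthetic.append([a + gap * (b - a) for a, b in zip(base, neighbour)])
    return synthetic


# Hypothetical [temperature, CO2] readings for the minority (occupied) class.
occupied = [[23.0, 600.0], [23.5, 650.0], [22.8, 580.0]]
unoccupied_count = 7  # hypothetical size of the majority class

synthetic = smote_sketch(occupied, unoccupied_count - len(occupied))
balanced_minority = occupied + synthetic
print(len(balanced_minority), "occupied samples after oversampling")
```

Because each synthetic point lies between two real minority samples, it stays within the range of observed values. Oversampling of this kind is normally applied to the training portion of the data only, so that the test set still reflects the original class distribution.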

Figure 9.3D: Performance Metrics for Gradient Boosting Classifier

The confusion matrix and classification report for the Gradient Boosting Classifier
summarize the model’s performance on the test dataset. The confusion matrix contains the
following key elements:

True Positives (TP): Correctly classified instances of occupied rooms.

True Negatives (TN): Correctly classified instances of unoccupied rooms.

False Positives (FP): Instances incorrectly classified as occupied when they were
actually unoccupied.
False Negatives (FN): Instances incorrectly classified as unoccupied when they were
actually occupied.

The Gradient Boosting Classifier shows high performance, with a strong balance of accuracy,
precision, recall, and F1-score. These metrics indicate that the classifier is effective at
distinguishing between occupied and unoccupied rooms, with minimal errors in
classification.
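
The relationship between the four confusion-matrix counts and the reported metrics can be sketched as follows. The labels are hypothetical and serve only to show the arithmetic; they are not the project's results.

```python
def confusion_counts(y_true, y_pred):
    """Count TP, TN, FP, FN for binary labels where 1 = occupied, 0 = unoccupied."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == 1 and p == 1)
    tn = sum(1 for t, p in pairs if t == 0 and p == 0)
    fp = sum(1 for t, p in pairs if t == 0 and p == 1)
    fn = sum(1 for t, p in pairs if t == 1 and p == 0)
    return tp, tn, fp, fn


def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1-score for the occupied class."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}


# Hypothetical actual and predicted occupancy labels, for illustration only.
y_true = [1, 0, 0, 1, 0, 1, 0, 0]
y_pred = [1, 0, 1, 1, 0, 0, 0, 0]

tp, tn, fp, fn = confusion_counts(y_true, y_pred)
metrics = classification_metrics(tp, tn, fp, fn)
print(tp, tn, fp, fn, metrics)
```

Precision penalizes false positives (rooms reported as occupied when empty), while recall penalizes false negatives (occupied rooms that were missed); the F1-score is the harmonic mean of the two.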

Figure 9.3F: Performance Metrics for Naive Bayes Classifier

Similar to the Gradient Boosting Classifier, this figure displays the confusion matrix and
classification report for the Naive Bayes Classifier. By comparing the True Positives, True
Negatives, False Positives, and False Negatives, the confusion matrix highlights the model's
prediction strengths and weaknesses.
The Naive Bayes Classifier demonstrates competitive performance in terms of accuracy,
precision, and recall, particularly in recognizing unoccupied rooms. The classification report
breaks down the performance by providing additional metrics, such as the F1-score, for both
occupied and unoccupied room states. While the Naive Bayes Classifier performs well
overall, it is slightly outperformed by the Gradient Boosting Classifier in terms of precision
and recall.

Figure 9.3G: Prediction on Test Data

This figure illustrates the predictions made by both classifiers on unseen test data. The
predicted room occupancy statuses are compared with the actual labels, providing an
assessment of the models' ability to generalize to new data. By analyzing the differences
between predicted and actual values, it is evident that both classifiers provide strong results,
indicating their effectiveness in real-world scenarios for predicting room occupancy based on
IoT sensor data.

Both models demonstrate high accuracy and reliability in classifying room occupancy, with
Gradient Boosting providing marginally better performance compared to Naive Bayes. The
models’ predictions are consistent with the dataset characteristics, suggesting their
applicability for automated room occupancy detection systems in smart environments.
CHAPTER 10

CONCLUSION AND FUTURE SCOPE

10.1 Conclusion

The implementation of an IoT-based room occupancy classification system using
non-intrusive sensors and machine learning represents a significant advancement in the field of
smart auditorium management. This project effectively addresses the limitations of traditional
manual methods by leveraging modern data-driven approaches to accurately and efficiently
determine room occupancy. Through the use of machine learning algorithms, specifically
Gradient Boosting and Naive Bayes classifiers, the system has demonstrated its ability to
handle complex, voluminous data generated by IoT sensors, offering real-time and accurate
occupancy predictions.

The data preprocessing phase, which involved data cleaning, encoding, and balancing
through SMOTE, ensured that the input data was of high quality, enhancing the model's
performance. The comparative analysis between the existing Naive Bayes algorithm and the
proposed Gradient Boosting classifier highlighted the superiority of the latter in terms of
accuracy, precision, recall, and F1-score. The Gradient Boosting classifier, with its ability
to model complex, non-linear relationships and, with suitable regularization, to limit
overfitting, provided more reliable and consistent predictions.

The successful deployment of this system demonstrates the potential of integrating IoT with
machine learning to create smart, responsive environments that can optimize resource usage,
improve energy efficiency, and enhance user experiences in auditoriums and similar settings.

10.2 Future Scope

While the current implementation provides a robust foundation for room occupancy
classification, several avenues can be explored to further enhance and extend the system:

Scalability and Real-Time Processing: As the system scales to cover larger areas or
multiple rooms, the need for real-time data processing becomes crucial. Implementing
distributed computing techniques and edge computing can reduce latency and improve the
system's responsiveness.

Incorporation of Advanced Sensors: The integration of additional sensors, such as
infrared or ultrasonic sensors, alongside the environmental sensors already in use could
improve the accuracy of occupancy detection. These sensors can capture different aspects
of human presence, further refining the classification results.

Integration with Building Management Systems (BMS): Connecting the occupancy
classification system with existing BMS could automate lighting, HVAC systems,
and other environmental controls, leading to significant energy savings and enhanced user
comfort.

Machine Learning Model Optimization: Exploring other advanced machine learning
models, such as deep learning techniques, could further improve the accuracy and
robustness of the system. Techniques like neural networks, recurrent neural networks
(RNNs), or other ensemble learning methods can be considered for future implementations.

User Behavior Analysis: Beyond occupancy detection, analyzing patterns of user
behavior within the room could provide insights into space utilization, allowing for more
efficient scheduling and resource management.

Security and Privacy Concerns: As IoT devices become more pervasive, addressing
security and privacy concerns is paramount. Implementing secure communication protocols
and anonymizing data could protect user privacy while maintaining the system's functionality.

Deployment in Diverse Environments: Expanding the system's deployment beyond
auditoriums to other settings like offices, classrooms, or residential buildings could
demonstrate its versatility and adaptability. Each environment may present unique challenges,
requiring customized solutions.

Longitudinal Studies: Conducting longitudinal studies to evaluate the system's
performance over time, under different conditions, and with varying occupancy levels could
provide valuable feedback for continuous improvement.

User Interface and Accessibility: Developing a user-friendly interface that provides
real-time occupancy data, historical trends, and system diagnostics could enhance the
system's usability. Additionally, making the system accessible to individuals with disabilities
could further broaden its applicability.

REFERENCES
[1] Conte, G.; Marchi, M.D.; Nacci, A.A.; Rana, V.; Sciuto, D. BlueSentinel: A first approach
using iBeacon for an energy efficient occupancy detection system. In Proceedings of the
ACM, Memphis, TN, USA, 3–6 November 2014.
[2] Corna, A.; Fontana, L.; Nacci, A.A.; Sciuto, D. Occupancy Detection via iBeacon on
Android Devices for Smart Building Management. In Proceedings of the Design, Automation
and Test in Europe Conference and Exhibition, Grenoble, France, 9–13 March 2015.
[3] Filippoupolitis, A.; Oliff, W.; Loukas, G. Bluetooth Low Energy Based Occupancy
Detection for Emergency Management. In Proceedings of the 2016 15th International
Conference on Ubiquitous Computing and Communications and 2016 International
Symposium on Cyberspace and Security (IUCC-CSS), Granada, Spain, 14–16 December
2016.
[4] Mohottige, I.P.; Gharakheili, H.H.; Sivaraman, V.; Moors, T. Modeling Classroom
Occupancy using Data of WiFi Infrastructure in a University Campus. IEEE Sens. J. 2022,
22, 9981–9996.
[5] Choi, H.; Fujimoto, M.; Matsui, T.; Misaki, S.; Yasumoto, K. Wi-CaL: WiFi sensing and
machine learning based device-free crowd counting and localization. IEEE Access 2022, 10,
24395–24410.
[6] Lu, X.; Wen, H.; Han, Z.; Hao, J.; Xie, L.; Trigoni, N. Robust occupancy inference with
commodity WiFi. In Proceedings of the IEEE International Conference on Wireless & Mobile
Computing, New York, NY, USA, 17–19 October 2016.
[7] Wang, W.; Chen, J.; Hong, T.; Zhu, N. Occupancy prediction through Markov based
feedback recurrent neural network (M-FRNN) algorithm with WiFi probe technology. Build.
Environ. 2018, 138, 160–170.
[8] Zou, H.; Jiang, H.; Yang, J.; Xie, L.; Spanos, C. Non-intrusive occupancy sensing in
commercial buildings. Energy Build. 2017, 154, 633–643.
[9] Mohottige, I.P.; Moors, T. Estimating Room Occupancy in a Smart Campus using WiFi
Soft Sensors. In Proceedings of the 2018 IEEE 43rd Conference on Local Computer
Networks (LCN), Chicago, IL, USA, 1–4 October 2018.
[10] Wang, W.; Chen, J.; Hong, T. Occupancy prediction through machine learning and data
fusion of environmental sensing and Wi-Fi sensing in buildings. Autom. Constr. 2018, 94,
233–243.
[11] Marmaroli, P.; Allado, M.; Boulandet, R. Towards the detection and classification of
indoor events using a loudspeaker. Appl. Acoust. 2023, 202, 109161.
[12] Singh, A.P.; Jain, V.; Chaudhari, S.; Kraemer, F.A.; Garg, V. Machine Learning-Based
Occupancy Estimation Using Multivariate Sensor Nodes. In Proceedings of the 2018 IEEE
Globecom Workshops (GC Wkshps), Abu Dhabi, United Arab Emirates, 9–13 December
2018.
[13] Padmanabh, K.; Malikarjuna, V.A.; Sen, S.; Katru, S.P.; Paul, S. iSense: A wireless
sensor network based conference room management system. In Proceedings of the ACM,
Berkeley, CA, USA, 3 November 2009.
[14] Zheng, Y.; Becerik-Gerber, B. Cross-Space Building Occupancy Modeling by
Contextual Information Based Learning. In Proceedings of the ACM, Seoul, Republic of
Korea, 4–5 November 2015.
[15] Huang, Q.; Ge, Z.; Lu, C. Occupancy estimation in smart buildings using audio-
processing techniques. arXiv 2016, arXiv:1602.08507.
[16] Dino, I.G.; Kalfaoglu, E.; Iseri, O.K.; Erdogan, B.; Kalkan, S.; Alatan, A.A. Vision-
based estimation of the number of occupants using video cameras. Adv. Eng. Inform. 2022,
53, 101662.
[17] Sun, K.; Liu, P.; Xing, T.; Zhao, Q.; Wang, X. A fusion framework for vision-based
indoor occupancy estimation. Build. Environ. 2022, 225, 109631.
[18] Fleuret, F.; Berclaz, J.; Lengagne, R.; Fua, P. Multicamera people tracking with a
probabilistic occupancy map. IEEE Trans. Pattern Anal. Mach. Intell. 2008, 30, 267.
[19] Zou, J.; Zhao, Q.; Yang, W.; Wang, F. Occupancy detection in the office by analyzing
surveillance videos and its application to building energy conservation. Energy Build. 2017,
152, 385–398.
[20] Petersen, S.; Pedersen, T.H.; Nielsen, K.U.; Knudsen, M.D. Establishing an image-based
ground truth for validation of sensor data-based room occupancy detection. Energy Build.
2016, 130, 787–793.
