Wnew Project

Uploaded by

Sokunbi Daniel


Project Report Title

Subtitle if required

Your Name

Submitted to
The University of Roehampton

In partial fulfilment of the requirements

for the degree of

Master of Science

Computing /Data Science /Web Development

Declaration
I hereby certify that this report constitutes my own work, that where the language of others is used,
quotation marks so indicate, and that appropriate credit is given where I have used the language,
ideas, expressions, or writings of others.

I declare that this report describes the original work that has not been previously presented for the
award of any other degree of any other institution.

Enter your name here

Enter the date here

Apply your signature here

Acknowledgements
Here, it is customary to thank the people who have supported this work and your studies in general. It is up
to you who you thank!

Abstract
The role of data in modern business decision-making processes has become increasingly significant,
particularly in areas such as pricing strategies, stock management, and the identification of unsuccessful
products. Traditionally, big data has been heralded as a transformative resource, offering businesses the
ability to analyze vast amounts of information to make informed decisions. However, the practical
application of big data in smaller-scale projects, such as the one presented in this dissertation, faces
significant challenges due to limitations in computational resources and access to extensive datasets. This
study investigates these challenges and explores how similar methodologies can be applied to smaller
datasets, specifically within the context of a retail environment, using the "Online Retail" dataset available
on Kaggle.

The research begins with a comprehensive literature review that underscores the importance of data-driven
decision-making in business. It highlights the potential of big data in optimizing pricing models, improving
stock management through predictive analytics, and accurately identifying underperforming products.
Despite the recognized advantages of big data, the literature also points out the substantial challenges
associated with its use, particularly the need for advanced computational resources and the difficulty in
accessing and managing large, distributed datasets. These challenges informed the decision to shift the focus
of this project from big data to a smaller dataset, allowing for a detailed exploration of similar methodologies
under more constrained conditions.

The methodology section of this dissertation outlines the approach taken to address the research objectives
using the "Online Retail" dataset. The dataset, though smaller in scale, is rich in transactional data, providing
a suitable testbed for the application of data-driven techniques in a business context. Key methodologies
include regression analysis for pricing strategy optimization, time series forecasting for stock management,
and classification techniques for identifying unsuccessful products. The study employs various data
preprocessing techniques, including handling missing values, outlier detection, and feature engineering, to
prepare the dataset for analysis.

The implementation chapter delves into the technical details of applying these methodologies. A Ridge
Regression model was utilized to predict sales based on product features and stock levels, offering insights
into how businesses can optimize pricing strategies. The model’s performance was evaluated using metrics
such as R² and Mean Absolute Error (MAE), demonstrating that even with a smaller dataset, meaningful
insights can be derived. Similarly, a Random Forest Classifier was employed to identify low-performing
products. Despite the challenges posed by an imbalanced dataset, the model achieved reasonable accuracy,
highlighting the potential of classification techniques in guiding product management decisions.

In the evaluation and results chapter, the strengths and weaknesses of the implemented models are discussed
in detail. The regression analysis provided practical insights into the relationship between product quantity
and sales, though the limited feature set restricted the model's ability to capture more complex dynamics. The
classification analysis was similarly constrained by the simplified feature set and the inherent limitations of
working with a smaller dataset. However, the study successfully demonstrates that with appropriate feature
engineering and model selection, valuable business insights can be obtained even under resource constraints.

The conclusion of this dissertation reflects on the implications of the findings for both academia and
industry. It underscores the necessity of adapting data-driven methodologies to the specific constraints of a
project, particularly when working with smaller datasets. The study suggests several avenues for future
research, including the application of more advanced modeling techniques, the exploration of larger datasets,
and the integration of real-time data to enhance decision-making processes. Additionally, a critical reflection
on the project process highlights the lessons learned and the areas where improvements could be made in
future work.

Table of Contents
Declaration
Acknowledgements
Abstract
Table of Contents
List of Figures
List of Tables
Chapter 1 Introduction
1.1 Problem Description, Context and Motivation
1.2 Objectives
1.3 Methodology
1.4 Legal, Social, Ethical and Professional Considerations
1.5 Background
1.6 Structure of Report
Chapter 2 Literature – Technology Review
2.1 Literature Review
2.2 Technology Review
2.3 Summary
Chapter 3 Implementation
Chapter 4 Evaluation and Results
4.1 Related Works
Chapter 5 Conclusion
5.1 Future Work
5.2 Reflection
References
Appendices
Appendix A: Project Proposal
Appendix B: Project Management
Appendix C: Artefact/Dataset
Appendix D: Screencast

List of Figures

FIGURE 1
FIGURE 2.1
FIGURE 3.1
FIGURE 3.2
FIGURE 3.3
FIGURE 3.4
FIGURE 4.1
FIGURE 4.2
FIGURE 4.3

List of Tables

TABLE 1.1
TABLE 1.2
TABLE 2.1
TABLE 3.1
FIGURE 4.1
FIGURE 4.2
FIGURE 4.3

Chapter 1 Introduction

Information systems have evolved over the years from transaction-recording systems into tools that
support business decisions at different levels. Traditional information systems depended
primarily on internal data sources, such as enterprise resource planning (ERP) systems, for making
business decisions. These datasets were structured and stored in relational database management
systems (RDBMS). They were used to support internal business decisions such as inventory management,
pricing, finding the most valuable customers, and identifying loss-making products.
In addition, data warehouses were built from this data for analysis and mining purposes. These data
sources were integrated with data from business partners such as suppliers and customers using
enterprise application integration (EAI) platforms. EAI enabled seamless integration of information
systems between business partners. It increased the speed of business-to-business (B2B) transactions
and communication and reduced the cost of inter-company transactions. In the next wave, in the early
nineties, the arrival of the internet further simplified the integration of firms with their business
partners. In the last decade, information systems coupled with the internet, cloud computing, mobile
devices and the Internet of Things have led to massive volumes of data, commonly referred to as big
data. It includes structured, semi-structured and unstructured real-time data, comprising data
warehouse, OLAP and ETL data and other information. Computer science has advanced to store and
process large volumes of diverse datasets using statistical techniques. Business firms and academics
have designed unique ways of extracting value from big data. The objective of this paper is to explore
the role of big data in making better decisions and how big data can be used to make smart, real-time
decisions that improve business results. The big data revolution is more powerful than the analytics
that were used in the past. Using big data helps managers make decisions on the basis of evidence
rather than intuition. Businesses are collecting more data than they can put to use (McAfee et al.,
2012); big data helps in making better predictions and smarter decisions. Leaders across industries
use big data for better managerial practices.

With the explosive growth of the internet over the last 20 years and the information available on

websites, users, advertisers, and other businesses have a far greater knowledge of their customers.

Out of the explosion of data available on consumers comes the concept of big data, where data from

multiple sources can be analyzed to detect patterns and make predictions, allowing for a far greater

understanding of individual customers. As impressive as all these applications are, there has been

very little focus on how companies can use big data to improve their internal decision-making.

In this paper, I report on three separate consultations in a wide range of industries to explore how
companies use internal factual data to make decisions and to identify gaps where current
methodologies could be improved with the use of big data. The three consultations were with
companies in the stock management, product development, and pricing areas.

The responses to the consultations show the novel options that big data could provide to
businesses if the data were more readily available to business users. Of interest was that, with
one exception, only internal data and expertise were used for decision-making, not external
consultants or industry surveys. This is a possibly surprising outcome given the wide range of
external industry information available to these companies compared with the lack of sophisticated
data sources for their specific company needs.

Several studies have been conducted in individual areas such as transactional data, social media
data and supply chain big data. However, there is a lack of a holistic review of the potential
of big data for decision makers. Driven by this need, I explore the role of a variety of big data in
various decision-making scenarios. This paper bridges this gap by achieving the following
objectives: (a) to explore the existing literature on the fundamental concepts of big data and its role
in decision making; and (b) to explore the role of big data in making strategic, tactical and operational
decisions. The study is useful for making important decisions with the help of big data. In the
present era, big data has been used across sectors, which has led to better predictions and better
decisions. In the next section, I review the extant literature on big data and how it is gaining
significance for business and society. Here I review several definitions of big data from leading
big data and analytics professionals. I also touch upon different ways in which applications of
analytics can be classified. The third section discusses various applications and benefits of big
data. Here I review how different institutions such as banks and business firms have been able to
collect, analyze and use big data to enhance their business performance. Analytics-based decision
making using big data is nothing new for some of the leading companies. However, there are still many
small and medium-sized companies that can start taking advantage of this emerging field. In the fourth
section, I present a framework on big data that can be used by such companies. This framework could be
a starting point for refining a model suitable for their businesses. Finally, in the last section, I
conclude the study with my findings and suggest future research directions.

1.1 Problem Description, Context and Motivation

The study was initially designed to utilize big data to explore how companies can optimize their

pricing, stock management, and product success identification processes. However, due to the

limitations in accessing and processing big data—such as the lack of powerful computational

resources and restricted access to multiple databases—the research has been narrowed to focus on

smaller datasets. This shift in focus aims to demonstrate that valuable insights can still be obtained

using smaller datasets, employing the same methodologies as those used in big data analytics.

1.2 Objectives

- To analyze how big data influences pricing decisions within a company.

- To evaluate the impact of big data on stock management strategies.

- To assess the role of big data in identifying unsuccessful products.

1.3 Methodology

This project employs a mixed-methods approach, which integrates both quantitative and qualitative

research methods to comprehensively address the research objectives. The methodology is

structured to leverage the strengths of each approach, ensuring a robust and well-rounded analysis.

Quantitative Data Analysis

Quantitative data analysis involves the use of statistical and mathematical techniques to analyze

numerical data. The steps in this approach are as follows:

1. Data Collection:

- Sources: The data will be collected from industry reports, financial databases, company records,

and academic journals. These sources provide reliable and extensive datasets on pricing, stock

levels, and product performance.

- Types of Data: The data includes sales figures, pricing history, inventory records, and customer

feedback metrics.

2. Data Preprocessing:

- Cleaning: The raw data will be cleaned to remove any inconsistencies, missing values, or

outliers that could skew the analysis.

- Normalization: Data normalization techniques will be applied to standardize the data, making it

suitable for analysis.

3. Analytical Tools and Techniques:

- Software: Advanced analytical tools such as Python, R, and SQL will be used for data

processing and analysis.

- Statistical Analysis: Techniques such as regression analysis, time series analysis, and clustering

will be employed to identify patterns and correlations within the data.

- Machine Learning Models: Predictive models like decision trees, random forests, and neural

networks will be used to forecast trends and make data-driven decisions.

4. Data Visualization:

- Tools: Visualization tools such as Tableau and Power BI will be used to create graphs, charts,

and dashboards that present the data in an easily interpretable format.

- Purpose: These visualizations will help in identifying trends, anomalies, and patterns in the data,

facilitating better decision-making.
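
The cleaning and normalization steps listed above can be sketched in a few lines of Python. This is a minimal illustration using only the standard library, not the project's actual pipeline: the sample prices, the function names `clean` and `min_max_normalize`, and the 1.5 × IQR outlier rule are all assumptions chosen for the example.

```python
from statistics import quantiles


def clean(values):
    """Drop missing entries, then drop outliers with the 1.5 * IQR rule (an assumed rule)."""
    present = [v for v in values if v is not None]
    q1, _, q3 = quantiles(present, n=4)  # the three quartile cut points
    iqr = q3 - q1
    low, high = q1 - 1.5 * iqr, q3 + 1.5 * iqr
    return [v for v in present if low <= v <= high]


def min_max_normalize(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    if hi == lo:
        # All values are identical, so there is no spread to rescale.
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


# Hypothetical unit prices with one missing value and one extreme outlier.
raw_prices = [2.5, 3.0, None, 2.75, 3.25, 250.0, 2.95, 3.1, 2.6, 2.85]

cleaned = clean(raw_prices)          # missing value and the 250.0 outlier removed
scaled = min_max_normalize(cleaned)  # remaining prices rescaled to [0, 1]
```

Whether outliers should be dropped, capped, or kept depends on the business question; in retail data an unusually large order may be a genuine bulk purchase rather than an error.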

Qualitative Case Studies

Qualitative research complements quantitative analysis by providing contextual insights and deeper

understanding of the phenomena under study. The steps in this approach are as follows:

1. Case Selection:

- Criteria: Cases will be selected based on their relevance to the research objectives, availability of

data, and diversity in terms of industry and geographical location.

- Sources: Company case studies, interviews with industry experts, and internal company reports

will be used.

2. Data Collection:

- Interviews: Semi-structured interviews will be conducted with key stakeholders such as

managers, data analysts, and decision-makers within the company.

- Document Analysis: Internal documents, reports, and meeting minutes will be analyzed to

understand the decision-making processes related to pricing, stock management, and product

identification.

3. Data Analysis:

- Coding: Qualitative data will be coded to identify recurring themes and patterns.

- Thematic Analysis: Themes related to the impact of big data on decision-making processes will

be identified and analyzed.

4. Triangulation:

- Integration: The findings from the quantitative analysis will be integrated with the qualitative

insights to provide a comprehensive understanding of the role of big data in internal

decision-making.

- Validation: Triangulation helps in validating the results by cross-verifying data from multiple

sources and methods.

Tools and Technologies

- Data Processing Frameworks: Hadoop and Spark for handling large datasets.

- Statistical Software: R and Python for conducting advanced statistical analyses.

- Machine Learning Platforms: TensorFlow and Scikit-Learn for building and deploying machine

learning models.

- Visualization Software: Tableau and Power BI for creating interactive and insightful

visualizations.

Rationale for Method Selection

- Mixed-Methods Approach: Combining quantitative and qualitative methods provides a holistic

view, capturing both numerical trends and contextual nuances.

- Analytical Tools: The selected tools and techniques are industry-standard and offer robust

capabilities for handling big data and deriving actionable insights.

- Triangulation: Ensures the reliability and validity of the findings by corroborating evidence from

different sources and methods.

1.4 Legal, Social, Ethical and Professional Considerations

Data Privacy:

1. Compliance: The project must comply with data protection laws such as the General

Data Protection Regulation (GDPR) in the EU or the California Consumer Privacy

Act (CCPA) in the US. These regulations mandate how personal data should be

collected, processed, and stored.

2. Consent: Ensuring that data used in the project has been collected with proper

consent from individuals. Any personal data must be anonymized to protect

individual privacy.

Data Security:

1. Protection Measures: Implementing robust security measures to protect data from

unauthorized access, breaches, or leaks. This includes using encryption, secure

servers, and regular security audits.

2. Compliance with Standards: Adhering to industry standards and best practices for

data security, such as ISO/IEC 27001 for information security management.

Impact on Stakeholders:

1. Transparency: Maintaining transparency about how data is used and the findings of

the project. This builds trust among stakeholders, including customers, employees,

and shareholders.

2. Stakeholder Engagement: Engaging with stakeholders to understand their concerns

and ensuring that the use of big data aligns with their expectations and values.

Accessibility:

1. Equal Access: Ensuring that the insights and benefits derived from the project are

accessible to all relevant stakeholders, promoting inclusivity and fairness.

Data Ethics:

1. Integrity: Ensuring that the data used is accurate and obtained through ethical

means. Avoiding manipulation or misrepresentation of data to achieve desired

outcomes.

2. Bias Mitigation: Actively identifying and mitigating any biases in data collection

and analysis processes to ensure fair and unbiased results.

Responsibility:

1. Accountability: Being accountable for the decisions made based on the data

analysis. This includes being prepared to justify and explain the methodology and

outcomes to stakeholders.

2. Harm Avoidance: Ensuring that the project does not harm individuals or groups,

whether through data misuse or unintended consequences of the decisions made.

Accuracy and Reliability:

1. Data Quality: Ensuring the data used is of high quality, accurate, and relevant. This

involves rigorous data cleaning and validation processes.

2. Methodological Rigor: Applying robust and scientifically sound methodologies to

ensure the reliability of the analysis and findings.

Transparency in Reporting:

1. Clear Documentation: Maintaining clear and thorough documentation of all

processes, methodologies, and decisions made during the project. This includes

documenting data sources, analytical methods, and any assumptions or limitations.

2. Honest Reporting: Reporting findings honestly and transparently, without omitting

or altering results to fit preconceived expectations or desired outcomes.

1.5 Background

Definition and Scope of Big Data

Big Data refers to extremely large datasets that are generated at high velocity and with great variety.

These datasets are so complex that traditional data processing applications are inadequate to deal

with them. Big data encompasses:

- Volume: The sheer amount of data generated. Examples include transaction records, sensor data,

and social media posts.

- Velocity: The speed at which data is generated and processed. Real-time data such as online

transactions and streaming services are key examples.

- Variety: The different types of data, including structured data (e.g., databases), semi-structured

data (e.g., XML files), and unstructured data (e.g., text, images, videos).

Scope and Limitations

This study is limited to the use of smaller datasets due to the practical challenges in accessing and

analyzing big data. The constraints include the unavailability of supercomputing resources and the

difficulties in accessing large, distributed databases. Despite these limitations, the research employs

methodologies consistent with those used in big data analytics to ensure the reliability and validity

of the findings.

Table 1.1 - Overview of Big Data vs. Smaller Datasets

Applications of Big Data

Big data has a wide range of applications across various sectors, each leveraging its capabilities to

enhance decision-making and operational efficiency. Some prominent sectors include:

- Healthcare: For predictive analytics, patient care optimization, and operational efficiencies.

- Finance: For fraud detection, risk management, and customer insights.

- Manufacturing: For predictive maintenance, supply chain optimization, and production planning.

- Retail: For personalized marketing, inventory management, and pricing strategies.

Focus on the Retail Industry

In the retail industry, big data plays a crucial role in transforming how businesses operate and

compete. Retailers generate massive amounts of data from various sources such as sales

transactions, customer feedback, and online interactions. This data, when effectively analyzed, can

provide valuable insights that drive strategic decisions. The following areas are particularly

impacted by big data:

1. Pricing Strategies:

- Dynamic Pricing: Big data allows retailers to implement dynamic pricing strategies where prices

are adjusted in real-time based on demand, competitor pricing, and other factors. For example,

e-commerce platforms like Amazon use algorithms to continuously update prices.

- Personalized Pricing: Using customer data to offer personalized discounts and promotions.

Retailers analyze purchasing behavior and preferences to tailor pricing strategies to individual

customers.

2. Inventory Control:

- Demand Forecasting: Big data analytics can predict future demand based on historical sales data,

market trends, and external factors like weather patterns or events. This helps retailers optimize

stock levels, reducing both overstock and stockouts.

- Supply Chain Management: Real-time data from suppliers and logistics can be analyzed to

improve supply chain efficiency. Retailers can track product movement, manage reorder points, and

optimize delivery schedules.

3. Lifecycle Management:

- Product Performance Analysis: By analyzing sales data, customer reviews, and social media

sentiment, retailers can identify which products are performing well and which are not. This helps in

making informed decisions about product continuations, discontinuations, and modifications.

- New Product Development: Insights from big data can inform the development of new products

by identifying gaps in the market, customer needs, and emerging trends. Retailers can use data to

test new concepts and refine products before full-scale launch.
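
To make the demand forecasting and reorder-point ideas above concrete, the following sketch forecasts daily demand with a simple moving average and derives a reorder point from it. It is an illustration under stated assumptions rather than a method taken from this study: the sales figures, lead time, and safety stock are hypothetical, the function names are invented for the example, and the reorder point uses the common textbook form (expected demand during the lead time plus safety stock).

```python
def moving_average_forecast(history, window=3):
    """Forecast the next period as the mean of the last `window` observations."""
    if len(history) < window:
        raise ValueError("history is shorter than the window")
    recent = history[-window:]
    return sum(recent) / window


def reorder_point(daily_demand, lead_time_days, safety_stock):
    """Stock level at which a new order should be placed (textbook form)."""
    return daily_demand * lead_time_days + safety_stock


# Hypothetical daily unit sales for one product.
daily_sales = [12, 15, 11, 14, 16, 18]

# Mean of the last three days: 14, 16 and 18.
forecast = moving_average_forecast(daily_sales, window=3)

# Assumed supplier lead time of 5 days and a safety stock of 20 units.
rop = reorder_point(forecast, lead_time_days=5, safety_stock=20)

print(f"Forecast daily demand: {forecast:.1f} units")
print(f"Reorder when stock falls to {rop:.0f} units")
```

A moving average ignores seasonality and trend, so a retailer with strong seasonal demand would likely need a richer time series model; the sketch only shows how a forecast feeds into a stock decision.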

Importance of Big Data in Strategic Decision-Making

Big data provides a foundation for making data-driven decisions, which are more accurate and

reliable than decisions based on intuition or limited data. The importance of big data in strategic

decision-making includes:

- Enhanced Customer Understanding: Retailers can gain deep insights into customer behavior,

preferences, and buying patterns. This knowledge enables them to tailor their offerings and improve

customer satisfaction.

- Improved Operational Efficiency: Data-driven insights help streamline operations, reduce costs,

and increase profitability. For example, efficient inventory management reduces storage costs and

minimizes waste.

- Competitive Advantage: Companies that effectively leverage big data can gain a significant

competitive edge. They can respond more quickly to market changes, offer better customer

experiences, and optimize their overall business strategies.

Challenges of Big Data

While big data offers numerous benefits, it also presents several challenges:

- Data Quality: Ensuring the accuracy and completeness of data is critical. Poor data quality can

lead to incorrect insights and decisions.

- Data Integration: Combining data from various sources can be complex, especially when dealing

with different formats and systems.

- Privacy and Security: Protecting sensitive data from breaches and ensuring compliance with data

protection regulations is essential.

- Skilled Workforce: Analyzing big data requires specialized skills in data science, analytics, and

machine learning. Finding and retaining skilled professionals can be challenging.

1.6 Structure of Report

The report is organized into five chapters, each serving a specific purpose to provide a

comprehensive understanding of the project.

Chapter 1: Introduction

- Problem Description, Context, and Motivation: Explains the importance of studying big data's role

in internal company decisions.

- Objectives: Outlines the goals of the research.

- Methodology: Describes the mixed-methods approach.

- Legal, Social, Ethical, and Professional Considerations: Discusses guidelines for responsible

project conduct.

- Background: Provides an overview of big data in the retail industry.

- Structure of Report: Briefly outlines the report's contents.

Chapter 2: Literature and Technology Review

- Literature Review: Summarizes key studies on big data's impact.

- Technology Review: Describes tools and technologies for data analysis.

- Summary: Highlights gaps and opportunities for research.

Chapter 3: Implementation

- Design and Architecture: Describes the data analysis system.

- Data Collection and Preprocessing: Explains data gathering and preparation.

- Analytical Models and Algorithms: Details the models and algorithms used.

- Challenges and Solutions: Discusses issues encountered and their resolutions.

Chapter 4: Evaluation and Results

- Data Analysis Results: Presents the findings from the analysis.

- Comparison with Related Works: Contextualizes results with existing studies.

- Discussion: Interprets the results and their implications.

Chapter 5: Conclusion

- Summary of Findings: Recaps the main findings.

- Implications: Discusses practical implications for companies.

- Future Work: Suggests areas for further research.

- Reflection: Personal reflection on the research process.

Appendices

- Supplementary Materials: Includes project proposal, management details, datasets, and a

screencast.

This structure ensures a clear and logical flow, making the report easy to follow and

understand.

Chapter 2 Literature – Technology Review

This chapter reviews various ways in which firms are using big data for analysis and decision making.
After defining the objectives of the research, I identified keywords such as ‘Big Data’, ‘Big Data and
Decision Making’ and ‘Big Data Analytics’. I searched through research papers in top journals,
conference papers and web sources and shortlisted relevant papers. Good-quality research papers
were selected through the Scopus, ScienceDirect and Google Scholar databases. The identified
keywords were entered into each database, and papers relevant to the topic were selected.
Figure 1 shows the number of papers published per year in various journals.

2.1 What is Big Data?

Big data has been defined in several ways by several authors. Boyd and Crawford (2012) define
big data as a cultural, technological and scholarly phenomenon, while Fan et al. (2014) describe
it as an ocean of information. According to Kitchin (2014), big data is a huge volume of
structured and unstructured data. Waller & Fawcett (2013) define big data as datasets that are
too large for traditional data processing systems and therefore require new technologies to
process them. Dubey et al. (2015) describe it as traditional enterprise, machine-generated and
social data. Big data is a term that describes the large volume of data, both structured and
unstructured, that inundates a business on a day-to-day basis. But it is not the amount of data
that is important; it is what organizations do with the data that matters. Big data can be
analyzed for insights that lead to better decisions and strategic business moves. According to
Dyche (2014), the concept of big data for many people is just millions of data points that can be
analyzed through technologies. Big data in the true sense is the proper use of data through
technologies in any particular aspect. Big data evolved in the first decade of the 21st century
and was embraced first by online and startup firms. New types of data, such as voice, text, log
files, images and videos, have come into existence (Davenport and Dyche, 2013). The proper use of
big data results in several applications that help in decision making.

2.1 Literature Review

What is Big Data?

Big data has been defined in several ways by several authors. Boyd and Crawford (2012) define
big data as a cultural, technological and scholarly phenomenon, while Fan et al. (2014) describe
it as an ocean of information. According to Kitchin (2014), big data is a huge volume of
structured and unstructured data. Waller & Fawcett (2013) define big data as
datasets that are too large for traditional data processing systems and therefore require new