Unit 1 Bda

Uploaded by

heavensfate111

CCS334

BIG DATA ANALYTICS

UNIT I
UNDERSTANDING BIG DATA

Introduction to big data – convergence of key trends – unstructured data – industry examples of
big data – web analytics – big data applications– big data technologies – introduction to Hadoop –
open source technologies – cloud and big data – mobile business intelligence – Crowd sourcing
analytics – inter and trans firewall analytics.

INTRODUCTION TO BIG DATA

What is Big Data

Big data refers to extremely large and diverse collections of structured, unstructured, and semi-
structured data that continue to grow exponentially over time. These datasets are so large and
complex in volume, velocity, and variety that traditional data management systems cannot store,
process, and analyze them.
The amount and availability of data is growing rapidly, spurred on by digital technology
advancements, such as connectivity, mobility, the Internet of Things (IoT), and artificial
intelligence (AI). As data continues to expand and proliferate, new big data tools are emerging to
help companies collect, process, and analyze data at the speed needed to gain the most value from
it.
Big data describes large and diverse datasets that are huge in volume and also rapidly grow in size
over time. Big data is used in machine learning, predictive modeling, and other advanced
analytics to solve business problems and make informed decisions.
The Vs of big data

Big data definitions may vary slightly, but it will always be described in terms of volume, velocity,
and variety. These big data characteristics are often referred to as the “3 Vs of big data” and were
first defined by Gartner in 2001.

• Volume
As its name suggests, the most common characteristic associated with big data is its high
volume. This describes the enormous amount of data that is available for collection and
produced from a variety of sources and devices on a continuous basis.

• Velocity
Big data velocity refers to the speed at which data is generated. Today, data is often
produced in real time or near real time, and therefore, it must also be processed, accessed,
and analyzed at the same rate to have any meaningful impact.

• Variety
Data is heterogeneous, meaning it can come from many different sources and can be
structured, unstructured, or semi-structured. More traditional structured data (such as data in
spreadsheets or relational databases) is now supplemented by unstructured text, images,
audio, video files, or semi-structured formats like sensor data that can’t be organized in a
fixed data schema.

In addition to these three original Vs, three others are often mentioned in relation to harnessing the
power of big data: veracity, variability, and value.

• Veracity:
Big data can be messy, noisy, and error-prone, which makes it difficult to control the quality
and accuracy of the data. Large datasets can be unwieldy and confusing, while smaller
datasets could present an incomplete picture. The higher the veracity of the data, the more
trustworthy it is.

• Variability:
The meaning of collected data is constantly changing, which can lead to inconsistency over
time. These shifts include not only changes in context and interpretation but also changes in data
collection methods, based on the information that companies want to capture and analyze.

• Value:
It’s essential to determine the business value of the data you collect. Big data must contain
the right data and then be effectively analyzed in order to yield insights that can help drive
decision-making.

Sources of Big Data

Big data comes from many sources, such as:

o Social networking sites: Facebook, Google, and LinkedIn generate huge amounts of data every
day because they have billions of users worldwide.
o E-commerce sites: Sites like Amazon, Flipkart, and Alibaba generate huge volumes of logs from
which users’ buying trends can be traced.
o Weather stations: Weather stations and satellites produce very large volumes of data, which are
stored and processed to forecast the weather.
o Telecom companies: Telecom giants like Airtel and Vodafone study user trends and publish their
plans accordingly; for this they store the data of their millions of users.
o Share market: Stock exchanges across the world generate huge amounts of data through their
daily transactions.

How does big data work?

The central concept of big data is that the more visibility you have into anything, the more
effectively you can gain insights to make better decisions, uncover growth opportunities, and improve
your business model.

Making big data work requires three main actions:
1. Integration:
Big data collects terabytes, and sometimes even petabytes, of raw data from many sources that
must be received, processed, and transformed into the format that business users and analysts
need to start analyzing it.

2. Management:
Big data needs big storage, whether in the cloud, on-premises, or both. Data must also be stored in
whatever form required. It also needs to be processed and made available in real time.
Increasingly, companies are turning to cloud solutions to take advantage of the unlimited compute
and scalability.

3. Analysis:
The final step is analyzing and acting on big data—otherwise, the investment won’t be worth it.
Beyond exploring the data itself, it’s also critical to communicate and share insights across the
business in a way that everyone can understand. This includes using tools to create data
visualizations like charts, graphs, and dashboards.

What is big data analytics?

Big data analytics is the process of collecting, examining, and analysing large amounts of data to
discover market trends, insights, and patterns that can help companies make better business decisions.
This information is available quickly and efficiently so that companies can be agile in crafting plans to
maintain their competitive advantage.
Big data analytics is important because it helps companies leverage their data to identify opportunities
for improvement and optimisation. Across different business segments, increasing efficiency leads to
overall more intelligent operations, higher profits, and satisfied customers. Big data analytics helps
companies reduce costs and develop better, customer-centric products and services.
Technologies such as business intelligence (BI) tools and systems help organisations take
unstructured and structured data from multiple sources. Users (typically employees) input queries into
these tools to understand business operations and performance. Big data analytics uses the four data
analysis methods to uncover meaningful insights and derive solutions.

Types of big data analytics

Four main types of big data analytics support and inform different business decisions.

1. Descriptive analytics
Descriptive analytics refers to data that can be easily read and interpreted. This data helps create
reports and visualise information that can detail company profits and sales.
Example: During the pandemic, a leading pharmaceutical company conducted data analysis on its
offices and research labs. Descriptive analytics helped it identify and consolidate unutilised spaces
and departments, saving the company millions of pounds.

2. Diagnostic analytics
Diagnostic analytics helps companies understand why a problem occurred. Big data technologies and
tools allow users to mine and recover data that helps dissect an issue and prevent it from happening in
the future.
Example: An online retailer’s sales have decreased even though customers continue to add items to
their shopping carts. Diagnostic analytics revealed that the payment page had not been working
correctly for a few weeks.

3. Predictive analytics
Predictive analytics looks at past and present data to make predictions. With artificial intelligence
(AI), machine learning, and data mining, users can analyse the data to predict market trends.
Example: In the manufacturing sector, companies can use algorithms based on historical data to
predict if or when a piece of equipment will malfunction or break down.

4. Prescriptive analytics
Prescriptive analytics recommends a solution to a problem, relying on AI and machine learning to
gather and use data for risk management.
Example: Within the energy sector, utility companies, gas producers, and pipeline owners identify
factors that affect the price of oil and gas to hedge risks.
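
The four types above can be contrasted on one toy series. The following is only a minimal sketch under stated assumptions: the monthly sales figures are invented, the function names (`describe`, `diagnose`, `predict_next`, `prescribe`) are hypothetical, and a straight-line fit stands in for the AI and machine learning methods that real predictive and prescriptive systems would use.

```python
from statistics import mean

# Hypothetical monthly sales figures, invented for illustration only.
sales = [100, 110, 120, 90, 140, 150]

def describe(values):
    """Descriptive: summarise what happened."""
    return {"total": sum(values), "average": mean(values), "max": max(values)}

def diagnose(values):
    """Diagnostic: locate the months where sales fell versus the prior month."""
    return [i for i in range(1, len(values)) if values[i] < values[i - 1]]

def predict_next(values):
    """Predictive: fit a least-squares line and extrapolate one step ahead."""
    n = len(values)
    xs = list(range(n))
    x_bar, y_bar = mean(xs), mean(values)
    s_xy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, values))
    s_xx = sum((x - x_bar) ** 2 for x in xs)
    slope = s_xy / s_xx
    intercept = y_bar - slope * x_bar
    return intercept + slope * n

def prescribe(forecast, stock):
    """Prescriptive: turn the forecast into a recommended action."""
    return "reorder" if forecast > stock else "hold"

summary = describe(sales)
dips = diagnose(sales)
forecast = predict_next(sales)
action = prescribe(forecast, stock=120)
```

Each function answers a different question about the same data: what happened, where it went wrong, what is likely next, and what to do about it.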

Benefits of big data analytics

Incorporating big data analytics into a business or organisation has several advantages. These include:
Cost reduction: Big data can reduce costs in storing all business data in one place. Tracking
analytics also helps companies find ways to work more efficiently to cut costs wherever
possible.
Product development: Developing and marketing new products, services, or brands is much
easier when based on data collected from customers’ needs and wants. Big data analytics also
helps businesses understand product viability and to keep up with trends.
Strategic business decisions: The ability to constantly analyse data helps businesses make
better and faster decisions, such as cost and supply chain optimisation.
Customer experience: Data-driven algorithms help marketing efforts (targeted ads, for
example) and increase customer satisfaction by delivering an enhanced customer experience.
Risk management: Businesses can identify risks by analysing data patterns and developing
solutions for managing those risks.
UNSTRUCTURED DATA

Types of Big Data

Not all data can be stored in the same way. The methods for data storage can be accurately
evaluated only after the type of data has been identified.

1. Structured data
Structured data is data whose elements are addressable for effective analysis. It has been
organized into a formatted repository that is typically a database. It covers all data that can be
stored in a database table with rows and columns. Such data has relational keys and can easily be
mapped into pre-designed fields. Today, it is the most widely processed kind of data and the simplest
to manage. Example: Relational data.

2. Semi-Structured data
Semi-structured data is information that does not reside in a relational database but that has
some organizational properties that make it easier to analyze. With some processing, it can be stored
in a relational database (though this can be very hard for some kinds of semi-structured data), but
the semi-structured form exists to save space. Example: XML data.

3. Unstructured data
Unstructured data is data that does not conform to a data model and is not organized in a
predefined manner. It has no easily identifiable structure, so it cannot easily be used by a computer
program and is not a good fit for a mainstream relational database. Alternative platforms are
therefore used to store and manage it. Unstructured data is increasingly prevalent in IT systems and
is used by organizations in a variety of business intelligence and analytics applications. Example:
Word, PDF, Text, Media logs.
From 80% to 90% of data generated and collected by organizations is unstructured, and its
volumes are growing rapidly — many times faster than the rate of growth for structured databases.
However, unstructured data has historically been very difficult to analyze. With the help of AI
and machine learning, new software tools are emerging that can search through vast quantities of it to
uncover beneficial and actionable business intelligence.
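
The difference between the three types shows up as soon as a program tries to read them. The sketch below is illustrative only: all sample records are invented, and the regular expression used on the free text is a deliberately crude stand-in for the specialised tools the text mentions.

```python
import csv
import io
import re
import xml.etree.ElementTree as ET

# All sample records below are invented for illustration.

# Structured: fixed columns, so every field is addressable by name.
structured = io.StringIO("customer_id,amount\n101,250\n102,400\n")
rows = list(csv.DictReader(structured))
total = sum(int(r["amount"]) for r in rows)

# Semi-structured: tags give some organisation, but fields may vary per record.
semi = ET.fromstring(
    "<orders><order id='1'><amount>250</amount></order>"
    "<order id='2'><amount>400</amount><note>gift</note></order></orders>"
)
notes = [order.findtext("note", default="") for order in semi.findall("order")]

# Unstructured: free text has no schema, so values must be dug out heuristically.
text = "Customer 101 paid 250 rupees yesterday and was happy with the delivery."
numbers = [int(n) for n in re.findall(r"\d+", text)]
```

In the structured case the program knows which number is the amount. In the unstructured case it only finds two numbers and has no reliable way to tell the customer ID from the payment.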

Unstructured data vs. structured data

| Aspect | Structured Data | Unstructured Data |
|---|---|---|
| Storage | Stored in relational databases (RDBMS) | Cannot be stored in an RDBMS |
| Other Name | Relational data | - |
| Organization | Fits into designated fields (e.g., zip codes, phone numbers, credit cards) | No predefined data model or consistent structure |
| Searchability | Easy to search using human-defined queries or software | Challenging to search; requires specialized tools |
| Processing & Analysis | Easily processed by conventional software | Difficult for conventional software to ingest, process, and analyze |
| Format | Uniform and predictable | Comes in many formats (text, images, video, audio, etc.) |
| Examples | Customer IDs, transaction records, inventory lists | Customer interactions, rich media, social network conversations |
| Tool Support | Well-supported by mature data mining systems | Limited support; robust tools are only now emerging |

What are some examples of unstructured data?

Unstructured data can be created by people or generated by machines. Here are some
examples of the human-generated variety:

• Email: Email message fields are unstructured and cannot be parsed by traditional analytics
tools. That said, email metadata affords it some structure, which explains why email is
sometimes considered semi-structured data.
• Text files: This category includes word processing documents, spreadsheets, presentations,
email, and log files.
• Social media and websites: Data from social networks like Twitter, LinkedIn, and Facebook,
and websites such as Instagram, photo-sharing sites, and YouTube.
• Mobile and communications data: This category includes text messages,
phone recordings, collaboration software, chat, and instant messaging.
• Media: This data includes digital photos, audio, and video files.

Here are some examples of unstructured data generated by machines:

• Scientific data: This includes oil and gas surveys, space exploration, seismic imagery, and
atmospheric data.
• Digital surveillance: This category features data like reconnaissance photos and videos.
• Satellite imagery: This data includes weather data, land forms, and military movements.

Characteristics of Unstructured Data:

• Data neither conforms to a data model nor has any structure.
• Data cannot be stored in the form of rows and columns as in databases.
• Data does not follow any semantics or rules.
• Data lacks any particular format or sequence.
• Data has no easily identifiable structure.
• Due to the lack of identifiable structure, it cannot be used by computer programs easily.

Sources of Unstructured Data:

• Web pages
• Images (JPEG, GIF, PNG, etc.)
• Videos
• Memos
• Reports
• Word documents and PowerPoint presentations
• Surveys

Advantages of Unstructured Data:

• It supports data that lacks a proper format or sequence.
• The data is not constrained by a fixed schema.
• It is very flexible due to the absence of a schema.
• Data is portable.
• It is very scalable.
• It can deal easily with the heterogeneity of sources.
• These types of data have a variety of business intelligence and analytics applications.

Disadvantages of Unstructured data:

• It is difficult to store and manage unstructured data due to the lack of schema and structure.
• Indexing the data is difficult and error-prone because the structure is unclear and there are no
pre-defined attributes. As a result, search results are not very accurate.
• Ensuring the security of the data is a difficult task.

Problems faced in storing unstructured data:

• It requires a lot of storage space to store unstructured data.
• It is difficult to store videos, images, audio, etc.
• Due to the unclear structure, operations like update, delete, and search are very difficult.
• Storage cost is high compared to structured data.
• Indexing unstructured data is difficult.

Possible solutions for storing unstructured data:

• Unstructured data can be converted to easily manageable formats.
• A content addressable storage (CAS) system can be used to store unstructured data. It stores data
based on its metadata, and a unique name is assigned to every object stored in it. An object
is retrieved based on its content, not its location.
• Unstructured data can be stored in XML format.
• Unstructured data can be stored in an RDBMS that supports BLOBs.
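
The content addressable storage idea in the list above can be sketched in a few lines. This is a toy in-memory illustration, not how any particular CAS product works: the class name `ContentAddressableStore` is hypothetical, and using a SHA-256 digest of the bytes as the "unique name" is an assumption made for the example.

```python
import hashlib

class ContentAddressableStore:
    """Toy in-memory CAS: each object is keyed by the SHA-256 of its bytes."""

    def __init__(self):
        self._objects = {}

    def put(self, data: bytes) -> str:
        # The address is derived from the content itself, not from a path.
        address = hashlib.sha256(data).hexdigest()
        self._objects[address] = data  # identical content maps to one entry
        return address

    def get(self, address: str) -> bytes:
        return self._objects[address]

    def __len__(self):
        return len(self._objects)

store = ContentAddressableStore()
addr_a = store.put(b"quarterly report, final version")
addr_b = store.put(b"quarterly report, final version")  # duplicate content
addr_c = store.put(b"quarterly report, draft")
```

Because the address depends only on the content, storing the same bytes twice yields the same address and a single stored copy, which is one reason this approach suits large collections of files that rarely change.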

Extracting information from unstructured Data:

Unstructured data does not have any structure, so it cannot easily be interpreted by conventional
algorithms. It is also difficult to tag and index. Extracting information from it is therefore a
tough job. Here are possible solutions:
• Taxonomies, or classifications of data, help organise data in a hierarchical structure, which
makes the search process easier.
• Data can be stored in a virtual repository and be automatically tagged, for example in
Documentum.
• Application platforms like XOLAP can be used. XOLAP helps in extracting information from
e-mails and XML-based documents.
• Various data mining tools can be used.

BIG DATA INDUSTRY APPLICATIONS

Here are some of the sectors where Big Data is actively used:

E-commerce - Predicting customer trends and optimizing prices are a few of the ways
e-commerce uses Big Data analytics
Marketing - Big Data analytics helps to drive high ROI marketing operations, which result in
improved sales
Education - Used to develop new courses and improve existing ones based on market requirements
Healthcare - With the help of a patient’s medical history, Big Data analytics is
used to predict how likely they are to have health issues
Media and entertainment - Used to understand the demand for shows, movies, songs, and more
to deliver a personalized recommendation list to users
Banking - Customer income and spending patterns help to predict the likelihood of choosing
various banking offers, like loans and credit cards
Telecommunications - Used to forecast network capacity and improve customer experience
Government - Big Data analytics helps governments in law enforcement, among other things

APPLICATIONS OF BIG DATA

Today’s world generates a lot of data, and big companies use that data for business growth. By
analyzing it, useful decisions can be made in various cases, as discussed below:

1. Tracking Customer Spending Habit, Shopping Behavior:

In big retail stores (like Amazon, Walmart, Big Bazar, etc.), the management team has to keep
data on customers’ spending habits (which products customers spend on, which brands they prefer,
how frequently they spend), shopping behavior, and most-liked products (so that those products can
be kept in the store). The production or procurement rate of a product is fixed based on data about
which products are searched for or sold the most.
The banking sector uses data about customers’ spending behavior so that it can offer a particular
customer a discount or cashback on a product they like when they pay with the bank’s credit or debit
card. In this way, banks can send the right offer to the right person at the right time.

2. Recommendation:
By tracking customers’ spending habits and shopping behavior, big retail stores provide
recommendations to customers. E-commerce sites like Amazon, Walmart, and Flipkart make product
recommendations. They track which products a customer is searching for and, based on that data,
recommend products of that type to the customer.
As an example, suppose a customer searched for a bed cover on Amazon. Amazon now has data
indicating that the customer may be interested in buying a bed cover. The next time that customer
visits a Google page, advertisements for various bed covers will be shown. Thus, the advertisement
of the right product can be sent to the right customer.
YouTube also recommends videos based on the types of video a user has previously liked and
watched. Relevant advertisements are shown during playback based on the content of the video the
user is watching. For example, if someone is watching a tutorial video on big data, an advertisement
for another big data course may be shown during that video.
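
The recommendation idea described above can be sketched as a simple co-occurrence count. This is only an illustration of the principle, not how Amazon or YouTube actually rank items: the browsing histories are invented and the `recommend` function is hypothetical.

```python
from collections import Counter

# Hypothetical browsing histories, invented for illustration.
histories = {
    "asha": ["bed cover", "pillow", "blanket"],
    "ravi": ["bed cover", "pillow"],
    "meena": ["bed cover", "curtain"],
    "john": ["laptop", "mouse"],
}

def recommend(item, histories, top_n=2):
    """Rank items that other customers viewed together with `item`."""
    counts = Counter()
    for viewed in histories.values():
        if item in viewed:
            counts.update(v for v in viewed if v != item)
    return [name for name, _ in counts.most_common(top_n)]

top_pick = recommend("bed cover", histories, top_n=1)
```

A customer who searches for a bed cover is shown the item that most often appears alongside it in other customers' histories.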

3. Smart Traffic System:

Data about traffic conditions on different roads is collected through cameras placed beside
the road and at the entry and exit points of the city, and through GPS devices placed in vehicles
(Ola, Uber cabs, etc.). All such data is analyzed, and jam-free or less congested, less time-consuming
routes are recommended. In this way, a smart traffic system can be built in the city using big data
analysis. A further benefit is that fuel consumption can be reduced.

4. Secure Air Traffic System:

Sensors are present at various places on an aircraft (such as the propeller). These sensors capture
data like the speed of the flight, moisture, temperature, and other environmental conditions. Based
on analysis of such data, environmental parameters within the aircraft are set up and adjusted.
By analyzing the aircraft’s machine-generated data, it can be estimated how long a machine can
operate flawlessly and when it should be replaced or repaired.

5. Auto Driving Car:

Big data analysis helps drive a car without human intervention. Cameras and sensors placed at
various spots on the car gather data such as the size of surrounding cars, obstacles, and the distance
from them. This data is analyzed, and calculations are carried out, such as what angle to turn, what
the speed should be, and when to stop. These calculations help the car take action
automatically.

6. Virtual Personal Assistant Tool:

Big data analysis helps virtual personal assistant tools (like Siri on Apple devices, Cortana on
Windows, and Google Assistant on Android) answer the various questions asked by users.
The tool tracks the user’s location, local time, season, and other data related to the question asked.
By analyzing all such data, it provides an answer.
As an example, suppose a user asks “Do I need to take an umbrella?” The tool collects data
such as the user’s location and the season and weather conditions at that location, analyzes this
data to conclude whether there is a chance of rain, and then provides the answer.

7. IoT:
Manufacturing companies install IoT sensors in machines to collect operational
data. By analyzing such data, it can be predicted how long a machine will work without problems and
when it will require repair, so that the company can take action before the machine develops many
issues or breaks down completely. Thus, the cost of replacing the whole machine can be saved.
In the healthcare field, big data is making a significant contribution. Using big data tools,
data regarding patient experience is collected and used by doctors to give better treatment. IoT
devices can sense symptoms of a probable oncoming disease in the human body and help prevent it
through advance treatment. IoT sensors placed near a patient or a new-born baby constantly keep
track of various health parameters like heart rate and blood pressure. Whenever any parameter
crosses the safe limit, an alarm is sent to a doctor so that they can take steps remotely without delay.
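
The safe-limit alarm described above amounts to a range check on each incoming reading. The sketch below assumes a hypothetical `check_reading` helper and invented readings; the ranges in `SAFE_LIMITS` are placeholder values for illustration, not clinical guidance.

```python
# Safe ranges are placeholder values for illustration, not clinical guidance.
SAFE_LIMITS = {"heart_rate": (60, 100), "systolic_bp": (90, 140)}

def check_reading(reading, limits=SAFE_LIMITS):
    """Return the names of parameters that fall outside their safe range."""
    alerts = []
    for name, (low, high) in limits.items():
        value = reading.get(name)
        if value is not None and not (low <= value <= high):
            alerts.append(name)
    return alerts

# Invented sensor readings for two hypothetical patients.
readings = [
    {"patient": "P1", "heart_rate": 72, "systolic_bp": 118},
    {"patient": "P2", "heart_rate": 131, "systolic_bp": 150},
]

# Keep only the patients for whom at least one parameter raised an alert.
alarms = {}
for reading in readings:
    alerts = check_reading(reading)
    if alerts:
        alarms[reading["patient"]] = alerts
```

In a real deployment the readings would arrive as a continuous stream and the alert would be pushed to a doctor, but the decision rule itself can be this simple.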

8. Education Sector:
Organizations that conduct online educational courses use big data to find candidates
interested in those courses. If someone searches for a YouTube tutorial video on a subject, online or
offline course providers for that subject send that person online advertisements about their courses.

9. Energy Sector:
Smart electric meters read the power consumed every 15 minutes and send the readings to a
server, where the data is analyzed to estimate the times of day when the power load is lowest
throughout the city. Using this system, manufacturing units and householders are advised to run
their heavy machines at night, when the power load is lower, to benefit from a lower
electricity bill.
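
The smart-meter analysis above reduces to aggregating the 15-minute readings by hour and picking the hour with the smallest total. The following is a minimal sketch with synthetic data: the function names are hypothetical, and readings are assumed to be `(minute_of_day, watt_hours)` pairs.

```python
from collections import defaultdict

def hourly_load(readings):
    """Sum 15-minute readings (minute_of_day, watt_hours) into per-hour totals."""
    totals = defaultdict(int)
    for minute_of_day, watt_hours in readings:
        totals[minute_of_day // 60] += watt_hours
    return dict(totals)

def off_peak_hour(readings):
    """Return the hour of day (0-23) with the smallest total load."""
    totals = hourly_load(readings)
    return min(totals, key=totals.get)

# Synthetic data: one reading every 15 minutes, with a lighter load from 02:00 to 02:59.
readings = [
    (minute, 200 if 120 <= minute < 180 else 1000)
    for minute in range(0, 24 * 60, 15)
]

best_hour = off_peak_hour(readings)
```

With city-wide data, the same aggregation would be run across all meters before choosing the hours to recommend to heavy consumers.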

10. Media and Entertainment Sector:

Media and entertainment service providers like Netflix, Amazon Prime, and Spotify analyze
data collected from their users. Data such as what types of video and music users watch and listen to
most and how long users spend on the site is collected and analyzed to set the next
business strategy.

BIG DATA TECHNOLOGIES

Big data technologies can be categorized into four main types: data storage, data mining, data
analytics, and data visualization [2]. Each of these is associated with certain tools, and you’ll want to
choose the right tool for your business needs depending on the type of big data technology required.

1. Data storage
Big data technology that deals with data storage has the capability to fetch, store, and manage big data.
It is made up of infrastructure that allows users to store the data so that it is convenient to access.
Most data storage platforms are compatible with other programs. Two commonly used tools are
Apache Hadoop and MongoDB.
• Apache Hadoop: Apache Hadoop is one of the most widely used big data tools. It is an open-source
software platform that stores and processes big data in a distributed computing environment across
hardware clusters. This distribution allows for faster data processing. The framework is
designed to tolerate faults, be scalable, and process all data formats.
• MongoDB: MongoDB is a NoSQL database that can be used to store large volumes of data.
It stores data as documents made up of key-value pairs (a basic unit of data) and groups documents
into collections. It is written primarily in C++ and is one of the most popular big data
databases because it can manage and store unstructured data with ease.

2. Data mining
Data mining extracts useful patterns and trends from raw data. Big data technologies such as
RapidMiner and Presto can turn unstructured and structured data into usable information.

• RapidMiner: RapidMiner is a data mining tool that can be used to build predictive models. Its
two strengths are processing and preparing data, and building machine learning and deep learning
models. This end-to-end model allows both functions to drive impact
across the organization [3].
• Presto: Presto is an open-source query engine that was originally developed by Facebook to
run analytic queries against its large datasets. Now, it is widely available. One query on
Presto can combine data from multiple sources within an organization and perform analytics
on them in a matter of minutes.

3. Data analytics
In big data analytics, technologies are used to clean and transform data into information that can be
used to drive business decisions. This next step (after data mining) is where users run algorithms,
models, and predictive analytics using tools such as Apache Spark and Splunk.
• Apache Spark: Spark is a popular big data tool for data analysis because it is fast and
efficient at running applications. It is often faster than Hadoop MapReduce because it keeps working
data in random access memory (RAM) instead of storing and processing it in batches on disk. Spark
supports a wide variety of data analytics tasks and queries.
• Splunk: Splunk is another popular big data analytics tool for deriving insights from large
datasets. It has the ability to generate graphs, charts, reports, and dashboards. Splunk also
enables users to incorporate artificial intelligence (AI) into data outcomes.
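
The MapReduce model mentioned above can be illustrated with the classic word count. This is a single-process sketch of the programming model only, with hypothetical function names. A real Hadoop job distributes the map and reduce phases across a cluster, and the shuffle step moves data between machines.

```python
from collections import defaultdict

def map_phase(document):
    """Map: emit a (word, 1) pair for every word in one input split."""
    return [(word.lower(), 1) for word in document.split()]

def shuffle(pairs):
    """Shuffle: group all emitted values by key."""
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(groups):
    """Reduce: combine the values of each key into one result."""
    return {key: sum(values) for key, values in groups.items()}

# Two toy input splits; on a cluster each could be mapped on a different node.
splits = ["big data is big", "data is everywhere"]
pairs = [pair for split in splits for pair in map_phase(split)]
word_counts = reduce_phase(shuffle(pairs))
```

The grouped pairs produced by `shuffle` are the intermediate data. Keeping such intermediate results in memory between stages, rather than on disk, is the design choice the Spark description above credits for its speed.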

4. Data visualization
Finally, big data technologies can be used to create clear, compelling visualizations from the data. In data-
oriented roles, data visualization is a skill that is beneficial for presenting recommendations to
stakeholders for business profitability and operations, telling an impactful story with a simple graph.
• Tableau: Tableau is a very popular tool in data visualization because its drag-and-drop
interface makes it easy to create pie charts, bar charts, box plots, Gantt charts, and
more. It is a secure platform that allows users to share visualizations and dashboards in real
time.
• Looker: Looker is a business intelligence (BI) tool used to make sense of big data analytics
and then share those insights with other teams. Charts, graphs, and dashboards can be
configured with a query, such as monitoring weekly brand engagement through social media
analytics.
OPEN SOURCE TECHNOLOGIES / BIG DATA ANALYTICS TOOLS

There are hundreds of data analytics tools out there in the market today but the selection of the right
tool will depend upon your business NEED, GOALS, and VARIETY to get business in the right
direction. Now, let’s check out the top 10 analytics tools in big data.

1. APACHE Hadoop
It’s a Java-based open-source platform that is used to store and process big data. It is
built on a cluster system that allows data to be processed efficiently and in parallel.
It can process both structured and unstructured data and can scale from one server to many computers.
Hadoop also offers cross-platform support for its users. Today, it is one of the most widely used big
data analytics tools and is used by many tech giants such as Amazon, Microsoft, IBM, etc.

Features of Apache Hadoop:

• Free to use and offers an efficient storage solution for businesses.
• Offers quick access via HDFS (Hadoop Distributed File System).
• Highly flexible and can be easily implemented with MySQL and JSON.
• Highly scalable as it can distribute a large amount of data in small segments.
• It works on small commodity hardware like JBOD (just a bunch of disks).
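
The idea of distributing a large amount of data in small segments can be sketched as splitting a file into fixed-size blocks and copying each block to several machines. This is a simplified illustration, not the HDFS implementation: the function names and node names are hypothetical, the block size is tiny for readability, and the round-robin placement is an assumption that ignores the placement rules a real cluster applies.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a byte string into fixed-size blocks (the last one may be shorter)."""
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]

def place_replicas(num_blocks: int, nodes, replication: int = 3):
    """Assign each block to `replication` distinct nodes, round-robin."""
    if replication > len(nodes):
        raise ValueError("replication factor exceeds number of nodes")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + k) % len(nodes)] for k in range(replication)
        ]
    return placement

# A 1000-byte "file" cut into 256-byte blocks and spread over four toy nodes.
blocks = split_into_blocks(b"x" * 1000, block_size=256)
placement = place_replicas(len(blocks), ["node1", "node2", "node3", "node4"])
```

Because every block lives on several nodes, losing one machine does not lose data, and different blocks of the same file can be processed in parallel.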

2. Cassandra
Apache Cassandra is an open-source NoSQL distributed database that is used to store and retrieve
large amounts of data. It’s one of the most popular tools for data analytics and has been praised by
many tech companies for its high scalability and availability without compromising speed and
performance. It is capable of delivering thousands of operations every second and can handle
petabytes of data with almost zero downtime. It was created at Facebook and released publicly
in 2008.

Features of Apache Cassandra:

• Data Storage Flexibility: It supports all forms of data, i.e. structured, unstructured, and
semi-structured, and allows users to change them as per their needs.
• Data Distribution System: Data is easy to distribute by replicating it across
multiple data centers.
• Fast Processing: Cassandra has been designed to run on efficient commodity hardware and
also offers fast storage and data processing.
• Fault-tolerance: If any node fails, it can be replaced without delay.

3. Qubole
It’s an open-source big data tool that helps in fetching data along a value chain using ad-hoc
analysis and machine learning. Qubole is a data lake platform that offers end-to-end service and
reduces the time and effort required to move data pipelines. It can be configured on
multiple cloud services such as AWS, Azure, and Google Cloud. It is also said to lower the
cost of cloud computing by 50%.

Features of Qubole:
• Supports ETL process: It allows companies to migrate data from multiple sources into one
place.
• Real-time Insight: It monitors users’ systems and allows them to view real-time insights.
• Predictive Analysis: Qubole offers predictive analysis so that companies can take actions
accordingly for targeting more acquisitions.
• Advanced Security System: To protect users’ data in the cloud, Qubole uses an advanced
security system and also aims to protect against future breaches.

4. Xplenty

It is a data analytics tool for building data pipelines with minimal code. It offers a
wide range of solutions for sales, marketing, and support. With the help of its interactive graphical
interface, it provides solutions for ETL, ELT, etc. The best part of using Xplenty is its low investment
in hardware and software, and it offers support via email, chat, telephone, and virtual meetings.
Xplenty is a platform for processing data for analytics over the cloud, and it brings all the data together.

Features of Xplenty:
• REST API: A user can do almost anything through the REST API.
• Flexibility: Data can be sent to and pulled from databases, warehouses, and Salesforce.
• Data Security: It offers SSL/TLS encryption, and the platform is capable of verifying
algorithms and certificates regularly.
• Deployment: It offers integration apps for both cloud and in-house use and supports deployment to
integrate apps over the cloud.

4. Spark
Apache Spark is another framework that is used to process data and perform numerous
tasks on a large scale. It processes data across multiple computers with the help of
distributed computing tools. It is widely used among data analysts as it offers easy-to-use APIs that provide easy
data pulling methods, and it is capable of handling multiple petabytes of data as well. In 2014, Spark
set a record by sorting 100 terabytes of data in just 23 minutes, which broke the previous world
record held by Hadoop MapReduce (72 minutes). This is one reason why big tech companies are moving towards Spark,
and why it is highly suitable for ML and AI today.

Features of APACHE Spark:

 Ease of use: It allows users to work in their preferred language (Java, Python, etc.).
 Real-time Processing: Spark can handle real-time streaming via Spark Streaming.
 Flexible: It can run on Mesos, Kubernetes, or in the cloud.

5. MongoDB
MongoDB, which came into the limelight in 2010, is a free, open-source, document-oriented (NoSQL)
database that is used to store a high volume of data. It uses collections and documents for storage, and
each document consists of key-value pairs, which are considered the basic unit of MongoDB. It is
popular among developers due to its availability for multiple programming languages such as Python,
JavaScript, and Ruby.

Features of MongoDB:

 Written in C++: It is a schema-less database and can hold a variety of documents.
 Simplifies Stack: With MongoDB, a user can easily store files without disturbing
the rest of the stack.
 Master-Slave Replication: Data is written to and read from the master, and replicas can be called upon for
backup.
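
The document model described above can be illustrated without a database server. The following is a minimal sketch in plain Python, not MongoDB's own API: a collection is modelled as a list of dictionaries, and the `find` helper and the sample fields (`name`, `city`, `skills`, `age`) are hypothetical names chosen only for illustration.

```python
# A "collection" modelled as a list of documents (plain dicts).
# Documents in the same collection need not share the same fields (schema-less).
students = [
    {"_id": 1, "name": "Asha", "city": "Chennai"},
    {"_id": 2, "name": "Ravi", "city": "Madurai", "skills": ["Python", "Ruby"]},
    {"_id": 3, "name": "Meena", "city": "Chennai", "age": 21},
]


def find(collection, query):
    """Return the documents whose fields equal every key-value pair in query."""
    return [
        doc for doc in collection
        if all(doc.get(key) == value for key, value in query.items())
    ]


chennai_students = find(students, {"city": "Chennai"})
print([doc["name"] for doc in chennai_students])
```

Each document is a set of key-value pairs, and the query is itself a small document of key-value pairs to match, which mirrors the query-by-example style that document databases commonly use.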

6. Apache Storm
Apache Storm is a robust, user-friendly tool used for data analytics, especially in small companies.
The best part about Storm is that it has no programming-language barrier and can be used with
any language. It was designed to handle large pools of data in a fault-tolerant and horizontally scalable
manner. When we talk about real-time data processing, Storm leads the chart because of its
distributed real-time big data processing system, which is why many tech companies today use
Apache Storm in their systems. Some of the most notable names are Twitter, Zendesk, NaviSite, etc.

Features of Storm:
 Data Processing: Storm processes the data even if a node gets disconnected.
 Highly Scalable: It maintains performance even as the load increases.
 Fast: Apache Storm is very fast and can process up to 1 million messages of
100 bytes per second on a single node.

7. SAS
Today it is one of the best tools for statistical modeling used by data
analysts. By using SAS, a data scientist can mine, manage, extract or update data in different variants
from different sources. Statistical Analysis System, or SAS, allows a user to access data in any
format (SAS tables or Excel worksheets). Besides that, it also offers a cloud platform for business
analytics called SAS Viya, and new tools and products have been introduced to strengthen its
AI and ML capabilities.

Features of SAS:
 Flexible Programming Language: It offers easy-to-learn syntax and vast libraries,
which make it suitable for non-programmers.
 Vast Data Format: It provides support for many programming languages, including
SQL, and can read data from any format.
 Encryption: It provides end-to-end security with a feature called SAS/SECURE.

8. Data Pine
Datapine is an analytics tool used for BI and was founded in 2012 (Berlin, Germany). In
a short period of time, it has gained much popularity in a number of countries, and it is mainly used
for data extraction (for small and medium companies fetching data for close monitoring). With the help of
its enhanced UI design, anyone can view and check the data as per their requirements. It is offered in 4
different price brackets, starting from $249 per month, and provides dashboards by function, industry,
and platform.

Features of Datapine:
 Automation: To cut down on manual work, datapine offers a wide array of AI assistant and
BI tools.
 Predictive Tool: datapine provides forecasting/predictive analytics, using historical and
current data to derive future outcomes.
 Add on: It also offers intuitive widgets, visual analytics & discovery, ad hoc reporting,
etc.

9. Rapid Miner
It is a fully automated visual workflow design tool used for data analytics. It is a no-code
platform, and users are not required to write code to segregate data. Today, it is heavily used in
many industries such as ed-tech, training, research, etc. Although it is an open-source platform, it has a
limit of 10,000 data rows and a single logical processor. With the help of Rapid Miner,
one can easily deploy ML models to the web or mobile (only when the user interface is ready to
collect real-time figures).

Features of Rapid Miner:

 Accessibility: It allows users to access 40+ types of files (SAS, ARFF, etc.) via URL.
 Storage: Users can access cloud storage facilities such as AWS and Dropbox.
 Data validation: Rapid Miner enables the visual display of multiple results in history for
better evaluation.

CLOUD AND BIG DATA

1. Big Data:
Big data refers to data that is huge in size and also increasing rapidly with respect to
time. Big data includes structured data, unstructured data as well as semi-structured data. Big
data cannot be stored and processed with traditional data management tools; it needs specialized
big data management tools. It refers to complex and large data sets characterized by the 5 V’s: volume,
velocity, veracity, value and variety. It involves data storage, data analysis,
data mining and data visualization.
Examples of sources where big data is generated include social media data, e-commerce
data, weather station data, IoT sensor data, etc.
Characteristics of Big Data :
 Variety of Big data – Structured, unstructured, and semi structured data
 Velocity of Big data – Speed of data generation
 Volume of Big data – Huge volumes of data that is being generated
 Value of Big data – Extracting useful information and making it valuable
 Variability of Big data – Inconsistency which can be shown by the data at times.

Advantages of Big Data :

 Cost Savings
 Better decision-making
 Better Sales insights
 Increased Productivity
 Improved customer service.

Disadvantages of Big Data :

 Incompatible tools
 Security and Privacy Concerns
 Need for cultural change
 Rapid change in technology
 Specific hardware needs.

2. Cloud Computing :
Cloud computing refers to the on-demand availability of computing resources over the internet. These
resources include servers, storage, databases, software, analytics, networking and intelligence,
and all of them can be used as per the requirements of the customer. In cloud
computing, customers pay as per use. It is very flexible, and resources can be scaled
easily depending upon the requirement. Instead of buying IT resources physically, all resources
can be obtained from cloud vendors as required. Cloud computing has three
service models, i.e. Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a
Service (SaaS).
Examples of cloud computing vendors that provide cloud computing services are Amazon Web
Services (AWS), Microsoft Azure, Google Cloud Platform, IBM Cloud Services, etc.
Characteristics of Cloud Computing :
 On-Demand availability
 Accessible through a network
 Elastic Scalability
 Pay as you go model
 Multi-tenancy and resource pooling.

Advantages of Cloud Computing :

 Back-up and restore data
 Improved collaboration
 Excellent accessibility
 Low maintenance cost
 On-Demand Self-service.

Disadvantages of Cloud Computing:

 Vendor lock-in
 Limited Control
 Security Concern
 Downtime due to various reasons
 Requires good Internet connectivity.

Difference between Big Data and Cloud Computing:

S.No. | BIG DATA | CLOUD COMPUTING
----- | -------- | ---------------
01. | Big data refers to data that is huge in size and also increasing rapidly with respect to time. | Cloud computing refers to the on-demand availability of computing resources over the internet.
02. | Big data includes structured data, unstructured data as well as semi-structured data. | Cloud computing services include Infrastructure as a Service (IaaS), Platform as a Service (PaaS) and Software as a Service (SaaS).
03. | Volume, velocity, variety, veracity, and value of data are considered the 5 most important characteristics of big data. | On-demand availability of IT resources, broad network access, resource pooling, elasticity and measured service are considered the main characteristics of cloud computing.
04. | The purpose of big data is to organize large volumes of data, extract useful information from them, and use that information to improve the business. | The purpose of cloud computing is to store and process data in the cloud or to use remote IT services without physically installing any IT resources.
05. | Distributed computing is used for analyzing the data and extracting the useful information. | The internet is used to get cloud-based services from different cloud vendors.
06. | Big data management allows a centralized platform, provision for backup and recovery, and low maintenance cost. | Cloud computing services are cost effective, scalable and robust.
07. | Some of the challenges of big data are variety of data, data storage and integration, data processing and resource management. | Some of the challenges of cloud computing are availability, transformation, security concerns, and the charging model.
08. | Big data refers to huge volumes of data, their management, and useful information extraction. | Cloud computing refers to remote IT resources and different internet service models.
09. | Big data is used to describe huge volumes of data and information. | Cloud computing is used to store data and information on remote servers and also to process the data using remote infrastructure.
10. | Some of the sources where big data is generated include social media data, e-commerce data, weather station data, IoT sensor data, etc. | Some of the cloud computing vendors that provide cloud computing services are Amazon Web Services (AWS), Microsoft Azure, Google Cloud Platform, IBM Cloud Services, etc.

WEB ANALYTICS

Web Analytics or Online Analytics refers to the analysis of quantifiable and measurable data of your
website with the aim of understanding and optimizing web usage.
Web Analytics is the methodological study of online/offline patterns and trends. It is a technique that
you can employ to collect, measure, report, and analyze your website data. It is normally carried out to
analyze the performance of a website and optimize its web usage.
Web analytics is used to track key metrics and analyze visitors’ activity and traffic flow. It is a tactical
approach to collecting data and generating reports. It is an ongoing process that helps in attracting more
traffic to a site, thereby increasing the Return on Investment.

Web analytics focuses on various issues. For example,

 Detailed comparison of visitor data, and Affiliate or referral data.
 Website navigation patterns.
 The amount of traffic your website received over a specified period of time.
 Search engine data.

Web analytics improves online experience for your customers and elevates your business prospects.
There are various Web Analytics tools available in the market. For example, Google Analytics,
Kissmetrics, Optimizely, etc.

Importance of Web Analytics

Web Analytics is needed to assess the success rate of a website and its associated
business. Using Web Analytics, we can −

 Assess web content problems so that they can be rectified
 Have a clear perspective of website trends
 Monitor web traffic and user flow
 Demonstrate goals acquisition
 Figure out potential keywords
 Identify segments for improvement
 Find out referring sources

Web Analytics Process

The primary objective of carrying out Web Analytics is to optimize the website in order to provide
better user experience. It provides a data-driven report to measure visitors’ flow throughout the
website.
The process of web analytics consists of the following steps:
 Set the business goals.
 To track goal achievement, set the Key Performance Indicators (KPIs).
 Collect correct and suitable data.
 To extract insights, analyze the data.
 Based on assumptions learned from the data analysis, test alternatives.
 Based on either data analysis or website testing, implement insights.

Types of Web Analytics

There are two types of web analytics −

 On-site − It measures users’ behaviour once they are on the website. For
example, measurement of your website performance.
 Off-site − It is the measurement and analysis irrespective of whether you own or maintain a
website. For example, measurement of visibility, comments, potential audience, etc.

Metrics of Web Analytics

There are three basic metrics of web analytics −

Count
It is the most basic metric of measurement. It is represented as a whole number or a fraction. For
example,
 Number of visitors = 12999, Number of likes = 3060, etc.
 Total sales of merchandise = $54,396.18.

Ratio
It is typically a count divided by some other count. For example, Page views per visit.

Key Performance Indicator (KPI)

It depends upon the business type and strategy. KPI varies from one business to another.
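
The difference between a count and a ratio can be shown with a few lines of arithmetic. This is a minimal sketch; the `visits` list and its values are made-up sample data, not figures from any real site.

```python
# Hypothetical visit log: each entry is (visitor_id, pages_viewed_in_that_visit).
visits = [("v1", 3), ("v2", 1), ("v1", 5), ("v3", 2), ("v2", 1)]

# Counts: simple totals.
total_visits = len(visits)
total_page_views = sum(pages for _, pages in visits)
unique_visitors = len({visitor for visitor, _ in visits})

# Ratio: one count divided by another count.
page_views_per_visit = total_page_views / total_visits

print(total_visits, total_page_views, unique_visitors, page_views_per_visit)
```

A KPI would then be whichever of these numbers the business has chosen to track against a goal, for example page views per visit for a content site.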

Micro and macro Level Data Insights

Google Analytics gives you more accurate insight into your data. You can understand the data at two
levels: micro level and macro level.

Micro Level Analysis
It pertains to an individual or a small group of individuals. For example, the number of times a job
application was submitted, the number of times "print this page" was clicked, etc.

Macro Level Analysis
It is concerned with the primary business objectives and with huge groups
of people such as communities, nations, etc. For example, the number of conversions in a particular
demographic.

Web Analysis - What to Measure?

These are the few measurements conducted in web analytics −

 Engagement Rate
It shows how long a person stays on your web page and which pages they browse. To make your
web pages more engaging, include informative content, visuals, fonts and bullets.

 Bounce Rate
If a person leaves your website within 30 seconds, it is considered a bounce. The rate
at which users leave in this way is called the bounce rate. To minimize the bounce rate, include related
posts, clear calls-to-action and backlinks in your webpages.

 Dashboards
A dashboard is a single-page view of the information important to the user. You can create your own
dashboards with your requirements in mind, for example keeping only frequently viewed data
on the dashboard.

 Event Tracking
Event tracking allows you to track other activities on your website. For example, you can
track downloads and sign-ups through event tracking.

 Traffic Source
You can get an overview of traffic sources and filter it further. Identifying the key sources
can help you learn where to improve.

 Annotations
Annotations let you mark a point on a past traffic report. You can click on the graph and type a note to
save it for future study.

 Visitor Flow
It gives you a clear picture of the pages visited and the order in which they were visited. Understanding
users’ paths may help you restructure navigation so that customers can move through the site without hassle.

 Content
It gives you insight into the website’s content section. You can see how each
page is doing, the website loading speed, etc.

 Conversions
Analytics lets you track goals and the paths used to achieve them. You can get details
regarding product performance, purchase amount, and mode of billing. Web Analytics
offers more than this; all you need is to analyze things carefully and patiently.

 Page Load Time
The longer the load time, the higher the bounce rate, so tracking page load time is equally important.

 Behavior
Behavior lets you know page views and time spent on the website. You can find out how
customers behave once they are on your website.
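
Two of the measurements listed above, bounce rate and visitor flow, can be computed directly from session records. The following is a minimal sketch under stated assumptions: the `sessions` data and field names (`duration`, `pages`) are hypothetical, and the 30-second threshold is the rule of thumb given in these notes rather than the definition used by any particular analytics product.

```python
from collections import Counter

BOUNCE_THRESHOLD_SECONDS = 30  # rule of thumb from the notes above

# Hypothetical sessions: seconds spent on the site and pages visited in order.
sessions = [
    {"duration": 12, "pages": ["/home"]},
    {"duration": 95, "pages": ["/home", "/products", "/cart"]},
    {"duration": 25, "pages": ["/blog"]},
    {"duration": 240, "pages": ["/home", "/products"]},
]


def bounce_rate(sessions, threshold=BOUNCE_THRESHOLD_SECONDS):
    """Fraction of sessions that ended within the threshold."""
    if not sessions:
        return 0.0
    bounces = sum(1 for s in sessions if s["duration"] <= threshold)
    return bounces / len(sessions)


def visitor_flow(sessions):
    """Count how often visitors moved from one page directly to the next."""
    flow = Counter()
    for s in sessions:
        pages = s["pages"]
        flow.update(zip(pages, pages[1:]))  # consecutive page pairs
    return flow


print(f"Bounce rate: {bounce_rate(sessions):.0%}")
for (src, dst), count in visitor_flow(sessions).most_common():
    print(f"{src} -> {dst}: {count}")
```

Counting consecutive page pairs is the simplest form of a visitor-flow report; real tools add entry and exit pages and drop-off percentages on top of the same idea.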

MOBILE BUSINESS INTELLIGENCE

Business Intelligence
“Business Intelligence is not just about turning data into information, rather organizations need that
data to impact how their business operates and responds to the changing marketplace.”
So, it is not all about transforming data into information, though Business Intelligence significantly
involves this process. Business Intelligence is transforming data into meaningful, actionable insights
that enable organizations to make informed business strategies and tactical decisions.

Mobile Business Intelligence

Business Intelligence delivers relevant and trustworthy information to the right person at the right
time. Mobile business intelligence is the transfer of business intelligence from the desktop to
mobile devices such as the BlackBerry, iPad, and iPhone.
The ability to access analytics and data on mobile devices or tablets rather than desktop computers is
referred to as mobile business intelligence. The business metric dashboard and key performance
indicators (KPIs) are more clearly displayed.
As the use of mobile devices has risen, so has the technology that we all utilise in our daily lives to
make our lives easier, including in business. Many businesses have benefited from mobile business
intelligence. The rest of this section is a guide for business owners and others to the
benefits and pitfalls of Mobile BI.

Need for mobile BI?

Mobile phones' data storage capacity has grown in tandem with their use. You are expected to make
decisions and act quickly in this fast-paced environment. The number of businesses receiving
assistance in such a situation is growing by the day.
To expand your business or boost your business productivity, mobile BI can help, and it works with
both small and large businesses. Mobile BI can help you whether you are a salesperson or a CEO.
As a result, timely decision-making can boost customer satisfaction and improve an enterprise's
reputation among its customers. It also aids in making quick decisions in the face of emerging risks.
Data analytics and visualisation techniques are essential skills for any team that wants to organise
work, develop new project proposals, or wow clients with impressive presentations.

Advantages of mobile BI
1. Simple access
Mobile BI is not restricted to a single mobile device or a certain place. You can view your data at
any time and from any location. Having real-time visibility into a firm improves production and the
daily efficiency of the business. Obtaining a company's perspective with a single click simplifies the
process.

2. Competitive advantage
Many firms are seeking better and more responsive methods to do business in order to stay ahead of
the competition. Easy access to real-time data improves company opportunities and raises sales
and capital. This also aids in making the necessary decisions as market conditions change.

3. Simple decision-making
As previously stated, mobile BI provides access to real-time data at any time and from any
location. Mobile BI delivers information on demand, which helps users obtain
what they require at the moment they need it. As a result, decisions are made quickly.

4. Increase Productivity
By extending BI to mobile, the organization's teams can access critical company data when they
need it. Obtaining all of the corporate data with a single click frees up a significant amount of time
to focus on the smooth and efficient operation of the firm. Increased productivity results in a
smooth and quick-running firm.

Disadvantages of mobile BI
1. Stack of data
The primary function of mobile BI is to store data in a systematic manner and then present it to
the user as required. As a result, Mobile BI stores all of the information and ends up with
heaps of older data. The corporation only needs a small portion of the previous data, but it
has to store all of it, and the stack keeps growing.

2. Expensive
Mobile BI can be quite costly at times. Large corporations can continue to pay for their expensive
services, but small businesses cannot. The cost of mobile BI itself is not the only expense: we must
additionally consider the rates of the IT workers needed for the smooth operation of BI, as well as the
hardware costs involved. Moreover, larger corporations do not settle for just one Mobile BI
provider for their organisations; they require several. Even for basic commercial
transactions, mobile BI is costly.

3. Time consuming
Businesses are drawn to Mobile BI because it promises quick results. Companies are not patient enough to
wait for data before acting on it, and in today's fast-paced environment anything that can produce
results quickly is valuable. However, because the system is built on data from the warehouse, the
implementation of BI in an enterprise can take more than 18 months.

4. Data breach
The biggest concern for users when providing data to Mobile BI is data leakage. If you handle
sensitive data through Mobile BI, a single error can destroy your data as well as make it public,
which can be detrimental to your business.
Many Mobile BI providers are working to make it 100 percent secure to protect their potential users'
data. Security is not only something that mobile BI providers must consider; it is also something that
we, as users, must consider when granting data access authorization.

5. Poor quality data

Because we work online in every aspect, a lot of data is stored in Mobile BI, which can
be a significant problem: a large portion of the data analysed by Mobile BI is irrelevant or
completely useless, and this can slow down the entire procedure. You therefore need to select
the data that is important and may be required in the future.

Best Mobile BI tools

1. Sisense

Sisense is a flexible business intelligence (BI) solution that includes powerful analytics,
visualisations, and reporting capabilities for managing and supporting corporate data. Businesses
can use the solution to evaluate large, diverse databases and generate relevant business insights.
You can easily view enormous volumes of complex data with Sisense's code-first, low-code, and
even no-code technologies. Sisense was established in 2004 with its headquarters in New York.
In its early years the team proceeded cautiously with its research. Once the company
had received $4 million in funding from investors, it began to accelerate its research.

2. SAP Roambi Analytics

Roambi Analytics is a BI tool that allows you to fundamentally rethink your
data analysis, making it easier and faster while also increasing your interaction with data.
You can consolidate all of your company's data in a single tool using SAP Roambi Analytics,
which integrates all ongoing systems and data. Using SAP Roambi Analytics is a simple three-step
process. First, upload your HTML or spreadsheet files. The information is then transformed
into informative data or graphs that can be visualised.
Once the data has been processed, you can easily share it to your preferred device. Roambi Analytics
was founded in 2008 by a team based in California.

3. Microsoft Power BI Pro

Microsoft Power BI is an easy-to-use tool for non-technical business owners who are
unfamiliar with BI tools but wish to aggregate, analyse, visualise, and share data. You only need a
basic understanding of Excel and other Microsoft tools; if you are familiar with these,
Power BI can be used as a self-service tool. Microsoft Power BI has a unique feature that
allows users to create subsets of data and then automatically apply analytics to that information.

4. IBM Cognos Analytics

Cognos Analytics is IBM's web-based business intelligence tool. Cognos Analytics is
now being integrated with Watson, and the benefits for users are significant. Watson-assisted Cognos
Analytics helps connect and clean users' data, resulting in properly visualised data.
That way, business owners will know where they stand in comparison to their competitors and
where they can grow in the future. It combines reporting, modelling, analysis, and dashboards to help
you understand your organization's data and make sound business decisions.

5. Amazon QuickSight

Amazon QuickSight assists in the creation and distribution of interactive BI dashboards to
users, as well as the retrieval of answers to natural language queries in seconds. QuickSight can
be accessed from any device and embedded in any website, portal, or app.
Amazon QuickSight allows you to quickly and easily create interactive dashboards and reports
for your users. Anyone in your organisation can securely access those dashboards via
browsers or mobile devices.
QuickSight's eye-catching feature is its pay-per-session model, which allows users to use a
dashboard created by someone else without paying much. The user pays per session,
with prices ranging from $0.30 for a 30-minute session up to $5 for unlimited
use per month per user.
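
The pay-per-session arithmetic above can be worked through in a few lines. This is only a sketch of the pricing rule as quoted in these notes ($0.30 per session, capped at $5 per user per month); the function name `monthly_reader_cost` is hypothetical, and actual vendor pricing may differ or change.

```python
SESSION_PRICE = 0.30  # dollars per 30-minute session (figure quoted above)
MONTHLY_CAP = 5.00    # dollars per user per month (figure quoted above)


def monthly_reader_cost(sessions_in_month):
    """Cost for one user: pay per session, but never more than the monthly cap."""
    if sessions_in_month < 0:
        raise ValueError("sessions_in_month must be non-negative")
    return min(sessions_in_month * SESSION_PRICE, MONTHLY_CAP)


for n in (4, 10, 40):
    print(f"{n} sessions -> ${monthly_reader_cost(n):.2f}")
```

Under these figures an occasional viewer pays only for the sessions used, while a user with 17 or more sessions in a month pays the flat $5.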

CROWD SOURCING ANALYTICS

Crowdsourcing is a sourcing model in which an individual or an organization gets support
from a large, open-minded, and rapidly evolving group of people in the form of ideas, micro-tasks,
finances, etc. Crowdsourcing typically involves the use of the internet to attract a large group of
people to divide tasks or to achieve a target. The term was coined in 2005 by Jeff Howe and Mark
Robinson. Crowdsourcing can help different types of organizations get new ideas and solutions,
deeper consumer engagement, optimization of tasks, and several other things.
Let us understand this term with the help of an example. GeeksforGeeks gives
young minds an opportunity to share their knowledge with the world by contributing articles and videos
in their respective domains. Here GeeksforGeeks is using the crowd as a source not only to expand
its community but also to include the ideas of many young minds, improving the quality of the content.

Where Can We Use Crowdsourcing?

Crowdsourcing is touching almost all sectors, from education to health. It is not only accelerating
innovation but also democratizing problem-solving methods. Some fields where crowdsourcing can
be used are:

1. Enterprise
2. IT Marketing
3. Education
4. Finance
5. Science and Health

How to Crowdsource?
1. For scientific problem solving, a broadcast search is used where an organization mobilizes a
crowd to come up with a solution to a problem.
2. For information management problems, knowledge discovery and
management is used to find and assemble information.
3. For processing large datasets, distributed human intelligence is used. The organization
mobilizes a crowd to process and analyze the information.

Examples of Crowdsourcing
1. Doritos: It is one of the companies that has long taken advantage of crowdsourcing
for an advertising initiative. It uses consumer-created ads for one of its 30-second
Super Bowl spots (the championship game of American football).
2. Starbucks: Another big venture which used crowdsourcing as a medium for idea generation.
Its White Cup Contest is a famous contest in which customers decorate their
Starbucks cup with an original design and then take a photo and submit it on social media.
3. Lay's: The "Do Us a Flavor" contest by Lay's used crowdsourcing as an idea-generating medium. The company
asked customers to submit their ideas for the next chip flavor they wanted.
4. Airbnb: A very famous travel website that lets people rent out their houses or apartments by
listing them on the website. All the listings are crowdsourced from people.

Crowdsourced Marketing
As already discussed, crowdsourcing helps businesses grow a lot. Whether it is a business idea
or just a logo design, crowdsourcing engages people directly and, in turn, saves money and
energy. In the coming years, crowdsourced marketing is likely to get a boost as the world
adopts technology faster.

Main Types of Crowdsourcing

Crowdsourcing involves obtaining information or resources from a wide swath of people. In
general, we can break this up into four main categories:
 Wisdom - Wisdom of crowds is the idea that large groups of people are collectively smarter
than individual experts when it comes to problem-solving or identifying values (like the
weight of a cow or number of jelly beans in a jar).
 Creation - Crowd creation is a collaborative effort to design or build something. Wikipedia
and other wikis are examples of this. Open-source software is another good example.
 Voting - Crowd voting uses the democratic principle to choose a particular policy or course
of action by "polling the audience."
 Funding - Crowdfunding involves raising money for various purposes by soliciting relatively
small amounts from a large number of funders.
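
The "wisdom" and "voting" categories above both reduce to aggregating many individual inputs. The sketch below uses made-up sample data (the `guesses` and `votes` lists are hypothetical) and takes the median as the crowd's estimate, which is one common choice among several.

```python
from collections import Counter
from statistics import median

# Wisdom of crowds: hypothetical guesses (in kg) for the weight of a cow.
guesses = [480, 520, 610, 450, 700, 530, 495, 560, 390, 545]
crowd_estimate = median(guesses)  # the median is robust to a few wild guesses

# Crowd voting: hypothetical votes for a logo design.
votes = ["logo_a", "logo_b", "logo_a", "logo_c", "logo_a", "logo_b"]
winner, winning_votes = Counter(votes).most_common(1)[0]

print("Crowd estimate:", crowd_estimate)
print("Winning option:", winner, "with", winning_votes, "votes")
```

The median is used here instead of the mean so that one or two extreme guesses do not pull the crowd's estimate away from where most guesses lie.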

Crowdsourcing Sites
Here is the list of some famous crowdsourcing and crowdfunding sites.

1. Kickstarter
2. GoFundMe
3. Patreon
4. RocketHub

Advantages of Crowdsourcing
1. Evolving Innovation: Innovation is required everywhere and in this advancing world
innovation has a big role to play. Crowdsourcing helps in getting innovative ideas from
people belonging to different fields and thus helping businesses grow in every field.
2. Save costs: It eliminates the time wasted on meeting people and convincing
them. The business idea only needs to be proposed on the internet, and you will be flooded with
suggestions from the crowd.
3. Increased Efficiency: Crowdsourcing has increased the efficiency of business models, as
ideas from several experts are also funded.

Disadvantages of Crowdsourcing
1. Lack of confidentiality: Asking for suggestions from a large group of people can bring the
threat of idea stealing by other organizations.
2. Repeated ideas: Contestants in crowdsourcing competitions often submit repeated or plagiarized
ideas, which wastes time because reviewing the same ideas again is not worthwhile.

INTER AND TRANS FIREWALL ANALYTICS

Inter-firewall analytics
 Focus: Analyzes traffic flows between different firewalls within a network.
 Methodology: Utilizes data collected from multiple firewalls to identify anomalies
and potential breaches.
 Benefits: Provides a comprehensive view of network traffic flow and helps identify
lateral movement across different security zones.
 Limitations: Requires deployment of multiple firewalls within the network and efficient data
exchange mechanisms between them.
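
The methodology above, combining data from several firewalls to spot lateral movement, can be sketched as a simple correlation over log records. Everything in this example is an assumption made for illustration: the record fields (`firewall`, `src`, `dst`, `action`), the firewall names, and the rule that a host seen on three or more firewalls deserves a closer look.

```python
from collections import defaultdict

# Hypothetical, simplified log records exported by three firewalls.
logs = [
    {"firewall": "fw-dmz", "src": "10.0.1.5", "dst": "10.0.2.8", "action": "allow"},
    {"firewall": "fw-app", "src": "10.0.1.5", "dst": "10.0.3.9", "action": "deny"},
    {"firewall": "fw-db", "src": "10.0.1.5", "dst": "10.0.4.2", "action": "deny"},
    {"firewall": "fw-dmz", "src": "10.0.1.7", "dst": "10.0.2.8", "action": "allow"},
]


def hosts_seen_on_many_firewalls(logs, min_firewalls=3):
    """Return source hosts whose traffic appears on at least min_firewalls firewalls."""
    seen = defaultdict(set)
    for record in logs:
        seen[record["src"]].add(record["firewall"])
    return sorted(src for src, fws in seen.items() if len(fws) >= min_firewalls)


print(hosts_seen_on_many_firewalls(logs))
```

No single firewall's log shows anything unusual about the first host; only the combined view reveals that it is probing several security zones, which is the point of inter-firewall analysis.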

Trans-firewall analytics
 Focus: Analyzes encrypted traffic that traverses firewalls, which traditional security solutions may not be
able to decrypt and inspect.
 Methodology: Uses deep packet inspection (DPI) and other advanced techniques to analyze the content
of encrypted traffic without compromising its security.
 Benefits: Provides insight into previously hidden threats within encrypted traffic and helps detect
sophisticated attacks.
 Limitations: Requires specialized hardware and software solutions for DPI, and raises concerns regarding
potential data privacy violations.

Choosing the right approach

The choice between inter-firewall and trans-firewall analytics depends on several factors, including:
 Network size and complexity: Larger and more complex networks benefit
more from inter-firewall analytics for comprehensive monitoring.
 Security needs and threats: Trans-firewall analytics is crucial for networks handling
sensitive data and facing advanced threats.
