INTRODUCTION TO BIG DATA
A. Define Data
Data is a collection of raw, unorganised facts and details such as text, observations, figures,
symbols and descriptions of things. In other words, data does not carry any specific
purpose and has no significance by itself. Data is measured in bits and bytes, the
basic units of information in computer storage and processing.
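The following minimal Python sketch makes "measured in bits and bytes" concrete by counting the storage taken by a few of the example values used in this section. It assumes the values are stored as UTF-8 text. A different encoding, or storing a phone number as a binary integer instead of text, would give a different size. The helper names `size_in_bytes` and `size_in_bits` are hypothetical and were chosen only for illustration.

```python
# Minimal sketch: how much storage do a few data values need?
# Assumption: values are stored as UTF-8 encoded text.

BITS_PER_BYTE = 8


def size_in_bytes(value: str) -> int:
    """Return the number of bytes needed to store `value` as UTF-8 text."""
    return len(value.encode("utf-8"))


def size_in_bits(value: str) -> int:
    """Return the number of bits needed to store `value` as UTF-8 text."""
    return size_in_bytes(value) * BITS_PER_BYTE


if __name__ == "__main__":
    for item in ["2221023", "COLGATE", "REXONA"]:
        print(f"{item!r}: {size_in_bytes(item)} bytes = {size_in_bits(item)} bits")
```

Each of these characters is plain ASCII, which takes one byte in UTF-8. So `COLGATE` needs 7 bytes (56 bits) and `REXONA` needs 6 bytes (48 bits).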
Common examples of data include phone numbers, weights, prices, costs, numbers of items sold,
product names, addresses, registration marks, etc.
In computing, data is defined as a collection of facts that have been translated into a
form that is easier to analyze or process further.
Data can be represented in the form of:
1. Numeric, alphabetic or alphanumeric characters, such as instructions in a computer language
2. Images, pictures, video segments, multimedia and animated data
Some examples of data:
Numbers such as 2221023, 2120570 and 3109007 can represent telephone number data.
Words such as COLGATE and REXONA can represent product name data.
Pictures of animals or people can represent image data.
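You can check the numeric / alphabetic / alphanumeric distinction mechanically. The sketch below uses Python's built-in `str.isdigit`, `str.isalpha` and `str.isalnum` methods. The function name `classify` and the extra "other" category (for empty values or values containing spaces or punctuation) are our own assumptions. The registration mark `KA01AB1234` is a made-up example.

```python
def classify(value: str) -> str:
    """Label a text value as numeric, alphabetic, alphanumeric or other."""
    if value.isdigit():        # only digits, e.g. a telephone number
        return "numeric"
    if value.isalpha():        # only letters, e.g. a product name
        return "alphabetic"
    if value.isalnum():        # letters/digits only, but not just one kind
        return "alphanumeric"
    return "other"             # empty, or contains spaces/punctuation


if __name__ == "__main__":
    # "KA01AB1234" is a hypothetical registration mark used only for illustration.
    for sample in ["2221023", "COLGATE", "KA01AB1234"]:
        print(sample, "->", classify(sample))
```

The order of the checks matters. A string that passes `isalnum` but failed both earlier checks must mix more than one kind of character.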
Information is processed, organised and structured data. It provides context for data and
enables decision making. For example, a single customer’s sale at a restaurant is data; it
becomes information when the business aggregates many such sales and identifies the most
popular or least popular dish.
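The restaurant example can be sketched in a few lines of Python. Individual order records are the raw data. Counting them per dish turns them into information a manager can act on. The order records below are hypothetical, and `collections.Counter` is one convenient way to do the aggregation, not the only one.

```python
from collections import Counter

# Hypothetical raw data: each tuple is one customer's order (order id, dish).
SALES = [
    ("order-1", "Pizza"),
    ("order-2", "Pasta"),
    ("order-3", "Salad"),
    ("order-4", "Pasta"),
    ("order-5", "Pizza"),
    ("order-6", "Pasta"),
]


def dish_popularity(records):
    """Aggregate individual sales into a count of orders per dish."""
    return Counter(dish for _order_id, dish in records)


def most_and_least_popular(records):
    """Return (most popular dish, least popular dish) from the sales records."""
    counts = dish_popularity(records)
    if not counts:
        raise ValueError("no sales records to analyse")
    ranked = counts.most_common()  # sorted from highest to lowest count
    return ranked[0][0], ranked[-1][0]


if __name__ == "__main__":
    print(dish_popularity(SALES))
    print(most_and_least_popular(SALES))
```

With this sample data, Pasta (3 orders) is the most popular dish and Salad (1 order) the least. When dishes tie, `most_common` keeps them in the order they were first seen.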
B. What is Big Data?
Big data refers to data sets that are too large or complex to be dealt with by traditional data-
processing application software.
Big data is a combination of structured, semi-structured and unstructured data collected by
organizations that can be mined for information and used in machine learning projects,
predictive modeling and other advanced analytics applications.
Systems that process and store big data have become a common component of data management
architectures in organisations. Big data is often characterised by the 3Vs: the large volume of data in
many environments, the wide variety of data types stored in big data systems and the velocity at which
the data is generated, collected and processed. These characteristics were first identified by Doug
Laney, then an analyst at Meta Group Inc. in 2001; Gartner further popularised them after it acquired
Meta Group in 2005. More recently, several other Vs have been added to different descriptions of big
data, including veracity, value and variability.
Although big data doesn’t equate to any specific volume of data, big data deployments often involve
terabytes (TB), petabytes (PB) and even exabytes (EB) of data captured over time.
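The sketch below puts these units in perspective with a simple conversion. It assumes decimal (SI) prefixes, where 1 TB = 10^12 bytes and each step up multiplies by 1000. Some tools report binary units instead (1 TiB = 2^40 bytes), which give slightly different numbers. The function names are illustrative.

```python
# Decimal (SI) byte units: each step up is a factor of 1000.
# Binary units (KiB, MiB, ..., TiB, PiB, EiB) use factors of 1024 instead.
BYTES_PER_UNIT = {
    "B": 1,
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "TB": 10**12,
    "PB": 10**15,
    "EB": 10**18,
}


def to_bytes(amount: float, unit: str) -> float:
    """Convert an amount in `unit` to a number of bytes."""
    return amount * BYTES_PER_UNIT[unit]


def convert(amount: float, from_unit: str, to_unit: str) -> float:
    """Convert an amount between two decimal byte units."""
    return amount * BYTES_PER_UNIT[from_unit] / BYTES_PER_UNIT[to_unit]


if __name__ == "__main__":
    print(convert(1, "PB", "TB"))   # terabytes in one petabyte
    print(convert(1, "EB", "TB"))   # terabytes in one exabyte
```

One petabyte is 1,000 terabytes, and one exabyte is 1,000,000 terabytes.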
Importance of big data
Companies use the big data accumulated in their systems to improve operations, provide better
customer service, create personalized marketing campaigns based on specific customer preferences
and, ultimately, increase profitability. Businesses that utilize big data hold a potential competitive
advantage over those that don’t since they’re able to make faster and more informed business
decisions, provided they use the data effectively.
For example, big data can provide companies with valuable insights into their customers that can be
used to refine marketing campaigns and techniques in order to increase customer engagement and
conversion rates.
Furthermore, utilizing big data enables companies to become increasingly customer-centric. Historical
and real-time data can be used to assess the evolving preferences of consumers, consequently enabling
businesses to update and improve their marketing strategies and become more responsive to customer
desires and needs.
Big data is also used by medical researchers to identify disease risk factors and by doctors to help
diagnose illnesses and conditions in individual patients. In addition, data derived from electronic health
records (EHRs), social media, the web and other sources provides healthcare organizations and
government agencies with up-to-the-minute information on infectious disease threats or outbreaks.
In the energy industry, big data helps oil and gas companies identify potential drilling locations and
monitor pipeline operations; likewise, utilities use it to track electrical grids. Financial services firms use
big data systems for risk management and real-time analysis of market data. Manufacturers and
transportation companies rely on big data to manage their supply chains and optimize delivery routes.
Other government uses include emergency response, crime prevention and smart city initiatives.
Not all data is created equal. Some data is structured, but most of it is unstructured. Structured
and unstructured data are sourced, collected and scaled in different ways, and each typically resides
in a different type of database.
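The difference between the three forms is easiest to see with the same fact stored three ways. The minimal Python sketch below uses a hypothetical customer purchase and shows it as a fixed-field row (structured), a JSON document (semi-structured) and a free-text note (unstructured). The keyword search for the unstructured case is deliberately naive. Real systems use text-processing or machine-learning tools for this.

```python
import json

# Structured: every record has the same fixed fields, in a fixed order,
# like a row in a relational table.
COLUMNS = ("customer_id", "product", "quantity")
structured_row = (101, "COLGATE", 3)

# Semi-structured: self-describing keys (here JSON), but records may carry
# different or extra fields, such as the optional "tags" list below.
semi_structured = json.loads(
    '{"customer_id": 101, "product": "COLGATE", "quantity": 3, "tags": ["toothpaste"]}'
)

# Unstructured: free text with no predefined fields at all.
unstructured = "Customer 101 bought three tubes of COLGATE and asked about REXONA."


def quantity_from_structured(row):
    """Read a value by its fixed column position."""
    return row[COLUMNS.index("quantity")]


def quantity_from_semi_structured(doc):
    """Read a value by key; returns None if this record lacks the field."""
    return doc.get("quantity")


def mentions_product(text, product):
    """Unstructured text needs processing; here, a naive case-insensitive search."""
    return product.lower() in text.lower()


if __name__ == "__main__":
    print(quantity_from_structured(structured_row))
    print(quantity_from_semi_structured(semi_structured))
    print(mentions_product(unstructured, "REXONA"))
```

The quantity can be read directly from the structured and semi-structured forms. In the free-text form it only appears as the word "three", which simple lookups cannot extract.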
Types of Data
Structured
Structured data is highly organized information, which makes it easy to find via search. It
loads neatly into a relational database (think of traditional tables of rows and columns) and
lives in fixed fields. It is the kind of data most of us are used to working with when analyzing
largely quantitative problems, for example “how many products have been sold this quarter?”
or “how many customers have subscribed to the monthly newsletter?”
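To show why structured data is easy to query, the sketch below loads a few hypothetical sales rows into an in-memory SQLite table with fixed, typed columns. It then answers "how many products have been sold this quarter?" with one SQL query. It uses Python's standard `sqlite3` module. The table layout, the sample rows and the choice of January to March 2022 as "this quarter" are all assumptions for the demo.

```python
import sqlite3


def build_demo_db() -> sqlite3.Connection:
    """Create an in-memory table of hypothetical sales with fixed, typed columns."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        """
        CREATE TABLE sales (
            sale_id   INTEGER PRIMARY KEY,
            product   TEXT    NOT NULL,
            quantity  INTEGER NOT NULL,
            sale_date TEXT    NOT NULL
        )
        """
    )
    # Dates are stored as ISO 8601 strings (YYYY-MM-DD), so they sort and
    # compare correctly as text.
    conn.executemany(
        "INSERT INTO sales (product, quantity, sale_date) VALUES (?, ?, ?)",
        [
            ("COLGATE", 3, "2022-01-10"),
            ("REXONA", 2, "2022-02-15"),
            ("COLGATE", 5, "2022-03-30"),
            ("REXONA", 4, "2022-04-02"),
        ],
    )
    return conn


def units_sold_between(conn: sqlite3.Connection, start: str, end: str) -> int:
    """Total quantity sold between two ISO dates, inclusive."""
    (total,) = conn.execute(
        "SELECT COALESCE(SUM(quantity), 0) FROM sales WHERE sale_date BETWEEN ? AND ?",
        (start, end),
    ).fetchone()
    return total


if __name__ == "__main__":
    conn = build_demo_db()
    # "This quarter" is assumed to be January to March 2022 for the demo data.
    print(units_sold_between(conn, "2022-01-01", "2022-03-31"))
    conn.close()
```

Every row has the same columns, so the question reduces to a `SUM` over a date range. For the sample rows the first quarter total is 10 units. The April sale falls outside the range.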
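Unstructured
Unstructured data has no predefined data model or fixed fields and is usually kept in its
native format. Examples include free-text documents, images, audio and video. Such data
cannot be queried with simple column lookups and generally needs further processing before analysis.
Semi-structured
Semi-structured data does not fit a fixed relational schema but carries organizing markers
such as tags, keys or metadata. JSON and XML documents are common examples.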