Big Data
Contents
• Classification of Digital Data
• Big Data
• Big Data Technology Landscape
Classification of Digital Data
The Big Data
• Big data is high-volume, high-velocity and/or high-variety
information assets that demand cost-effective, innovative forms
of information processing that enable enhanced insight,
decision-making, and process automation.
Source: Gartner IT Glossary
Many more V’s
• Veracity
• Validity
• Variability
• Viscosity and Volatility
The Big Data Technology Landscape
• NoSQL
• Hadoop
What is a NoSQL database?
❖ NoSQL does not mean "no SQL"; it means "not only SQL".
❖ It is not a replacement for an RDBMS.
❖ We are storing far more data than we used to.
❖ The connections between data items are growing as well, so we need an
architecture that handles both the growing volume and the growing connectedness.
What is NoSQL?
• Non-relational data storage system
• No fixed table schema
• Distributed
• Relaxes one or more ACID properties
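The "no fixed table schema" point can be illustrated with a minimal sketch. It assumes a hypothetical in-memory collection (a plain Python list of dicts), not any particular NoSQL product; the `insert` and `find` helpers and the second record ("Ann") are made up for illustration.

```python
# Hypothetical in-memory "collection": documents are stored as-is,
# and no table definition is ever declared or checked.
collection = []


def insert(doc):
    """Store a copy of the document, whatever fields it has."""
    collection.append(dict(doc))


def find(**criteria):
    """Return documents matching every criterion; a missing field never matches."""
    return [
        doc for doc in collection
        if all(key in doc and doc[key] == value for key, value in criteria.items())
    ]


# Two records with different sets of fields live side by side.
insert({"Name": "Tom", "Age": 30, "Role": "Student"})
insert({"Name": "Ann", "Role": "Student", "University": "CU"})  # illustrative record

students = find(Role="Student")      # both documents
at_cu = find(University="CU")        # only the document that has this field
```

In a relational table, adding the `University` field would require altering the schema first; here the second document simply carries an extra field.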
Types of NoSQL database
• Document-based
  [
    {
      "Name": "Tom",
      "Age": 30,
      "Role": "Student",
      "University": "CU"
    }
  ]
  Examples: MongoDB, Amazon SimpleDB, CouchDB
• Key-value pair (#key, #value)
  (Name, Tom)
  (Age, 25)
  (Role, Student)
  (University, CU)
  Examples: DynamoDB, Azure Table Storage (ATS)
• Column-oriented database
  Row Id   Columns
  1        Name: Tom
           Age: 25
           Role: Student
  Examples: Bigtable (Google), HBase, MariaDB, Cra…
• Graph database
  [Figure: graph of nodes joined by labelled edges. Node values: Tom, Student, Masters, CU, 25, Ottawa. Edge labels: Name, Courses, Age, Location.]
  Examples: Neo4j, InfoGrid
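As a rough sketch, the four data models can be mimicked with plain Python structures holding the same sample record. This is an illustration only, not how any of the listed products stores data. The edge list and the `neighbors` helper in the graph part are assumptions, since the original figure's exact edges are not recoverable.

```python
import json

# Key-value: every key maps to one opaque value, looked up by key only.
key_value = {"Name": "Tom", "Age": 25, "Role": "Student", "University": "CU"}

# Document: a self-describing record, typically serialized as JSON.
document = {"Name": "Tom", "Age": 30, "Role": "Student", "University": "CU"}
document_json = json.dumps(document)

# Column-oriented: row id -> {column name: value}; rows may have different columns.
column_store = {1: {"Name": "Tom", "Age": 25, "Role": "Student"}}

# Graph: (source node, edge label, target node) triples.
# These particular edges are hypothetical, chosen only for illustration.
edges = [
    ("Tom", "ROLE", "Student"),
    ("Tom", "UNIVERSITY", "CU"),
    ("Tom", "LOCATION", "Ottawa"),
]


def neighbors(node, label=None):
    """Follow outgoing edges from a node, optionally filtered by edge label."""
    return [t for s, l, t in edges if s == node and (label is None or l == label)]


where = neighbors("Tom", "LOCATION")
```

The graph form makes relationship queries ("what is Tom connected to?") a direct traversal, whereas the other three forms are organized around looking up a record by its key or row id.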
Motivations
❖ Problems with SQL
  ❖ Rigid schema
  ❖ Not easily scalable (designed for 1990s technology or older)
  ❖ Requires unintuitive joins
❖ Perks of MongoDB
  ❖ Easy interface with common languages (Java, JavaScript, PHP, etc.)
  ❖ DB tech should run anywhere (VMs, cloud, etc.)
  ❖ Keeps essential features of RDBMSs while learning from key-value
  NoSQL systems
http://www.slideshare.net/spf13/mongodb-9794741?v=qf1&b=&from_search=13
Advantages of NoSQL
• Can easily scale out (by adding more nodes)
• Doesn’t require a predefined schema
• Cheap and easy to implement
• Relaxes the data consistency requirement
• Data can be replicated to multiple nodes and partitioned across them
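The last point, partitioning plus replication, can be sketched as follows. This is a minimal single-process illustration under stated assumptions: the node names, the replica count of 2, and the `owners`, `put`, and `get` helpers are all hypothetical, and real systems use more elaborate placement schemes (for example consistent hashing).

```python
import hashlib

NODES = ["node-a", "node-b", "node-c"]  # hypothetical cluster members
REPLICAS = 2                            # assumed replication factor

# One dict per node stands in for that node's local storage.
storage = {node: {} for node in NODES}


def owners(key):
    """Partitioning: hash the key to pick a primary node, then the next node(s) as replicas."""
    digest = hashlib.sha256(key.encode("utf-8")).hexdigest()
    start = int(digest, 16) % len(NODES)
    return [NODES[(start + i) % len(NODES)] for i in range(REPLICAS)]


def put(key, value):
    """Replication: write the value to every node that owns the key."""
    for node in owners(key):
        storage[node][key] = value


def get(key, down=()):
    """Read from the first owning node that is not marked as down."""
    for node in owners(key):
        if node not in down:
            return storage[node].get(key)
    raise RuntimeError("all replicas unavailable")


put("Tom", {"Role": "Student"})
primary = owners("Tom")[0]
# The record survives the loss of its primary node because a replica holds a copy.
survivor = get("Tom", down=(primary,))
```

Each key lives on only some of the nodes (partitioning), so adding nodes spreads the data, while the extra copy (replication) keeps the key readable when one node fails.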
SQL vs NoSQL
SQL                                    NoSQL
• Relational database                  • Non-relational, distributed database
• Relational model                     • Model-less approach
• Predefined schema                    • Dynamic schema for unstructured data
• Table-based schema                   • Document-based, graph-based, wide-column store, or key-value database
• Vertically scalable                  • Horizontally scalable
• Not preferred for larger datasets    • Largely preferred for large datasets
• Emphasis on ACID properties          • Emphasis on the CAP theorem
• E.g. Oracle, DB2, MySQL, etc.        • E.g. MongoDB, HBase, Cassandra, Redis, Neo4j, CouchDB, etc.
NewSQL
• NewSQL is a modern kind of RDBMS that offers the scalable performance of
NoSQL systems while still maintaining the ACID guarantees of a traditional
database.
Characteristics of NewSQL
• SQL interface for application interaction
• ACID support for transactions
• An architecture that provides higher per-node performance than a
traditional RDBMS solution
• Scales out easily
• Non-locking concurrency control mechanism, so that real-time reads
do not conflict with writes
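One common way to get reads that do not conflict with writes is multi-version concurrency control (MVCC). The slides do not name a specific mechanism, so the sketch below is an assumption about what such a scheme can look like. The `MVCCStore` class is hypothetical and single-threaded, meant only to show the idea of reading from a snapshot while newer versions are written.

```python
class MVCCStore:
    """Toy multi-version store: writes append versions, reads pick one by timestamp."""

    def __init__(self):
        self._versions = {}  # key -> list of (commit_ts, value), oldest first
        self._clock = 0      # logical commit counter

    def begin(self):
        """Start a read: return a snapshot timestamp covering all commits so far."""
        return self._clock

    def write(self, key, value):
        """Commit a new version; older versions are kept, not overwritten."""
        self._clock += 1
        self._versions.setdefault(key, []).append((self._clock, value))
        return self._clock

    def read(self, key, snapshot_ts):
        """Return the newest version committed at or before the snapshot."""
        for commit_ts, value in reversed(self._versions.get(key, [])):
            if commit_ts <= snapshot_ts:
                return value
        return None


store = MVCCStore()
store.write("Age", 25)
snapshot = store.begin()                 # a reader starts here
store.write("Age", 26)                   # a writer commits afterwards
old = store.read("Age", snapshot)        # reader still sees its snapshot
new = store.read("Age", store.begin())   # a later reader sees the new version
```

Because the writer appends a new version instead of changing the old one in place, the earlier reader needs no lock: its snapshot stays consistent regardless of later commits.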
Comparison of SQL, NoSQL and NewSQL