
Unit 2 (Big Data Analytics)

Introduction to NoSQL

A NoSQL database is a non-relational data management system that does not require a fixed schema. It avoids joins and is easy to scale. The major purpose of using a NoSQL database is to build distributed data stores with very large data storage needs. NoSQL is used for big data and real-time web applications; for example, companies like Twitter, Facebook, and Google collect terabytes of user data every single day.

aggregate data models

A data model is the model through which we perceive and manipulate our data. For people using a
database, the data model describes how we interact with the data in the database. This is distinct from a
storage model, which describes how the database stores and manipulates the data internally. In an ideal
world, we should be ignorant of the storage model, but in practice we need at least some inkling of it—
primarily to achieve decent performance.

In conversation, the term “data model” often means the model of the specific data in an application. A developer might point to an entity-relationship diagram of their database and refer to that as their data model containing customers, orders, products, and the like.

aggregates

Aggregation of NoSQL data sets is an important feature in many applications. restdb.io supports queries with both grouping and aggregation of data sets, which is very helpful for developing custom reports, visual charts, data analysis, etc. The table below shows all aggregation and grouping functions:

Function                     | Format                          | Comment                                                                | Example
Min                          | MIN:field                       | Returns object                                                         | h={"$aggregate":["MIN:score"]}
Max                          | MAX:field                       | Returns object                                                         | h={"$aggregate":["MAX:score"]}
Avg                          | AVG:field                       | Returns value                                                          | h={"$aggregate":["AVG:score"]}
Sum                          | SUM:field                       | Returns value                                                          | h={"$aggregate":["SUM:score"]}
Count                        | COUNT:property                  | Returns value with chosen property name                                | h={"$aggregate":["COUNT:nplayers"]}
Groupby                      | $groupby: ["field", ...]        | Returns "groupkey":[array]                                             | h={"$groupby":["category"]}
Groupby (dates)              | $groupby: ["$YEAR:field", ...]  | Predefined values for $YEAR, $MONTH, $DAY, $HOUR, $SEC                 | h={"$groupby":["$YEAR:registered"]}
Groupby (dates with formats) | $groupby: ["$DATE:format", ...] | Format strings for ss, hh, mm, dd, MM, YY; all formats at momentjs.com | h={"$groupby":["$DATE.MMM:registered"]}
Grand totals                 | $aggregate-grand-total: true    | Recursive aggregation functions of groups                              | h={"$groupby":["category"], "$aggregate":["AVG:score"], "$aggregate-grand-total": true}
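
As an illustration of how such a query might be issued, here is a minimal Python sketch using the requests library. It assumes a restdb.io-style REST endpoint of the form https://<dbname>.restdb.io/rest/<collection> that accepts the h query parameter shown above together with an x-apikey header; the database name, collection name, and API key are placeholders, not real values.

import json
import requests

# Placeholders: substitute your own restdb.io database, collection and API key.
BASE_URL = "https://yourdb.restdb.io/rest/games"
API_KEY = "your-api-key"

# Average score per category, with a grand total (mirrors the table above).
query = {
    "$groupby": ["category"],
    "$aggregate": ["AVG:score"],
    "$aggregate-grand-total": True,
}

response = requests.get(
    BASE_URL,
    params={"h": json.dumps(query)},
    headers={"x-apikey": API_KEY, "Content-Type": "application/json"},
)
print(response.json())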

key-value
A key-value data model or database is also referred to as a key-value store. It is a non-relational type of database in which an associative array is used as the basic data structure: each individual key is linked to just one value in a collection. Keys are unique identifiers for the values, and a value can be any kind of entity. A collection of key-value pairs stored as separate records forms a key-value database, and such databases do not have a predefined structure.

How do key-value databases work?

A key-value database associates a value, which can be anything from a simple string to a complex object, with a key that is used to look up that value. As in many programming paradigms, a key-value database resembles a map, array, or dictionary object; however, it is persisted to durable storage and managed by a DBMS.

A key-value store uses an efficient and compact index structure so that it can quickly and reliably locate a value by its key. For example, Redis is a key-value store used to track lists, maps, heaps, and primitive types (simple data structures) in a persistent database. By supporting only a predetermined number of value types, Redis can expose a very simple interface for querying and manipulating those values, and when configured appropriately it can deliver high throughput.
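
For illustration, here is a minimal sketch using the redis-py client against a local Redis server; the key names and values are made up for the example.

import redis

# Connect to a local Redis instance (host/port are placeholders).
r = redis.Redis(host="localhost", port=6379, db=0)

# Simple string value keyed by a unique identifier.
r.set("user:42:name", "Alice")
print(r.get("user:42:name"))                    # b'Alice'

# Redis also supports richer value types, e.g. lists and hashes.
r.rpush("user:42:recent_pages", "/home", "/cart")
print(r.lrange("user:42:recent_pages", 0, -1))  # [b'/home', b'/cart']

r.hset("user:42:profile", mapping={"city": "Pune", "plan": "pro"})
print(r.hgetall("user:42:profile"))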

document data models

A document data model stores data in the form of documents rather than relational tables. Document models are more free-form than the rows and columns of the relational model. See XML, JSON, DOM, relational database, and MongoDB.
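
A minimal sketch of storing and querying a document, using the pymongo driver against a local MongoDB; the connection string, database, collection, and field names are placeholders.

from pymongo import MongoClient

# Placeholder connection string, database and collection.
client = MongoClient("mongodb://localhost:27017")
db = client["shop"]

# A document keeps nested, free-form fields together in a single record.
db.products.insert_one({
    "name": "laptop",
    "price": 55000,
    "specs": {"ram_gb": 16, "storage": "512 GB SSD"},
    "tags": ["electronics", "portable"],
})

# Query by any field, including values inside arrays.
print(db.products.find_one({"tags": "electronics"}))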

relationships

Relationships are associations between different collections in a database. You can create relationships and define their object properties for NoSQL databases using either of the following methods (a sketch of both appears after the list):
 Embedding
Embeds the related data into a single structured collection (or a few of them)
 Referencing
Relates the data in multiple collections through identifying or non-identifying relationships
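
A minimal Python sketch of the two approaches, using made-up order and customer records represented as plain dictionaries; the field names are illustrative.

# Embedding: the related data travels inside one structured document.
order_embedded = {
    "_id": "order-1001",
    "customer": {"name": "Ram", "city": "Hyderabad"},
    "items": [
        {"product": "pen", "qty": 10},
        {"product": "notebook", "qty": 2},
    ],
}

# Referencing: the order stores only identifiers; related data lives in
# other collections and is resolved by looking up those ids.
customers = {"cust-7": {"name": "Ram", "city": "Hyderabad"}}
order_referenced = {
    "_id": "order-1001",
    "customer_id": "cust-7",
    "item_ids": ["item-1", "item-2"],
}

print(customers[order_referenced["customer_id"]]["name"])  # resolves the reference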

graph databases

A graph database is defined as a specialized, single-purpose platform for creating and manipulating graphs. Graphs contain nodes, edges, and properties, all of which are used to represent and store data in a way that relational databases are not equipped to do.
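
A minimal in-memory sketch of the node/edge/property idea in Python; real graph databases add indexing, a query language, and persistent storage, but the shape of the data is the same. The node and edge names are made up.

# Nodes and edges, each carrying properties.
nodes = {
    "alice": {"label": "Person", "age": 30},
    "bob":   {"label": "Person", "age": 35},
    "bda":   {"label": "Course", "title": "Big Data Analytics"},
}
edges = [
    ("alice", "KNOWS", "bob", {"since": 2019}),
    ("alice", "ENROLLED_IN", "bda", {}),
]

# A traversal: which people does alice know?
for src, rel, dst, props in edges:
    if src == "alice" and rel == "KNOWS":
        print(dst, props)   # bob {'since': 2019}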

schema less databases

A schemaless database makes almost no changes to your data; each item is saved in its own document with a partial schema, leaving the raw information untouched. This means that every detail is always available and nothing is stripped out to match the current schema, which is particularly valuable if your analytics needs change at some point in the future.
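
A small illustration in Python: two records in the same collection carry different fields, and nothing is dropped to fit a fixed schema. The field names are made up.

# Heterogeneous records in one "collection"; later analysis can still use
# fields that earlier records never had.
events = [
    {"user": "ram",   "action": "login",    "ts": "2024-01-01T10:00:00"},
    {"user": "robin", "action": "purchase", "amount": 499, "coupon": "NEWYEAR"},
]

total_purchased = sum(e.get("amount", 0) for e in events)
print(total_purchased)   # 499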

materialized views

A materialized view is a pre-computed data set derived from a query specification (the SELECT
in the view definition) and stored for later use. Because the data is pre-computed, querying a
materialized view is faster than executing a query against the base table of the view. This
performance difference can be significant when a query is run frequently or is sufficiently
complex. As a result, materialized views can speed up expensive aggregation, projection, and
selection operations, especially those that run frequently and that run on large data sets.
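
A rough Python analogy of the idea (not any particular database's implementation): the aggregate is computed once, stored, and later queries read the stored result instead of re-scanning the base data.

from collections import defaultdict

# Base "table": one row per sale.
sales = [
    {"region": "south", "amount": 100},
    {"region": "south", "amount": 250},
    {"region": "north", "amount": 400},
]

# "Materialized view": the expensive aggregation runs once and its result is stored.
totals_by_region = defaultdict(int)
for row in sales:
    totals_by_region[row["region"]] += row["amount"]

# Frequent queries now hit the precomputed result, not the base rows.
print(totals_by_region["south"])   # 350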

distribution models

Aggregate oriented databases make distribution of data easier, since the distribution
mechanism has to move the aggregate and not have to worry about related data, as all the
related data is contained in the aggregate. There are two styles of distributing data:

 Sharding: Sharding distributes different data across multiple servers, so each server acts
as the single source for a subset of data.
 Replication: Replication copies data across multiple servers, so each bit of data can be
found in multiple places. Replication comes in two forms,
 Master-slave replication makes one node the authoritative copy that handles
writes while slaves synchronize with the master and may handle reads.
 Peer-to-peer replication allows writes to any node; the nodes coordinate to
synchronize their copies of the data.

Master-slave replication reduces the chance of update conflicts, while peer-to-peer replication avoids loading all writes onto a single server and thus avoids a single point of failure. A system may use either or both techniques; for example, the Riak database shards the data and also replicates it based on a replication factor.
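
A minimal Python sketch of hash-based sharding combined with a replication factor; the server names, hash choice, and placement rule are illustrative and not any particular database's algorithm.

import hashlib

SERVERS = ["node-a", "node-b", "node-c"]   # hypothetical cluster
REPLICATION_FACTOR = 2

def home_shard(key: str) -> int:
    # Hash the aggregate's key to pick the server that owns it.
    digest = hashlib.md5(key.encode()).hexdigest()
    return int(digest, 16) % len(SERVERS)

def replicas(key: str) -> list:
    # The owner plus the next (REPLICATION_FACTOR - 1) servers hold copies.
    start = home_shard(key)
    return [SERVERS[(start + i) % len(SERVERS)] for i in range(REPLICATION_FACTOR)]

print(replicas("order-1001"))   # e.g. ['node-b', 'node-c']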

Consistency Models

In the past, almost all architectures used in database systems were strongly consistent. In those cases, most architectures had a single database instance responding to only a few hundred clients. Nowadays, many systems are accessed by hundreds of thousands of clients, so system architectures are required to scale. However, considering the CAP theorem, high availability and consistency conflict in distributed systems when a network partition occurs. The majority of projects that experience such high traffic have chosen high availability over a strongly consistent architecture by relaxing the consistency level.

Version Stamps

Many critics of NoSQL databases focus on the lack of support for transactions. Transactions are
a useful tool that helps programmers support consistency. One reason why many NoSQL
proponents worry less about a lack of transactions is that aggregate-oriented NoSQL databases
do support atomic updates within an aggregate—and aggregates are designed so that their
data forms a natural unit of update. That said, it’s true that transactional needs are something
to take into account when you decide what database to use.

As part of this, it’s important to remember that transactions have limitations. Even within a
transactional system we still have to deal with updates that require human intervention and
usually cannot be run within transactions because they would involve holding a transaction
open for too long. We can cope with these using version stamps—which turn out to be handy
in other situations as well, particularly as we move away from the single-server distribution
model.
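
A minimal in-memory Python sketch of the compare-and-set idea behind version stamps; the record layout and function are made up for illustration.

# Each record carries a version stamp. An update succeeds only if the stamp
# the client read is still current; otherwise someone else changed the record.
store = {"account-1": {"balance": 100, "version": 7}}

def update_balance(key, new_balance, expected_version):
    record = store[key]
    if record["version"] != expected_version:
        return False                 # conflict: caller should re-read and retry
    record["balance"] = new_balance
    record["version"] += 1
    return True

print(update_balance("account-1", 150, expected_version=7))  # True
print(update_balance("account-1", 175, expected_version=7))  # False, stale stamp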

Cassandra Create Table
In Cassandra, the CREATE TABLE command is used to create a table. Here, a column family is used to store data, just like a table in an RDBMS.

So, you can say that the CREATE TABLE command is used to create a column family in Cassandra.

Syntax:

CREATE (TABLE | COLUMNFAMILY) <tablename>
('<column-definition>' , '<column-definition>')
(WITH <option> AND <option>)

Or

For declaring a primary key:

CREATE TABLE tablename(
   column1_name datatype PRIMARY KEY,
   column2_name datatype,
   column3_name datatype
)

You can also define a primary key by using the following syntax:

Create table TableName
(
   ColumnName DataType,
   ColumnName DataType,
   ColumnName DataType
   .
   .
   .
   Primary key(ColumnName)
) with PropertyName=PropertyValue;

There are two types of primary keys:

o Single primary key: Use the following syntax for a single primary key.

Primary key (ColumnName)

o Compound primary key: Use the following syntax for a compound primary key.

Primary key (ColumnName1, ColumnName2, . . .)

Example:

Let's take an example to demonstrate the CREATE TABLE command.

Here, we are using the already created keyspace "javatpoint".

CREATE TABLE student(
   student_id int PRIMARY KEY,
   student_name text,
   student_city text,
   student_fees varint,
   student_phone varint
);

Cassandra - Read Data


Reading Data using the SELECT Clause
The SELECT clause is used to read data from a table in Cassandra. Using this clause, you can read a whole table, a single column, or a particular cell. Given below is the syntax of the SELECT clause.
SELECT <column list | *> FROM <tablename>;

Example

Assume there is a table in the keyspace named emp with the following details −

emp_id | emp_name | emp_city  | emp_phone  | emp_sal
1      | ram      | Hyderabad | 9848022338 | 50000
2      | robin    | null      | 9848022339 | 50000
3      | rahman   | Chennai   | 9848022330 | 50000
4      | rajeev   | Pune      | 9848022331 | 30000

The following example shows how to read a whole table using the SELECT clause. Here we are reading a table called emp.
cqlsh:tutorialspoint> select * from emp;

emp_id | emp_city | emp_name | emp_phone | emp_sal


--------+-----------+----------+------------+---------
1 | Hyderabad | ram | 9848022338 | 50000
2 | null | robin | 9848022339 | 50000
3 | Chennai | rahman | 9848022330 | 50000
4 | Pune | rajeev | 9848022331 | 30000

(4 rows)

Reading Required Columns


The following example shows how to read particular columns of a table.
cqlsh:tutorialspoint> SELECT emp_name, emp_sal from emp;

emp_name | emp_sal
----------+---------
ram | 50000
robin | 50000
rajeev | 30000
rahman | 50000

(4 rows)

Where Clause
Using the WHERE clause, you can put a constraint on the required columns. Its syntax is as follows:

SELECT * FROM <tablename> WHERE <condition>;
Note − The WHERE clause can be used only on columns that are part of the primary key or that have a secondary index on them.
In the following example, we read the details of employees whose salary is 50000.
First, create a secondary index on the column emp_sal.

cqlsh:tutorialspoint> CREATE INDEX ON emp(emp_sal);
cqlsh:tutorialspoint> SELECT * FROM emp WHERE emp_sal=50000;

emp_id | emp_city | emp_name | emp_phone | emp_sal


--------+-----------+----------+------------+---------
1 | Hyderabad | ram | 9848022338 | 50000
2 | null | robin | 9848022339 | 50000
3 | Chennai | rahman | 9848022330 | 50000

Inserting data using a CSV file in Cassandra


If you want to store data in bulk, inserting it from a CSV file is one of the most convenient approaches. If your data is already in a file, you can load it directly into the database by using the COPY command in Cassandra. This is very useful when you have a very large data set in a CSV file and want to load it quickly.
Syntax –
You can see the COPY command syntax for your reference as follows.
COPY table_name [( column_list )]
FROM 'file_name path'[, 'file2_name path', ...] | STDIN
[WITH option = 'value' [AND ...]]
Now, let’s create the sample data for implementing the approach.
Step-1 :
Creating keyspace – data
Here, you can use the following cqlsh command to create the keyspace as follows.
CREATE KEYSPACE data
WITH REPLICATION = {
'class' : 'NetworkTopologyStrategy',
'datacenter1' : 1
};
Step-2 :
Creating the Student_personal_data table –
Here, you can use the following cqlsh command to create the Student_personal_data table as
follows.
CREATE TABLE data.Student_personal_data (
   S_id UUID PRIMARY KEY,
   S_firstname text,
   S_lastname text
);
Step-3 :
Creating the CSV file –
Consider the following table as a CSV file named personal_data.csv. In practice, you would put this data into a CSV file and save it on your computer drive.
S_id (UUID)                          | S_firstname | S_lastname
e1ae4cf0-d358-4d55-b511-85902fda9cc1 | Ashish      | christopher
e2ae4cf0-d358-4d55-b511-85902fda9cc2 | Joshua      | D
e3ae4cf0-d358-4d55-b511-85902fda9cc3 | Ken         | N
e4ae4cf0-d358-4d55-b511-85902fda9cc4 | Christine   | christopher
e5ae4cf0-d358-4d55-b511-85902fda9cc5 | Allie       | K
e6ae4cf0-d358-4d55-b511-85902fda9cc6 | Lina        | M

Step-4 :
Inserting data from the CSV file –
Here you insert the data from the existing CSV file into the database, using the following cqlsh command.
COPY data.Student_personal_data (S_id, S_firstname, S_lastname)
FROM 'personal_data.csv'
WITH HEADER = TRUE;
Step-5 :
Verifying the result –
Once you execute the above command, you will get the following result.
Using 7 child processes

Starting copy of data.Student_personal_data with columns [S_id, S_firstname, S_lastname].


Processed: 6 rows; Rate: 10 rows/s; Avg. rate: 14 rows/s
6 rows imported from 1 files in 0.422 seconds (0 skipped).
You can use the following command to see the output as follows.
select * from data.Student_personal_data;
Output :
S_id                                 | S_firstname | S_lastname
e5ae4cf0-d358-4d55-b511-85902fda9cc5 | Allie       | K
e6ae4cf0-d358-4d55-b511-85902fda9cc6 | Lina        | M
e2ae4cf0-d358-4d55-b511-85902fda9cc2 | Joshua      | D
e1ae4cf0-d358-4d55-b511-85902fda9cc1 | Ashish      | christopher
e3ae4cf0-d358-4d55-b511-85902fda9cc3 | Ken         | N
e4ae4cf0-d358-4d55-b511-85902fda9cc4 | Christine   | christopher

