L7 Data Processing and Analytics
Lecture 6: Document Databases and
MongoDB
Dr Ismail Alarab
[email protected]
www.bournemouth.ac.uk
Recap
- Graph Database
- Graph vs Relational
- Neo4J
- Cypher Query
Lecture Outline
➢Key – Value
➢Document Databases
➢MongoDB
Key-Value
Key - Value
● A key-value pair where:
○ Key: usually a string [mapped into a number]
○ Value: data or object that the key is associated with
● Example: in { name: "John", age: 35 }, name and age are keys; "John" and 35 are their values
Key - Value
● The characteristic feature of a key-value store is that it is “simple but
quick”
● Data is stored in a simple key-value structure and the key-value store
is ignorant of the content of the value part
● Notations
○ (Key, Value) (ordered pair)
○ {Key: Value} (JSON notation , Python notation)
○ Key → Value
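As a minimal sketch of the idea, the snippet below uses JavaScript's built-in Map as an in-memory key-value store. The keys "user:1" and "user:2" are hypothetical; the point is only that the store looks values up by key and never inspects what the value contains.

```javascript
// A key-value store sketched with a built-in Map.
// The store never inspects the value; it only maps key -> value.
const store = new Map();

store.set("user:1", { name: "John", age: 35 }); // value is an object
store.set("user:2", "any opaque string");       // value is a plain string

const john = store.get("user:1");    // lookup is by key only
const missing = store.get("user:9"); // undefined: no such key

console.log(john.name, store.has("user:2"), missing);
```

Because the store is ignorant of the value, any filtering on the value's content has to happen in the application after the lookup.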
Document Database
▪ Document-Based data model
▪ Document DBs store data in a semi-structured and nested text
format like XML documents or JSON documents
▪ A schema-less model lets you represent data with variable
properties.
▪ Each document is stored in a collection
▪ Collections
o Have index set in common
o Like tables in relational databases
o Documents do not have to have uniform structure
JSON
▪ JSON: “JavaScript Object Notation”
▪ Easy for humans to write/read, easy for computers to parse/generate
▪ Objects can be nested
▪ Built on
o name/value pairs
o Ordered list of values
Example
{
  "name": "John",
  "age": 30,
  "city": "New York"
}
Example: JSON Schema and JSON
JSON Schema
{
  "$schema": "http://json-schema.org/draft-07/schema#",
  "type": "object",
  "properties": {
    "name": { "type": "string" },
    "age": { "type": "integer" },
    "email": { "type": "string" }
  },
  "required": ["name", "age", "email"]
}
JSON
{
  "name": "Alice",
  "age": 25,
  "email": "[email protected]"
}
• Key-value pairs are separated by commas
• The property keys should be used consistently across a collection to ensure data
integrity and consistency
Example On Data Consistency
Assuming the JSON Schema has these properties (from the previous slide):
"properties": {
"name": { "type": "string" },
"age": { "type": "integer"},
"email": { "type": "string"}
},
Consistent Data
{
  "name": "Alice",
  "age": 25,
  "email": "[email protected]"
}
Inconsistent Data
{
  "Firstname": "Alice",
  "age": 25,
  "EMAIL": "[email protected]"
}
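The consistency check above can be sketched in a few lines of JavaScript. This is not a full JSON Schema validator: checkDocument is a hypothetical helper that only looks at the "properties" and "required" parts, and it is stricter than JSON Schema's default because it also flags keys the schema does not list. The email value is a placeholder.

```javascript
// Minimal check of a document against the "properties" and "required"
// parts of the schema. This is NOT a full JSON Schema validator.
const schema = {
  properties: {
    name: { type: "string" },
    age: { type: "integer" },
    email: { type: "string" },
  },
  required: ["name", "age", "email"],
};

// Hypothetical helper: returns a list of problems (empty list = consistent).
function checkDocument(doc, schema) {
  const problems = [];
  for (const key of schema.required) {
    if (!(key in doc)) problems.push(`missing key: ${key}`);
  }
  for (const [key, value] of Object.entries(doc)) {
    const rule = schema.properties[key];
    if (!rule) {
      // Assumption: unknown keys count as inconsistent (stricter than the default).
      problems.push(`unexpected key: ${key}`);
    } else if (rule.type === "integer" && !Number.isInteger(value)) {
      problems.push(`${key} should be an integer`);
    } else if (rule.type === "string" && typeof value !== "string") {
      problems.push(`${key} should be a string`);
    }
  }
  return problems;
}

// Placeholder email address, used only for illustration.
const consistent = { name: "Alice", age: 25, email: "alice@example.com" };
const inconsistent = { Firstname: "Alice", age: 25, EMAIL: "alice@example.com" };

const okProblems = checkDocument(consistent, schema);
const badProblems = checkDocument(inconsistent, schema);
console.log(okProblems);
console.log(badProblems);
```

Key names are case-sensitive, which is why "EMAIL" is reported both as an unexpected key and as a missing "email".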
Example
Document Example:
{
  "_id": "1",
  "name": "John",
  "age": 30
}
Collection Example:
[
  {
    "_id": "1",
    "name": "John",
    "age": 30
  },
  {
    "_id": "2",
    "name": "Amanda",
    "age": 25
  }
]
MongoDB
• MongoDB is a document-based database.
• BSON document data store
• Databases and collections are created automatically when data is first inserted
• Moving from the relational data model to the document model replaces
the concept of a ‘row’ with a more flexible unit, the ‘document’.
• MongoDB is schema-free: the keys used in documents are not
predefined or fixed.
• While migrating data to MongoDB, any issues of new or missing
keys can be resolved at the application level, rather than changing
the schema in the database.
_id Field
• By default, each document contains an _id field. This field has a
number of special characteristics:
– Value serves as primary key for collection.
– Value is unique, immutable, and may be any non-array type.
– Default data type is ObjectId, which is “small, likely unique,
fast to generate and order.”
– Sorting on an ObjectId value is roughly equivalent to sorting on creation
time.
_id Field Example
(The original slide used db.products.save(), which is deprecated in current MongoDB shells; insertOne is used here.)
Specifying “_id”:
Input:
db.products.insertOne( { _id: 100, item: "water", qty: 30 } )
Stored document:
{ "_id" : 100, "item" : "water", "qty" : 30 }
Without specifying “_id”:
Input:
db.products.insertOne( { item: "book", qty: 40 } )
Stored document:
{ "_id" : ObjectId("50691737d386d8fadbd6b01d"), "item" : "book", "qty" : 40 }
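Sorting on an ObjectId roughly follows creation time because its first 4 bytes hold a timestamp in seconds since the Unix epoch. The sketch below decodes that timestamp from the ObjectId string in the example above; objectIdToDate is a hypothetical helper name, and inside the MongoDB shell the built-in getTimestamp() method on an ObjectId serves the same purpose.

```javascript
// The first 4 bytes (8 hex characters) of an ObjectId hold the creation
// time as seconds since the Unix epoch.
function objectIdToDate(hex) {
  const seconds = parseInt(hex.slice(0, 8), 16); // leading 8 hex chars -> integer
  return new Date(seconds * 1000);               // Date expects milliseconds
}

const created = objectIdToDate("50691737d386d8fadbd6b01d");
console.log(created.toISOString());
```

The remaining bytes are not time-based, which is why the ordering is only roughly equivalent to creation order for documents created within the same second.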
MongoDB vs SQL
MongoDB                            SQL
Store Unstructured Data (NoSQL)    Store Structured Data
Document                           Tuple/Record
Collection                         Table/View
PK: _id Field                      PK: Any Attribute(s)
Uniformity not required            Uniform Relation Schema
Index                              Index
Embedded Structure                 Joins
Shard                              Partition
Note: Sharding in MongoDB is a horizontal split that stores the data in smaller subsets across multiple servers
Mongo Is Schema-Free
• In SQL, the schema exists to meet the requirements of tables and
the quirks of their implementation
• Every “row” in a database “table” is a data structure, much like
a “struct” in C, or a “class” in Java. A table is then an array (or
list) of such data structures
• We design a MongoDB document in much the same way as we design a
compound data type in JSON
Embedding & Linking
One to One relationship (Example 1)
Linking (two separate documents):
zip = {
  _id: 35004,
  city: "ACMAR",
  loc: [-86, 33],
  pop: 6065,
  State: "AL"
}
council_person = {
  zip_id: 35004,
  name: "John Doe",
  address: "123 Fake St.",
  Phone: 123456
}
Embedding (one document):
zip = {
  _id: 35004,
  city: "ACMAR",
  loc: [-86, 33],
  pop: 6065,
  State: "AL",
  council_person: {
    zip_id: 35004,
    name: "John Doe",
    address: "123 Fake St.",
    Phone: 123456
  }
}
Example 2
MongoDB: The Definitive Guide, By Kristina Chodorow and Mike Dirolf
Published: 9/24/2010
Pages: 216
Language: English
Publisher: O’Reilly Media, CA
One to Many Relationship – Embedding
book = {
title: "MongoDB: The Definitive Guide",
authors: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher: {
name: "O’Reilly Media",
founded: "1980",
location: "CA"
}
}
One to Many Relationship - Linking
publisher = {
_id: "oreilly",
name: "O’Reilly Media",
founded: "1980",
location: "CA"
}
book = {
title: "MongoDB: The Definitive Guide",
authors: [ "Kristina Chodorow", "Mike Dirolf" ],
published_date: ISODate("2010-09-24"),
pages: 216,
language: "English",
publisher_id: "oreilly"
}
Linking vs Embedding
• Embedding is a bit like pre-joining data
• Document level operations are easy for the server to handle
• Embed when the “many” objects always appear with
(viewed in the context of) their parents.
• Link when you need more flexibility
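The trade-off can be sketched with plain JavaScript objects standing in for documents. The variable names are hypothetical and nothing here talks to a database; the sketch only shows that an embedded publisher is available after one read, while a linked publisher needs a second lookup by its _id.

```javascript
// Linked model: the book stores only the publisher's _id.
const publishers = [
  { _id: "oreilly", name: "O'Reilly Media", founded: "1980", location: "CA" },
];
const linkedBook = {
  title: "MongoDB: The Definitive Guide",
  publisher_id: "oreilly",
};

// Embedded model: the publisher travels inside the book document.
const embeddedBook = {
  title: "MongoDB: The Definitive Guide",
  publisher: { name: "O'Reilly Media", founded: "1980", location: "CA" },
};

// Embedded: one read, no second lookup needed ("pre-joined" data).
const nameFromEmbedded = embeddedBook.publisher.name;

// Linked: a second lookup (an application-level "join") is required.
const publisher = publishers.find((p) => p._id === linkedBook.publisher_id);
const nameFromLinked = publisher.name;

console.log(nameFromEmbedded === nameFromLinked);
```

With linking, a change to the publisher is made in one place; with embedding, every book that carries a copy of the publisher would need updating.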
Many to Many Relationship
• The relation can be placed in either one of the
documents (embedding in one of the
documents)
• Focus on how the data is accessed and queried
Example
book = {
  title: "MongoDB: The Definitive Guide",
  authors: [
    { _id: "kchodorow", name: "Kristina Chodorow" },
    { _id: "mdirolf", name: "Mike Dirolf" }
  ],
  published_date: ISODate("2010-09-24"),
  pages: 216,
  language: "English"
}
author = {
  _id: "kchodorow",
  name: "Kristina Chodorow"
}
Syntax to find all books by an author's name (a dotted field name must be quoted):
db.books.find( { "authors.name": "Kristina Chodorow" } )
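A query on a dotted path into an array of embedded documents matches when any element of the array satisfies the condition. The sketch below emulates that behaviour on an in-memory array; findByAuthorName and the second book are hypothetical and exist only for illustration.

```javascript
const books = [
  {
    title: "MongoDB: The Definitive Guide",
    authors: [
      { _id: "kchodorow", name: "Kristina Chodorow" },
      { _id: "mdirolf", name: "Mike Dirolf" },
    ],
  },
  {
    title: "Some Other Book", // hypothetical second document
    authors: [{ _id: "jdoe", name: "Jane Doe" }],
  },
];

// Emulates a filter on "authors.name": a book matches if ANY element
// of its authors array has the requested name.
function findByAuthorName(collection, name) {
  return collection.filter((book) =>
    book.authors.some((author) => author.name === name)
  );
}

const hits = findByAuthorName(books, "Kristina Chodorow");
console.log(hits.map((b) => b.title));
```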
Example 3
• A book can be checked out by one student at a time
• A student can check out many books
Modelling Checkouts
student = {
_id: "joe",
name: "Joe Bookreader",
join_date: ISODate("2011-10-15"),
address: { ... }
}
book = {
_id: "123456789",
title: "MongoDB: The Definitive Guide",
authors: [ "Kristina Chodorow", "Mike Dirolf" ],
...
}
Modelling Checkouts
student = {
_id: "joe",
name: "Joe Bookreader",
join_date: ISODate("2011-10-15"),
address: { ... },
checked_out: [
{ _id: "123456789", checked_out: "2012-10-15" },
{ _id: "987654321", checked_out: "2012-09-12" },
...
]
}
CRUD: Create Read Update Delete
CRUD
To insert documents into a collection/make a new
collection:
CRUD: Inserting Data
Insert one document
Inserting a document with a field name new to the collection is
inherently supported by the BSON model.
Also, inserting one vs many documents:
db.<collection>.insertOne(<document>)
db.<collection>.insertMany([<document1>, <document2>, …])
Practical Example With MongoDB
Reader
ReaderID   Name
2468       Nasr
2512       Aboubakr
Book
BookID   Title
1002     Introduction to DBS
1004     Patterns of enterprise application architecture
1006     Don Quixote
BookLending
BookID   ReaderID   ReturnDate
1002     2468       25-10-2016
1006     2468       27-10-2016
1004     2512       31-10-2016
Example on Create/Insert: MongoDB Inserting Data
(insert() is deprecated in current MongoDB shells; insertOne is used here.)
db.BookLending.insertOne({ Book: "Introduction to DBS", Reader: "Nasr", ReturnDate: "25-10-2016" })
db.BookLending.insertOne({ Book: "Don Quixote", Reader: "Nasr", ReturnDate: "27-10-2016" })
db.BookLending.insertOne({ Book: "Patterns of enterprise application architecture", Reader: "Aboubakr", ReturnDate: "31-10-2016" })
Another way of inserting the same data, using an array of documents:
db.BookLending.insertMany([
  { Book: "Introduction to DBS", Reader: "Nasr", ReturnDate: "25-10-2016" },
  { Book: "Don Quixote", Reader: "Nasr", ReturnDate: "27-10-2016" },
  { Book: "Patterns of enterprise application architecture", Reader: "Aboubakr", ReturnDate: "31-10-2016" }
])
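The inserted documents are a denormalised view of the three relational tables: the foreign keys BookID and ReaderID have been replaced by the title and the name. A sketch of that conversion in plain JavaScript, assuming the table contents shown earlier, might look like this (the variable names are hypothetical):

```javascript
// The three relational tables as arrays of rows.
const readers = [
  { ReaderID: 2468, Name: "Nasr" },
  { ReaderID: 2512, Name: "Aboubakr" },
];
const books = [
  { BookID: 1002, Title: "Introduction to DBS" },
  { BookID: 1004, Title: "Patterns of enterprise application architecture" },
  { BookID: 1006, Title: "Don Quixote" },
];
const lendings = [
  { BookID: 1002, ReaderID: 2468, ReturnDate: "25-10-2016" },
  { BookID: 1006, ReaderID: 2468, ReturnDate: "27-10-2016" },
  { BookID: 1004, ReaderID: 2512, ReturnDate: "31-10-2016" },
];

// Index the lookup tables by primary key.
const readerById = new Map(readers.map((r) => [r.ReaderID, r.Name]));
const titleById = new Map(books.map((b) => [b.BookID, b.Title]));

// Resolve the foreign keys once so that each document is self-contained.
const documents = lendings.map((l) => ({
  Book: titleById.get(l.BookID),
  Reader: readerById.get(l.ReaderID),
  ReturnDate: l.ReturnDate,
}));

console.log(documents);
```

The resulting array has the same shape as the argument passed to insertMany, so no join is needed when the lending documents are read back.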
CRUD: Querying
CRUD: Querying With “AND” Condition
Example: (AND)
AND clause in MongoDB: list several fields in one filter document
db.BookLending.find({
  Book: "Introduction to DBS",
  Reader: "Nasr"
})
CRUD: Querying With “OR” Condition
Example: (OR)
OR clause in MongoDB:
db.BookLending.find({
  $or: [
    { Book: "Introduction to DBS" },
    { Reader: "Nasr" }
  ]
})
Example: (Multiple values for one field, using $in)
db.BookLending.find({ ReturnDate: { $in: ["25-10-2016", "27-10-2016"] } })
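To make the semantics of the three filters concrete, the sketch below implements a tiny in-memory matcher. The matches and find functions are hypothetical and cover only plain equality, $or and $in; they are an illustration of the behaviour, not MongoDB's implementation.

```javascript
const lendings = [
  { Book: "Introduction to DBS", Reader: "Nasr", ReturnDate: "25-10-2016" },
  { Book: "Don Quixote", Reader: "Nasr", ReturnDate: "27-10-2016" },
  { Book: "Patterns of enterprise application architecture", Reader: "Aboubakr", ReturnDate: "31-10-2016" },
];

// Hypothetical mini-matcher covering only equality, $or and $in.
function matches(doc, query) {
  // Every top-level entry must hold: this is the implicit AND.
  return Object.entries(query).every(([key, cond]) => {
    if (key === "$or") return cond.some((sub) => matches(doc, sub));
    if (cond !== null && typeof cond === "object" && "$in" in cond) {
      return cond.$in.includes(doc[key]);
    }
    return doc[key] === cond; // plain equality
  });
}
const find = (query) => lendings.filter((d) => matches(d, query));

const andHits = find({ Book: "Introduction to DBS", Reader: "Nasr" });
const orHits = find({ $or: [{ Book: "Introduction to DBS" }, { Reader: "Nasr" }] });
const inHits = find({ ReturnDate: { $in: ["25-10-2016", "27-10-2016"] } });

console.log(andHits.length, orHits.length, inHits.length);
```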
CRUD: Querying With Include/Exclude Results
Example: (Exclude)
db.BookLending.find(
  { Reader: "Nasr" }, // Match
  { Reader: 0 }       // 0 to Exclude
)
Example: (Include/Exclude)
db.BookLending.find(
  { Reader: "Nasr" },
  { Book: 1, ReturnDate: 1, _id: 0 }
)
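The two projection styles behave differently: listing fields with 1 keeps only those fields (plus _id unless it is explicitly set to 0), while listing fields with 0 keeps everything else. The sketch below emulates this on a plain object; project is a hypothetical helper and the _id value is made up for illustration.

```javascript
// Hypothetical helper emulating find() projections on plain objects.
function project(doc, spec) {
  const included = Object.keys(spec).filter((k) => spec[k] === 1);
  const excluded = Object.keys(spec).filter((k) => spec[k] === 0);
  const out = {};
  if (included.length > 0) {
    // Inclusion mode: keep the listed fields, plus _id unless it is excluded.
    if ("_id" in doc && !excluded.includes("_id")) out._id = doc._id;
    for (const k of included) if (k in doc) out[k] = doc[k];
  } else {
    // Exclusion mode: keep everything except the listed fields.
    for (const k of Object.keys(doc)) if (!excluded.includes(k)) out[k] = doc[k];
  }
  return out;
}

// The _id value is made up for illustration.
const doc = { _id: 1, Book: "Don Quixote", Reader: "Nasr", ReturnDate: "27-10-2016" };

const withoutReader = project(doc, { Reader: 0 });
const bookAndDate = project(doc, { Book: 1, ReturnDate: 1, _id: 0 });
console.log(withoutReader);
console.log(bookAndDate);
```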
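Example: Querying
Example 1: (returning all book lendings)
db.BookLending.find()
Example 2: (finding a book lending by the reader name)
db.BookLending.find({ Reader: "Nasr" })
Example 3: (finding book lendings before a date)
db.BookLending.find({ ReturnDate: { $lt: "27-10-2016" } })
Note: ReturnDate is stored here as a "DD-MM-YYYY" string, so $lt compares text character by character; this agrees with calendar order only within the same month and year.
Example 4: (finding book lendings whose title contains a specific word)
db.BookLending.find({ Book: /DBS/ })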
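Example 3 compares dates stored as "DD-MM-YYYY" strings, which is a text comparison rather than a date comparison. The sketch below shows a case where the two disagree; the dates and the parseDmy helper are hypothetical. Storing real date values (for example with ISODate, as in the earlier book documents) avoids the problem.

```javascript
// String comparison is character by character, so "DD-MM-YYYY" sorts by day first.
const a = "05-11-2016"; // 5 November 2016
const b = "27-10-2016"; // 27 October 2016

// As text, "0..." sorts before "2...", so a appears to be earlier than b.
const stringSaysEarlier = a < b;

// Hypothetical helper: convert "DD-MM-YYYY" to a real Date before comparing.
function parseDmy(text) {
  const [day, month, year] = text.split("-").map(Number);
  return new Date(Date.UTC(year, month - 1, day)); // months are 0-based
}

// As dates, 5 November is not earlier than 27 October.
const dateSaysEarlier = parseDmy(a) < parseDmy(b);

console.log(stringSaysEarlier, dateSaysEarlier);
```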
CRUD: Updating
Example: Updating ($set and $unset)
db.BookLending.updateOne(
  { Book: "Introduction to DBS", Reader: "Nasr" }, // Match first
  { $set: { ReturnDate: "30-10-2016" } } // Update the first document matching the criteria
)
db.BookLending.updateMany(
  { Reader: "Nasr" }, // Match first
  { $set: { ReturnDate: "30-10-2016" } } // Update all documents matching the criteria
)
db.BookLending.updateOne(
  { Book: "Introduction to DBS", Reader: "Nasr" },
  { $unset: { ReturnDate: "" } } // Remove a property
)
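The effect of the two operators on a single document can be sketched in plain JavaScript. applyUpdate is a hypothetical helper that emulates only $set and $unset on a copy of the document; it is meant to show what changes, not how the server applies updates.

```javascript
// Hypothetical helper emulating the $set and $unset update operators.
function applyUpdate(doc, update) {
  const result = { ...doc };                 // work on a copy of the document
  Object.assign(result, update.$set || {});  // $set: add or overwrite fields
  for (const key of Object.keys(update.$unset || {})) {
    delete result[key];                      // $unset: remove the field entirely
  }
  return result;
}

const lending = { Book: "Introduction to DBS", Reader: "Nasr", ReturnDate: "25-10-2016" };

const extended = applyUpdate(lending, { $set: { ReturnDate: "30-10-2016" } });
const cleared = applyUpdate(extended, { $unset: { ReturnDate: "" } });

console.log(extended.ReturnDate, "ReturnDate" in cleared);
```

The value given to $unset (an empty string above) is ignored; only the field name matters.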
CRUD: Removal
Documents are removed with deleteOne and deleteMany
Example: Removal
db.BookLending.deleteMany({ Reader: "Nasr" }) // Delete all documents that match the criteria
db.BookLending.deleteOne({ Reader: "Nasr" }) // Delete the first document that matches the criteria
db.BookLending.remove({ Reader: "Nasr" }) // Legacy method (deprecated in current shells): deletes all documents that match the criteria
Using the Shell
To check which db you’re using:    db
Show all databases:                show dbs
Switch db’s/make a new one:        use <name>
See what collections exist:        show collections
MongoDB Aggregation Framework
● For usual aggregation operations (sum, avg, etc.)
● A pipeline of operations
● It is possible to:
○ Group by categories
○ Impose conditions
○ Select fields for output
● Somewhat different syntax than usual CRUD queries
Aggregation Matching
● $match operator.
db.orders.aggregate([{ $match: { status: "A" }}])
● Selecting all documents where status attribute is “A”
● No aggregation yet
● Note that this is equivalent to using find operator
Aggregation Matching (Example)
db.orders.insertMany([
{ _id: 1, status: "A", amount: 50 },
{ _id: 2, status: "B", amount: 75 },
{ _id: 3, status: "A", amount: 100 },
{ _id: 4, status: "A", amount: 150 },
{ _id: 5, status: "B", amount: 200 }
])
Input code:
db.orders.aggregate([
  { $match: { status: "A" } }
])
Output:
[
  { _id: 1, status: 'A', amount: 50 },
  { _id: 3, status: 'A', amount: 100 },
  { _id: 4, status: 'A', amount: 150 }
]
Aggregation Grouping With Sum
● $group operator:
db.orders.aggregate([
{
$group: {
_id: "$status", // Group by the status field
total: { $sum: "$amount" } // Calculate the total amount for each status group
}
}
])
● Calculating sum of amount attribute per status, showing it
as “total”.
● Specifying the attribute to group in _id. Possible to have
combinations of grouping fields.
Aggregation Grouping With Sum (Example)
db.orders.insertMany([
{ _id: 1, status: "A", amount: 50 },
{ _id: 2, status: "B", amount: 75 },
{ _id: 3, status: "A", amount: 100 },
{ _id: 4, status: "A", amount: 150 },
{ _id: 5, status: "B", amount: 200 }
])
Input code:
db.orders.aggregate([
  { $group: { _id: "$status", total: { $sum: "$amount" } } }
])
Output:
[
  { _id: 'A', total: 300 },
  { _id: 'B', total: 275 }
]
Aggregation Grouping With Max
● Different aggregation operators, such as $max, $min, $avg, etc.
db.orders.aggregate([
{
$group: {
_id: "$status", // Group by the status field
maximum: { $max: "$amount" }
}
}
])
● Finding the maximum among amount attribute per status, showing it as
“maximum”.
Aggregation Grouping With Max (Example)
db.orders.insertMany([
{ _id: 1, status: "A", amount: 50 },
{ _id: 2, status: "B", amount: 75 },
{ _id: 3, status: "A", amount: 100 },
{ _id: 4, status: "A", amount: 150 },
{ _id: 5, status: "B", amount: 200 }
])
Input code:
db.orders.aggregate([
  { $group: { _id: "$status", maximum: { $max: "$amount" } } }
])
Output:
[
  { _id: 'A', maximum: 150 },
  { _id: 'B', maximum: 200 }
]
Aggregation Grouping
● What about no grouping?
● E.g. finding the maximum amount in general
● Use a constant _id such as _id: 1 (null is the conventional choice), so that all documents fall into one group
Input code:
db.orders.aggregate([
  { $group: { _id: 1, maximum: { $max: "$amount" } } }
])
Output:
[ { _id: 1, maximum: 200 } ]
Aggregation Grouping With Counts
● Counts (number of occurrences) can be calculated by specifying
$sum: 1
● Example: how many times does each status appear?
db.orders.aggregate([
  { $group: { _id: "$status", total: { $sum: 1 } } }
])
Aggregation Grouping With Counts (Example)
db.orders.insertMany([
{ _id: 1, status: "A", amount: 50 },
{ _id: 2, status: "B", amount: 75 },
{ _id: 3, status: "A", amount: 100 },
{ _id: 4, status: "A", amount: 150 },
{ _id: 5, status: "B", amount: 200 }
])
Input code:
db.orders.aggregate([
  { $group: { _id: "$status", total: { $sum: 1 } } }
])
Output:
[
  { _id: 'B', total: 2 },
  { _id: 'A', total: 3 }
]
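The grouping examples above (sum, max and count per status) can be reproduced in plain JavaScript to see what $group computes. This is an in-memory sketch over the same sample orders, not MongoDB code; the output field names total, maximum and count are chosen to mirror the slides.

```javascript
const orders = [
  { _id: 1, status: "A", amount: 50 },
  { _id: 2, status: "B", amount: 75 },
  { _id: 3, status: "A", amount: 100 },
  { _id: 4, status: "A", amount: 150 },
  { _id: 5, status: "B", amount: 200 },
];

// Emulates grouping by "$status" with three accumulators:
// $sum of "$amount", $max of "$amount", and $sum of 1 (a count).
const groups = new Map();
for (const order of orders) {
  const g = groups.get(order.status) ||
    { _id: order.status, total: 0, maximum: -Infinity, count: 0 };
  g.total += order.amount;                       // $sum: "$amount"
  g.maximum = Math.max(g.maximum, order.amount); // $max: "$amount"
  g.count += 1;                                  // $sum: 1
  groups.set(order.status, g);
}

const result = [...groups.values()];
console.log(result);
```

Each accumulator keeps one running value per group key, which is why $sum: 1 yields a count: it adds 1 for every document that falls into the group.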
Aggregation Projection
● Selecting fields to view in the result
● Similar to find functionality.
Assuming the following documents:
db.students.insertMany([
{ "_id": 1, "name": "Alice", "age": 20, "gender": "female", "grade": "A" },
{ "_id": 2, "name": "Bob", "age": 22, "gender": "male", "grade": "A" },
{ "_id": 3, "name": "Charlie", "age": 21, "gender": "male", "grade": "B" }
])
Input code:
db.students.aggregate([
  { $project: { name: 1, age: 1, _id: 0 } }
])
Output:
[
  { name: 'Alice', age: 20 },
  { name: 'Bob', age: 22 },
  { name: 'Charlie', age: 21 }
]
Aggregation Pipeline: By Example
Assume we have some sample data as:
db.orders.insertMany([
{ "_id": 1, "product": "Phone", "type": "iOS", "quantity": 2, "price": 500 },
{ "_id": 2, "product": "Phone", "type": "iOS", "quantity": 3, "price": 600 },
{ "_id": 3, "product": "Laptop", "type": "Android", "quantity": 1, "price": 1200 },
{ "_id": 4, "product": "Laptop", "type": "Android", "quantity": 2, "price": 1000 },
{ "_id": 5, "product": "Tablet", "type": "iOS", "quantity": 3, "price": 300 }
]);
Exercise:
Write an aggregation pipeline that (1) matches the orders with a quantity
greater than 1, (2) groups them by type (iOS/Android) and computes the
totalRevenue of each group, and (3) projects the fields “type” and “totalRevenue”
Aggregation Pipeline: By Example
Match (filter) orders with more than 1 quantity
Step 1:
db.orders.aggregate([
{
$match: { quantity: { $gt: 1 } }
},
// To be continued
Aggregation Pipeline: By Example
Group by type with totalRevenue
Step 2:
{
$group: {
_id: "$type", // Group by type (iOS/Android)
totalRevenue: { $sum: { $multiply: ["$quantity", "$price"] } }
}
},
Aggregation Pipeline: By Example
Project the relevant fields
Step 3:
{
$project: {
type: "$_id", // Rename _id to type
totalRevenue: 1, // Include the totalRevenue field
_id: 0 // Exclude _id
}
}
])
Aggregation Pipeline: By Example
{ "_id": 1, "product": "Phone", "type": "iOS", "quantity": 2, "price": 500 },
{ "_id": 2, "product": "Phone", "type": "iOS", "quantity": 3, "price": 600 },
{ "_id": 3, "product": "Laptop", "type": "Android", "quantity": 1, "price": 1200 },
{ "_id": 4, "product": "Laptop", "type": "Android", "quantity": 2, "price": 1000 },
{ "_id": 5, "product": "Tablet", "type": "iOS", "quantity": 3, "price": 300 }
Input code:
db.orders.aggregate([
  { $match: { quantity: { $gt: 1 } } },
  { $group: { _id: "$type",
              totalRevenue: { $sum: { $multiply: ["$quantity", "$price"] } } } },
  { $project: { type: "$_id", totalRevenue: 1, _id: 0 } }
]);
Output:
[
  { totalRevenue: 3700, type: 'iOS' },
  { totalRevenue: 2000, type: 'Android' }
]
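To check the arithmetic behind the output, the three stages can be traced by hand in plain JavaScript. This is an in-memory sketch using the same sample orders, with one array operation standing in for each pipeline stage; it does not use the MongoDB driver.

```javascript
const orders = [
  { _id: 1, product: "Phone", type: "iOS", quantity: 2, price: 500 },
  { _id: 2, product: "Phone", type: "iOS", quantity: 3, price: 600 },
  { _id: 3, product: "Laptop", type: "Android", quantity: 1, price: 1200 },
  { _id: 4, product: "Laptop", type: "Android", quantity: 2, price: 1000 },
  { _id: 5, product: "Tablet", type: "iOS", quantity: 3, price: 300 },
];

// Stage 1 ($match): keep orders with quantity > 1 (drops order 3).
const matched = orders.filter((o) => o.quantity > 1);

// Stage 2 ($group): sum quantity * price per type.
const revenueByType = new Map();
for (const o of matched) {
  const soFar = revenueByType.get(o.type) || 0;
  revenueByType.set(o.type, soFar + o.quantity * o.price);
}

// Stage 3 ($project): expose the group key as "type" and drop _id.
const result = [...revenueByType].map(([type, totalRevenue]) => ({ totalRevenue, type }));

console.log(result);
```

For iOS the sum is 2 × 500 + 3 × 600 + 3 × 300 = 3700; for Android only order 4 survives the match stage, giving 2 × 1000 = 2000.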
Formatting
• You can use .pretty() at the end of your query for
better formatted results.
Example:
db.orders.find({}).pretty()
This query will retrieve all documents from the orders collection
and display them in a more structured output.
JOINS
• Use joins only when absolutely necessary
• Possible in the aggregation framework using the $lookup operator
• BUT! Keep in mind that you should use fully self-contained
documents as much as possible
• Best practices to reduce $lookup:
https://www.mongodb.com/docs/atlas/schema-suggestions/reduce-lookup-operations/
Best Practices in Document DBs
Some tips for improving schema:
• Avoid Unbounded Arrays
• Remove unnecessary Indexes
• Reduce the size of large documents
• Reduce Collections
https://www.mongodb.com/docs/atlas/performance-advisor/schema-suggestions/
Summary
● Key-value structure is humanly intuitive and there are currently
many good implementations for it.
● JSON is one of the more popular formats for key-value stores
● Document database, BSON, _id field
● MongoDB: Document based data model, example queries
● MongoDB: more feature packed; for a larger class of applications
○ Simple querying with find operator
○ More complex operations with the aggregation pipeline
Reading
● Wiese, Chapter 6
● Mongodb manual https://docs.mongodb.com/manual
Homework:
● Configure MongoDB on your machine
○ Either the cloud version (follow the tutorial on Brightspace)
○ Or locally
(https://docs.mongodb.com/manual/installation/#mongodb-communityedition-installation-tutorials)
QUESTIONS?