Alvin Richards
[email protected]
Topics
Overview
Data modeling
Replication & Sharding
Developing with Java
Deployment
Drinking from the fire hose
Part One
MongoDB Overview
Strong adoption of MongoDB
• 90,000 database downloads per month
• Over 1,000 production deployments
Web 2.0 companies started out using this, but now also:
- enterprises
- the financial industry
3 Reasons
- Performance
- Large number of readers / writers
- Large data volume
- Agility (ease of development)
NoSQL really means: non-relational, next-generation operational datastores and databases
past: one size fits all
• RDBMS (Oracle, MySQL)

present: business intelligence and analytics is now its own segment
• RDBMS (Oracle, MySQL)
• New Gen. OLAP (Vertica, Aster, Greenplum)

future
• RDBMS (Oracle, MySQL)
• New Gen. OLAP (Vertica, Aster, Greenplum)
• Non-relational Operational Stores ("NoSQL")

We claim the NoSQL segment will be:
* large
* not fragmented
* 'platformitize-able'
Philosophy: maximize features, up to the "knee" in the curve, then stop

[Chart: depth of functionality vs. scalability & performance; memcached and key/value stores sit at the high-scalability, low-functionality end, RDBMS at the high-functionality end]

no joins + no complex transactions
Horizontally Scalable Architectures
• no joins + no complex transactions
• New data models
• Improved ways to develop
Platform and Language support
MongoDB is implemented in C++ for best performance
Platforms 32/64 bit
• Windows
• Linux, Mac OS X, FreeBSD, Solaris
Language drivers for
• Java
• Ruby / Ruby-on-Rails
• C#
• C / C++
• Erlang
• Python, Perl, JavaScript
• Scala
• others...
Ease of development is a surprisingly big benefit: faster to code, faster to change, and you avoid upgrades and scheduled downtime
More predictable performance
Fast single-server performance means developers spend less time manually coding around the database
Bottom line: developers usually like it much better after trying it
Part Two
Data Modeling in MongoDB
So why model data?
A brief history of normalization
• 1970 E. F. Codd introduces 1st Normal Form (1NF)
• 1971 E. F. Codd introduces 2nd and 3rd Normal Form (2NF, 3NF)
• 1974 Codd & Boyce define Boyce/Codd Normal Form (BCNF)
• 2002 Date, Darwen, Lorentzos define 6th Normal Form (6NF)
Goals:
• Avoid anomalies when inserting, updating or deleting
• Minimize redesign when extending the schema
• Make the model informative to users
• Avoid bias towards a particular style of query
* source : wikipedia
The real benefit of relational
• Before relational
• Data and Logic combined
• After relational
• Separation of concerns
• Data modeled independent of logic
• Logic freed from concerns of data design
• MongoDB continues this separation
Relational made normalized data look like this: [diagram of many joined tables]
Document databases make normalized data look like this: [diagram of a single nested document]
Terminology
RDBMS          MongoDB
Table          Collection
Row(s)         JSON Document
Index          Index
Join           Embedding & Linking
Partition      Shard
Partition Key  Shard Key
DB Considerations

How can we manipulate this data?
• Dynamic Queries
• Secondary Indexes
• Atomic Updates
• Map Reduce

Access Patterns?
• Read / Write Ratio
• Types of updates
• Types of queries
• Data life-cycle

Considerations
• No Joins
• Document writes are atomic
So today’s example will use...
Design Session
Design documents that simply map to your application
post = {author: "Hergé",
        date: new Date(),
        text: "Destination Moon",
        tags: ["comic", "adventure"]}

>db.posts.save(post)
Find the document
>db.posts.find()
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Hergé",
date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
text : "Destination Moon",
tags : [ "comic", "adventure" ] }
Notes:
• ID must be unique, but can be anything you’d like
• MongoDB will generate a default ID if one is not
supplied
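For example (the key value here is ours, for illustration):

// supply a natural key instead of the generated ObjectId
>db.posts.save({_id: "destination-moon", author: "Hergé"})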
Add an index, find via the index
Secondary index for “author”
// 1 means ascending, -1 means descending
>db.posts.ensureIndex({author: 1})
>db.posts.find({author: 'Hergé'})
{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
author : "Hergé",
... }
Verifying indexes exist
>db.system.indexes.find()
// Index on ID
{ name : "_id_",
ns : "test.posts",
key : { "_id" : 1 } }
// Index on author
{ _id : ObjectId("4c4ba6c5672c685e5e8aabf4"),
ns : "test.posts",
key : { "author" : 1 },
name : "author_1" }
Query operators

Conditional operators:
$lt, $lte, $gt, $gte, $ne, $in, $nin, $mod, $all, $size, $exists, $type, ...

// find posts with any tags
>db.posts.find({tags: {$exists: true}})

Regular expressions:
// posts where author starts with h
>db.posts.find({author: /^h/i})

Counting:
// posts written by Hergé
>db.posts.find({author: "Hergé"}).count()
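Two more of the listed operators in use (the tag values are illustrative):

// posts tagged either "comic" or "politics"
>db.posts.find({tags: {$in: ["comic", "politics"]}})

// posts with exactly two tags
>db.posts.find({tags: {$size: 2}})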
Extending the Schema

new_comment = {author: "Kyle",
               date: new Date(),
               text: "great book"}

>db.posts.update({_id: "..."},
                 {$push: {comments: new_comment},
                  $inc: {comments_count: 1}})
Extending the Schema

{ _id : ObjectId("4c4ba5c0672c685e5e8aabf3"),
  author : "Hergé",
  date : "Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)",
  text : "Destination Moon",
  tags : [ "comic", "adventure" ],
  comments_count: 1,
  comments : [
    { author : "Kyle",
      date : "Sat Jul 24 2010 20:51:03 GMT-0700 (PDT)",
      text : "great book"
    }
  ]}
Extending the Schema

// create index on nested documents:
>db.posts.ensureIndex({"comments.author": 1})
>db.posts.find({"comments.author": "Kyle"})

// find last 5 posts:
>db.posts.find().sort({date: -1}).limit(5)

// most commented post:
>db.posts.find().sort({comments_count: -1}).limit(1)
When sorting, check if you need an index
Explain a query plan

>db.blogs.find({author: 'Hergé'}).explain()
{
  "cursor" : "BtreeCursor author_1",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 5,
  "indexBounds" : {
    "author" : [ [ "Hergé", "Hergé" ] ]
  }
}
Watch for full table scans

>db.blogs.find({text: 'Destination Moon'}).explain()
{
  "cursor" : "BasicCursor",
  "nscanned" : 1,
  "nscannedObjects" : 1,
  "n" : 1,
  "millis" : 0,
  "indexBounds" : { }
}
Map Reduce
Map reduce : count tags
mapFunc = function () {
this.tags.forEach(function (z) {emit(z, {count:1});});
}
reduceFunc = function (k, v) {
var total = 0;
for (var i = 0; i < v.length; i++) { total += v[i].count; }
return {count:total};
}
res = db.posts.mapReduce(mapFunc, reduceFunc)
>db[res.result].find()
{ _id : "comic", value : { count : 1 } }
{ _id : "adventure", value : { count : 1 } }
Group
• Equivalent to a Group By in SQL
• Specify the attributes to group the data by
• Process the results in a Reduce function
Group
cmd = { key: { "author": true },
        initial: { count: 0 },
        reduce: function(obj, prev) { prev.count++; }
      };

result = db.posts.group(cmd);

[
  { "author" : "Hergé", "count" : 1 },
  { "author" : "Kyle", "count" : 3 }
]
Review
So Far:
- Started out with a simple schema
- Queried Data
- Evolved the schema
- Queried / Updated the data some more
Single Table Inheritance
>db.shapes.find()
{ _id: ObjectId("..."), type: "circle", area: 3.14, radius: 1}
{ _id: ObjectId("..."), type: "square", area: 4, d: 2}
{ _id: ObjectId("..."), type: "rect", area: 10, length: 5, width: 2}
// find shapes where radius > 0
>db.shapes.find({radius: {$gt: 0}})
// create index
>db.shapes.ensureIndex({radius: 1})
One to Many

- Embedded Array / Array Keys
  - $slice operator to return a subset of an array
  - some queries are hard, e.g. find the latest comments across all documents
- Embedded tree
  - Single document
  - Natural
  - Hard to query
- Normalized (2 collections)
  - most flexible
  - more queries (see the sketch below)
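A minimal sketch of the normalized option; the post_id link field is our choice for illustration:

>db.posts.save({_id: 1, author: "Hergé", text: "Destination Moon"})
>db.comments.save({post_id: 1, author: "Kyle", text: "great book", date: new Date()})

// the hard query above becomes easy: latest comments across all posts
>db.comments.find().sort({date: -1}).limit(5)
// but fetching a post and its comments is now two queries
>db.comments.find({post_id: 1})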
Many - Many
Example:
- Product can be in many categories
- Category can have many products
Relational join-table model:
Products:            product_id
Product_Categories:  product_id, category_id
Category:            category_id
Many - Many

products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
  name: "Destination Moon",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                  ObjectId("4c4ca25433fb5941681b92af")]}

categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
  name: "Adventure",
  product_ids: [ ObjectId("4c4ca23933fb5941681b912e"),
                 ObjectId("4c4ca30433fb5941681b9130"),
                 ObjectId("4c4ca30433fb5941681b913a")]}

// All categories for a given product
>db.categories.find({product_ids: ObjectId("4c4ca23933fb5941681b912e")})
Alternative

products:
{ _id: ObjectId("4c4ca23933fb5941681b912e"),
  name: "Destination Moon",
  category_ids: [ ObjectId("4c4ca25433fb5941681b912f"),
                  ObjectId("4c4ca25433fb5941681b92af")]}

categories:
{ _id: ObjectId("4c4ca25433fb5941681b912f"),
  name: "Adventure"}

// All products for a given category
>db.products.find({category_ids: ObjectId("4c4ca25433fb5941681b912f")})

// All categories for a given product
product = db.products.findOne({_id: some_id})
>db.categories.find({_id: {$in: product.category_ids}})
Trees

Full Tree in Document

{ comments: [
    { author: "Kyle", text: "...",
      replies: [
        { author: "Fred", text: "...", replies: [] }
      ]}
  ]
}
Pros: Single Document, Performance, Intuitive
Cons: Hard to search, Partial Results, 4MB limit
Trees
Parent Links
- Each node is stored as a document
- Contains the id of the parent
Child Links
- Each node contains the ids of its children
- Can support graphs (multiple parents per child); both patterns are sketched below
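Minimal sketches of both patterns (collection and field names are ours):

// Parent links: each node stores the id of its parent
>db.tree.save({_id: "b", parent: "a"})
>db.tree.find({parent: "a"})     // children of "a"

// Child links: each node stores the ids of its children
>db.tree.save({_id: "a", children: ["b", "e"]})
>db.tree.find({children: "b"})   // parents of "b"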
Array of Ancestors
- Store Ancestors of a node
{ _id: "a" }
{ _id: "b", ancestors: [ "a" ], parent: "a" }
{ _id: "c", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "d", ancestors: [ "a", "b" ], parent: "b" }
{ _id: "e", ancestors: [ "a" ], parent: "a" }
{ _id: "f", ancestors: [ "a", "e" ], parent: "e" }
{ _id: "g", ancestors: [ "a", "b", "d" ], parent: "d" }
//find all descendants of b:
>db.tree2.find({ancestors: 'b'})

//find all ancestors of f:
>ancestors = db.tree2.findOne({_id: 'f'}).ancestors
>db.tree2.find({_id: {$in: ancestors}})
findAndModify
Queue example
//Example: find the highest-priority job and mark it in progress
job = db.jobs.findAndModify({
    query: {inprogress: false},
    sort: {priority: -1},
    update: {$set: {inprogress: true,
                    started: new Date()}},
    new: true})
Part Three
Replication & Sharding
Scaling
• Data size only goes up
• Operations/sec only go up
• Vertical scaling is limited
• Hard to scale vertically in the cloud
• Can scale wider than higher
What is scaling? Well, hopefully a question everyone here will face.
Traditional Horizontal Scaling
• read only slaves
• caching
• custom partitioning code
Scaling isn’t new; neither is sharding. Manual re-balancing is painful at best.
New methods of Scaling
• relational database clustering
• consistent hashing (Dynamo)
• range based partitioning (BigTable/PNUTS)
Read Scalability: Replication

[Diagram: ReplicaSet 1 with one Primary and two Secondaries; writes go to the Primary, reads can go to the Primary or either Secondary]
Basics
• MongoDB replication is a bit like MySQL replication: asynchronous master/slave at its core
• Variations:
Master / slave
Replica Pairs (deprecated – use replica sets)
Replica Sets
Replica Sets
• A cluster of N servers
• Any (one) node can be primary
• Consensus election of primary
• Automatic failover
• Automatic recovery
• All writes to primary
• Reads can be to primary (default) or a secondary
Replica Sets – Design Concepts
1. A write is durable once it is available on a majority of members
2. Writes may be visible before a cluster-wide commit has completed
3. On failover, any data that had not been replicated from the primary is dropped (see #1)
Replica Set lifecycle:
• Establishing: Members 1, 2 and 3 come online
• Electing primary: Member 2 is elected PRIMARY
• Failure of master: Member 2 goes DOWN; Members 1 and 3 negotiate a new master
• Reconfiguring: Member 3 is now PRIMARY, Member 2 still DOWN
• Member recovers: Member 2 rejoins in RECOVERING state
• Active: all members up; Member 3 remains PRIMARY
Set Member Types
Normal (priority == 1)
Passive (priority == 0)
Arbiter (no data, but can vote)
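A sketch of a replica set config covering all three member types (the host names are hypothetical):

>rs.initiate({_id: "rs1", members: [
    {_id: 0, host: "host1:27017"},                     // normal (priority 1)
    {_id: 1, host: "host2:27017", priority: 0},        // passive
    {_id: 2, host: "host3:27017", arbiterOnly: true}   // arbiter: votes, holds no data
]})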
Write Scalability: Sharding

[Diagram: key ranges 0..30, 31..60 and 61..100 map to ReplicaSet 1, ReplicaSet 2 and ReplicaSet 3; each replica set has a Primary and two Secondaries; reads and writes are routed by key range]
Sharding
• Scale horizontally for data size, index size, write and consistent read scaling
• Distribute databases, collections or the objects in a collection
• Auto-balancing, migrations and management happen with no downtime
• Replica Sets for inconsistent read scaling
Sharding
• Choose how you partition data
• Can convert from single master to sharded system
with no downtime
• Same features as a non-sharded single master
• Fully consistent
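From the shell, choosing the partitioning looks roughly like this (run against a mongos; the database, collection and shard key are our examples):

>use admin
>db.runCommand({enablesharding: "blogdb"})
>db.runCommand({shardcollection: "blogdb.posts", key: {author: 1}})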
Range Based
• a collection is broken into chunks by range
• chunks default to 200MB or 100,000 objects
Architecture

[Diagram: clients connect to one or more mongos routers; each mongos routes operations to the shards (mongod processes) and reads chunk metadata from the config servers (also mongod processes)]
Config Servers
• Hold the metadata about where chunks are located
• 1 or 3 of them (3 for availability)
• changes are made with 2-phase commit
• if a majority are down, metadata goes read-only
• the system stays online as long as 1 of the 3 is up
Shards
• Hold the actual data
• Can be a single master, master/slave or a replica set
• Replica sets give sharding + full auto-failover
• Regular mongod processes
mongos
• Sharding Router (or Switch)
• Acts just like a mongod to clients
• Can have 1 or as many as you want
• Can run on the app server, so no extra network traffic
Writes
• Inserts : require shard key, routed
• Removes: routed and/or scattered
• Updates: routed or scattered
Queries
• By shard key: routed
• Sorted by shard key: routed in order
• By non shard key: scatter gather
• Sorted by non shard key: distributed merge sort
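For example, if the shard key were {author: 1} (our assumption for illustration):

>db.posts.find({author: "Hergé"})    // routed to one shard
>db.posts.find({text: /Moon/})       // scatter gather across all shards
>db.posts.find().sort({author: 1})   // routed in order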
Operations
• split: breaking a chunk into 2
• migrate: move a chunk from 1 shard to another
• balancing: moving chunks automatically to keep the system in balance
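Splits and migrations can also be invoked by hand; a sketch using the admin commands (the namespace, split point and target shard name are illustrative):

// run from the admin database, against a mongos
>db.runCommand({split: "blogdb.posts", middle: {author: "M"}})
>db.runCommand({moveChunk: "blogdb.posts", find: {author: "M"}, to: "shard0001"})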
Part Four
Java Development
Library Choices
• Raw MongoDB Driver
Map<String, Object> view of objects
Rough but dynamic
• Morphia (type-safe mapper)
POJOs
Annotation based (similar to JPA)
Syntactic sugar and helpers
• Others
Code generators, other JVM languages
MongoDB Java Driver
• BSON Package
Types
Encode/Decode
DBObject (Map<String, Object>)
Nested Maps
Directly encoded to binary format (BSON)
• MongoDB Package
Mongo
DBObject (BasicDBObject/Builder)
DB/DBCollection
DBQuery/DBCursor
BSON Package
Types
int and long
Array/ArrayList
String
byte[] – binData
Double (IEEE 754 FP)
Date (msecs since epoch)
Null
Boolean
JavaScript String
Regex
MongoDB Package
• Mongo
Connection, ThreadSafe
WriteConcern*
• DB
Auth, Collections
getLastError()
Command(), eval()
RequestStart/Done
• DBCollection
Insert/Save/Find/Remove/Update/
FindAndModify
ensureIndex
Simple Example

DBCollection coll = new Mongo().getDB("blogdb")
                               .getCollection("posts");

ArrayList<String> tags = new ArrayList<String>();
tags.add("comic");
tags.add("adventure");

coll.save(BasicDBObjectBuilder.start("author", "Hergé")
    .append("text", "Destination Moon")
    .append("date", new Date())
    .append("tags", tags)
    .get());
Simple Example, Again

DBCollection coll = new Mongo().getDB("blogdb")
                               .getCollection("posts");

ArrayList<String> tags = new ArrayList<String>();
tags.add("comic");
tags.add("adventure");

Map<String, Object> fields = new HashMap<String, Object>();
fields.put("author", "Hergé");
fields.put("text", "Destination Moon");
fields.put("date", new Date());
fields.put("tags", tags);

coll.insert(new BasicDBObject(fields));
DBObject <-> (B/J)SON

{author: "kyle", text: "Destination Moon", date: …}
DBObject dbObj = BasicDBObjectBuilder.start()
    .append("author", "Hergé")
    .append("text", "Destination Moon")
    .append("date", new Date())
    .get();

String text = (String) dbObj.get("text");
JSON.parse(…)

DBObject dbObj = (DBObject) JSON.parse(
    "{'author': 'Hergé', " +
    " 'text': 'Destination Moon', " +
    " 'date': 'Sat Jul 24 2010 19:47:11 GMT-0700 (PDT)'}");
Lists

// continuing with the dbObj parsed above:
List<String> tags = new ArrayList<String>();
tags.add("comic");
tags.add("adventure");
dbObj.put("tags", tags);

// {…, tags: ['comic', 'adventure']}
Maps of Maps
Can represent object graph/tree
Always keyed off String (field)
Morphia: MongoDB Mapper
Maps POJOs
Type-safe
Access Patterns: DAO/Datastore/???
Data Types
JPA like
Many concepts came from Objectify (GAE)
Annotations
@Entity(“collectionName”)
@Id
@Transient (not transient)
@Indexed(…)
@Property(“fieldAlias”)
@AlsoLoad({aliases})
@Reference
@Serialized
[@Embedded]
Lifecycle Events
@PrePersist
@PreSave
@PostPersist
@PreLoad
@PostLoad
EntityListeners
EntityInterceptor
Basic POJO
@Entity
class Person {
@Id
String author;
@Indexed
Date date;
String text;
}
Datastore Basics
get(class, id)
find(class, […])
save(entity, […])
delete(query)
getCount(query)
update/First(query, upOps)
findAndModify/Delete(query, upOps)
Add, Get, Delete

Blog entry = new Blog("Hergé", new Date(), "Destination Moon");

Datastore ds = new Morphia().createDatastore();
ds.save(entry);

Blog foundEntry = ds.get(Blog.class, "Hergé");

ds.delete(entry);
Queries

Datastore ds = …
Query<Blog> q = ds.createQuery(Blog.class);
q.field("author").equal("Hergé").limit(5);

for (Blog e : q.fetch())
    print(e);

Blog entry = q.field("author").startsWith("H").get();
Update

Datastore ds = …
Query<Blog> q = ds.find(Blog.class, "author", "Hergé");

UpdateOperations<Blog> uo = ds.createUpdateOperations(Blog.class);
uo.inc("views", 1).set("lastUpdated", new Date());

UpdateResults res = ds.update(q, uo);
if (res.getUpdatedCount() > 0)
    ; // do something?
Update Operations

set(field, val)
unset(field)
inc(field, [val])
dec(field)
add(field, val)
addAll(field, vals)
removeFirst/Last(field)
removeAll(field, vals)
Relationships

[@Embedded]
- Loaded/Saved with Entity
- Update

@Reference
- Stored as DBRef(s)
- Loaded with Entity
- Not automatically saved

Key<T> (DBRef)
- Stored as DBRef(s)
- Just a link, but resolvable by Datastore/Query
MongoDB features in Java
• Durability
• Replication
• Sharding
• Connection options
Durability
What failures do you need to recover from?
• Loss of a single database node?
• Loss of a group of nodes?
Durability - Master only
• Write acknowledged when in memory on the master only
Durability - Master + Slaves
• Write acknowledged when in memory on master + slave
• Will survive failure of a single node
Durability - Master + Slaves + fsync
• Write acknowledged when in memory on master + slaves
• Pick a "majority" of nodes
• fsync in batches (since it is blocking)
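In shell terms, these levels map onto getLastError options (a sketch; the w, wtimeout and fsync values are illustrative):

>db.runCommand({getlasterror: 1})                         // master only
>db.runCommand({getlasterror: 1, w: 2, wtimeout: 1000})   // master + slave
>db.runCommand({getlasterror: 1, fsync: true})            // also flush to disk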
Setting default error checking

// Do not check or report errors on write
com.mongodb.WriteConcern.NONE;

// Use the default level of error checking. Do not send
// a getLastError(), but raise an exception on error
com.mongodb.WriteConcern.NORMAL;

// Send getLastError() after each write. Raise an
// exception on error
com.mongodb.WriteConcern.STRICT;

// Set the concern
db.setWriteConcern(concern);
Customized WriteConcern

// Wait for three servers to acknowledge the write
WriteConcern concern = new WriteConcern(3);

// Wait for three servers, with a 1000ms timeout
WriteConcern concern = new WriteConcern(3, 1000);

// Wait for three servers, 1000ms timeout, and fsync
// data to disk
WriteConcern concern = new WriteConcern(3, 1000, true);

// Set the concern
db.setWriteConcern(concern);
Using Replication from Java

slaveOk()
- tells the driver it may send read requests to Secondaries
- the driver will always send writes to the Primary

Can be set on:
- DB.slaveOk()
- Collection.slaveOk()
- find(q).addOption(Bytes.QUERYOPTION_SLAVEOK);
Using sharding from Java

Before sharding:

coll.save(BasicDBObjectBuilder.start("author", "Hergé")
    .append("text", "Destination Moon")
    .append("date", new Date())
    .get());

Query<Blog> q = ds.find(Blog.class, "author", "Hergé");

After sharding:

No code change required!
Connection options

MongoOptions mo = new MongoOptions();

// Restrict the number of connections
mo.connectionsPerHost = MAX_THREADS + 5;

// Auto reconnect on connection failure
mo.autoConnectRetry = true;
Part Five
Deploying MongoDB
• Performance tuning
• Sizing
• O/S Tuning / File System layout
• Backup
Backup
• Typically backups are driven from a slave
• This eliminates the impact on client / application traffic to the master
Backup
•Two strategies
• mongodump / mongorestore
• fsync + lock
mongodump
• binary, compact object dump
• each object written is consistent
• but the dump is not necessarily consistent from start to finish
fsync + lock
• fsync - flushes buffers to disk
• lock - blocks writes
db.runCommand({fsync:1,lock:1})
• Use file-system / LVM / storage snapshot
• unlock
db.$cmd.sys.unlock.findOne();
Slave delay
• Protection against app faults
• Protection against administration mistakes
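A sketch of adding a delayed member (the host name and one-hour delay are illustrative; a delayed member should also have priority 0):

>rs.add({_id: 3, host: "host4:27017", priority: 0, slaveDelay: 3600})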
O/S Config
• RAM - lots of it
• Filesystem
• EXT4 / XFS
• Better file allocation & performance
• I/O
• The more disks the better
• Consider RAID10 or other RAID configs
Monitoring
• Munin, Cacti, Nagios
Primary function:
• Measure stats over time
• Tells you what is going on with
your system
• Alerts when threshold reached
Remember me?
Summary
MongoDB makes building Java web applications simple
You can focus on what the app needs to do
MongoDB has built-in:
• Horizontal scaling (reads and writes)
• Simplified schema evolution
• Simplified deployment and operations
• Best match for development tools and agile processes