Graph Databases (GDB)
Adrian Silvescu
Doina Caragea
Anna Atramentov
Problems And Motivations
• The need to represent, store and manipulate
complex data makes RDBMS somewhat obsolete
• [P1] Problem 1: Violations of the 1NF
– Multi-valued attributes
– Complex attributes
– Complex combination of the previous two
• [P2] Problem 2 : Accommodate Changes
– Appears when acquiring data from autonomous,
dynamic sources or the Web (e.g. gene expression and restaurant data).
– RDBMS may require schema renormalization
Problems and Motivations (contd)
• [P3] Problem 3: Unified representation for:
– Data
– Knowledge (Schemas are a subset of this)
– Queries (More generally: Goals) [results+def]
– Models (Concepts are a particular example)
• In order to facilitate the application of
learning and reasoning methods on these
structures
The Big Picture
[Figure: the USER interacts with the system through Discovery & Modelling and through Query (= Goal Specification). Information Exploitation sits on top of Information Integration, which draws on the available information in the world: knowledge sources, link sources, data sources, ...]
Existing Approaches
• RDBMS – may need schema renormalisation
• Approaches that try to fix the above mentioned
problems:
– OO Databases [P1], [P2] - graphs [but procedural]
– XML Databases [P1] (somewhat [P3]) – trees
– OORDBMS [P1] – graphs with foreign keys
• Others
– Datalog – More Efficient Crippled Prolog
– Network Models - graphs
– Hierarchical Models – trees
• Therefore the motivation for Graph Databases
Outline
• Graph Databases
– Examples
– DDL
– DML – Queries
– DML – UPDATES
– Informal Semantics
– DB => GDB
• GDB vs. OO, XML, OR, …
• Conclusions and Further Work
Graph Databases
• We propose a new kind of Database: Graph
Databases (GDB) as a solution to Problems [P1],
[P2] and [P3].
• In order to define the GDB we will specify:
– The Data Definition Language (DDL)
– The Query Language (more generally DML)
– Informal Semantics of the above languages
• We will also show how to convert existing DBs (RDBMS)
into the GDB DDL to facilitate the transition to GDBs
Goals and Design Choice
• Goals
– Declarativity
– Change
• Design Choice: use unique instance
identifiers rather than foreign keys
– Close in spirit to OO
– Will allow us to cope more easily with Change
– Declarativity is an issue in OO, but not in GDB, as we
will show
Database Representation
• Sailors(sid:integer, sname:char(10), rating: integer, age:real)
• Boats(bid:integer, bname:char(10), color:char(10))
• Reserves(sid:integer, bid:integer, day:date)
Sailors
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

Reserves
sid  bid  day
22   101  10/10/96
58   103  11/12/96

Boats
bid  bname      color
101  Interlake  red
102  Clipper    green
103  Marine     red
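Graph Representation
[Figure: every tuple becomes a node with a unique identifier (ID1 ... ID8). Attribute edges point to values, e.g. ID1 has sid 22, name dustin, rating 7, age 45.0, and ID2 has sid 31, name lubber, rating 8, age 55.5. IOF edges link each node to its table node (Sailors, Reserves, Boats).]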
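Foreign Keys
[Figure: the same data linked by foreign-key values rather than by node identifiers.
ID1 (Sailor): sid 22, name dustin, rating 7, age 45.0
ID4: sid 22, day 10/10/96, bid 101
ID6 (Boat): bid 101, bname Interlake, color red]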
Data Representation in the GDB DDL
[Figure: a node ID with outgoing edges labelled Name1, Name2, ..., NameN pointing to the values Val1, Val2, ..., ValN.]
• ID:(Name1=Val1,…,NameN=ValN)
Examples:
ID1:(sid=22, name=“Dustin”, rating=7, age=45.0)
ID4:(sailor=ID1, day=“10/10/96”, boat=ID6)
ID6:(bid=101, bname=“Interlake”, color=“red”)
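As a rough illustration, the ID:(Name1=Val1, ..., NameN=ValN) notation can be mimicked in Python with a dictionary that maps each identifier to its edge/value pairs. This is only a sketch under assumptions: the `gdb` dictionary and the `follow` helper are hypothetical names, not part of the GDB DDL.

```python
# Hypothetical in-memory encoding of the three example facts above.
# Each key is a unique instance identifier; each value maps edge names to values.
gdb = {
    "ID1": {"sid": 22, "name": "Dustin", "rating": 7, "age": 45.0},
    "ID4": {"sailor": "ID1", "day": "10/10/96", "boat": "ID6"},
    "ID6": {"bid": 101, "bname": "Interlake", "color": "red"},
}


def follow(db, node_id, edge):
    """Follow one named edge. If the value is the ID of another node,
    return that node; otherwise return the plain value."""
    value = db[node_id][edge]
    if isinstance(value, str) and value in db:
        return db[value]
    return value


# The reservation ID4 links directly to the boat node, with no key lookup.
reserved_boat = follow(gdb, "ID4", "boat")
print(reserved_boat["bname"])
```

Because ID4 stores the identifiers ID1 and ID6 rather than the key values 22 and 101, following a link is a direct lookup instead of a join on sid or bid.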
Defining New Concepts in GDB DDL– Grandson
[Figure: _ID1 --GrSon--> _ID2 :- _ID1 --Son--> _ID3 --Son--> _ID2, where _ID1, _ID2 and _ID3 are each IOF Person.]
_ID1:(GrSon=_ID2) :-
_ID1:(IOF=“Person”, Son=_ID3),
_ID3:(IOF=“Person”, Son=_ID2),
_ID2:(IOF=“Person”).
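One way to read the GrSon rule is as a pattern that chains two Son edges between Person nodes. The sketch below evaluates that pattern over a small dictionary; the function name `grandsons`, the sample `people` data, and the choice to store Son as a list (so that it can be multi-valued) are all assumptions made for illustration.

```python
def grandsons(db):
    """Return (grandparent_id, grandson_id) pairs by chaining two Son edges
    between nodes whose IOF edge is "Person"."""
    pairs = []
    for id1, n1 in db.items():
        if n1.get("IOF") != "Person":
            continue
        for id3 in n1.get("Son", []):
            n3 = db.get(id3, {})
            if n3.get("IOF") != "Person":
                continue
            for id2 in n3.get("Son", []):
                if db.get(id2, {}).get("IOF") == "Person":
                    pairs.append((id1, id2))
    return pairs


# Hypothetical sample data: ID1 is the father of ID3, who is the father of ID2.
people = {
    "ID1": {"IOF": "Person", "Son": ["ID3"]},
    "ID3": {"IOF": "Person", "Son": ["ID2"]},
    "ID2": {"IOF": "Person", "Son": []},
}

print(grandsons(people))
```

Each returned pair corresponds to one derived edge _ID1:(GrSon=_ID2) in the rule's head.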
[DML-QL] Writing simple queries:
• The names of all sailors who have reserved a red boat
[Figure: the result graph _ID --Name--> _X is defined by the pattern: _ID (IOF Sailors) with edges Name to _X and Boat to _ID1, where _ID1 (IOF Boats) has Color Red.]
_ID:(Name = _X) :-
_ID:(IOF = Sailors, Boat = _ID1, Name = _X),
_ID1:(IOF = Boats, Color = Red).
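The query above can be read as a two-node pattern match. A minimal sketch, assuming (as the rule itself does) that each sailor node carries a single Boat edge pointing at the reserved boat; the function name and the dictionary encoding are hypothetical, and the sample rows are taken from the Sailors, Reserves and Boats tables shown earlier.

```python
def red_boat_sailor_names(db):
    """Collect Name values of Sailors nodes whose Boat edge leads to a red Boats node."""
    names = set()
    for node in db.values():
        if node.get("IOF") != "Sailors":
            continue
        # db.get returns {} when the sailor has no Boat edge or the ID is unknown.
        boat = db.get(node.get("Boat"), {})
        if boat.get("IOF") == "Boats" and boat.get("Color") == "Red":
            names.add(node.get("Name"))
    return names


db = {
    "ID1": {"IOF": "Sailors", "Name": "dustin", "Boat": "ID6"},
    "ID2": {"IOF": "Sailors", "Name": "lubber"},  # no reservation
    "ID3": {"IOF": "Sailors", "Name": "rusty", "Boat": "ID8"},
    "ID6": {"IOF": "Boats", "Color": "Red"},
    "ID7": {"IOF": "Boats", "Color": "Green"},
    "ID8": {"IOF": "Boats", "Color": "Red"},
}

print(sorted(red_boat_sailor_names(db)))
```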
Informal Semantics
• Three kinds of Definitions
– Facts:
• G1. [Extensional definition]
– Definitions:
• G1 :- G2. [Intensional definition]
• G1 :- PROC f(x1,…,xn). [Procedural definition]
• Queries = Graphs to be Matched = QG
– The same as a definition: Query :- QG.
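The three kinds of definitions can be pictured as three ways of contributing edges to the graph that queries are matched against. The sketch below is an interpretation, not the authors' semantics: the `extend` helper, the `senior` and `age_months` edges, and the single-pass evaluation (no fixpoint) are all assumptions.

```python
import copy


def extend(facts, rules):
    """Apply each rule once to the stored facts and merge the derived edges
    into a copy, leaving the original facts untouched."""
    graph = copy.deepcopy(facts)
    for rule in rules:
        for node_id, edge, value in rule(facts):
            graph.setdefault(node_id, {})[edge] = value
    return graph


# G1.  Extensional definition: a stored fact.
facts = {"ID1": {"IOF": "Sailors", "name": "dustin", "age": 45.0}}


# G1 :- G2.  Intensional definition: an edge derived by matching another graph.
def senior_rule(db):
    return [(i, "senior", True) for i, n in db.items() if n.get("age", 0) > 40]


# G1 :- PROC f(x1,...,xn).  Procedural definition: an edge computed by a function.
def age_months_rule(db):
    return [(i, "age_months", int(n["age"] * 12)) for i, n in db.items() if "age" in n]


extended = extend(facts, [senior_rule, age_months_rule])
print(extended["ID1"])
```

A query would then be matched against `extended` in the same way a definition body is matched, which is the sense in which a query is "the same as a definition".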
Informal Semantics - Picture
[Figure: the Query is matched against the Extended Graph, which includes the Facts.]
RDBMS => GDB
• Sailors(sid:integer, sname:char(10), rating: integer, age:real)
• Boats(bid:integer, bname:char(10), color:char(10))
• Reserves(sid:integer, bid:integer, day:date)
Sailors
sid  sname   rating  age
22   dustin  7       45.0
31   lubber  8       55.5
58   rusty   10      35.0

Reserves
sid  bid  day
22   101  10/10/96
58   103  11/12/96

Boats
bid  bname      color
101  Interlake  red
102  Clipper    green
103  Marine     red
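The conversion can be sketched as: give every row a unique identifier, add an IOF edge naming its table, and replace each foreign-key value with the identifier of the row it refers to. The code below is a hypothetical illustration for this specific schema; the function name, the ID numbering scheme, and the edge names `sailor` and `boat` (borrowed from the ID4 example) are assumptions.

```python
from itertools import count


def relational_to_gdb(sailors, reserves, boats):
    """Turn the three relations into ID-keyed nodes, replacing the sid and bid
    foreign keys in Reserves with links to the Sailors and Boats nodes."""
    ids = (f"ID{n}" for n in count(1))
    # Allocate identifiers table by table: Sailors, then Reserves, then Boats.
    sailor_ids = {row[0]: next(ids) for row in sailors}
    reserve_ids = [next(ids) for _ in reserves]
    boat_ids = {row[0]: next(ids) for row in boats}

    gdb = {}
    for sid, sname, rating, age in sailors:
        gdb[sailor_ids[sid]] = {"IOF": "Sailors", "sid": sid, "name": sname,
                                "rating": rating, "age": age}
    for nid, (sid, bid, day) in zip(reserve_ids, reserves):
        gdb[nid] = {"IOF": "Reserves", "sailor": sailor_ids[sid],
                    "day": day, "boat": boat_ids[bid]}
    for bid, bname, color in boats:
        gdb[boat_ids[bid]] = {"IOF": "Boats", "bid": bid, "bname": bname,
                              "color": color}
    return gdb


sailors = [(22, "dustin", 7, 45.0), (31, "lubber", 8, 55.5), (58, "rusty", 10, 35.0)]
reserves = [(22, 101, "10/10/96"), (58, 103, "11/12/96")]
boats = [(101, "Interlake", "red"), (102, "Clipper", "green"), (103, "Marine", "red")]

gdb = relational_to_gdb(sailors, reserves, boats)
print(gdb["ID4"])
```

With this allocation order the first reservation becomes ID4 and points at ID1 and ID6, in line with the earlier ID4:(sailor=ID1, day=“10/10/96”, boat=ID6) example.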
DML - Updates
• Inline Query
– _X : [ _ID: (IOF = Sailor, sname = lubber, rating = _X)]
• Updates: MODIFY (QryGraph, UpdList)
– Add to all GrandSons the money of the Grandparent as a
potential inheritance.
– MODIFY ( _ID:(GrSon = _ID1),
(=> NEWID:(IOF = POT_INHER, BENEFICIARY = _ID1,
AMOUNT = _AMNT:[ _ID:(Money = _AMNT) ])) )
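The MODIFY example can be read as "for every match of the query graph, create a new node as described by the update list". A minimal sketch of that reading follows; the function name, the NEWID1, NEWID2, ... naming, and the single-valued GrSon edge are assumptions, and the sample amounts are invented for illustration.

```python
def add_potential_inheritance(db):
    """For every node with a GrSon edge, add a POT_INHER node whose BENEFICIARY
    is the grandson and whose AMOUNT is the grandparent's Money."""
    new_nodes = {}
    for node in db.values():
        if "GrSon" in node and "Money" in node:
            new_id = f"NEWID{len(new_nodes) + 1}"
            new_nodes[new_id] = {
                "IOF": "POT_INHER",
                "BENEFICIARY": node["GrSon"],
                "AMOUNT": node["Money"],
            }
    # Insert after the scan so the dictionary is not modified while iterating.
    db.update(new_nodes)
    return db


# Hypothetical data: ID1 is a grandparent of ID2 and has 1000 units of money.
family = {
    "ID1": {"IOF": "Person", "Money": 1000, "GrSon": "ID2"},
    "ID2": {"IOF": "Person", "Money": 5},
}

add_potential_inheritance(family)
print(family["NEWID1"])
```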
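Change
[Figure: a Book node _ID (IOF Book) with Title Databases and two Author edges, Ramakrishnan and Gehrke.]
Change (contd)
[Figure: a Gene_ex node _ID with value 0.7, connected through an Aggregate Operation (labelled "x / Exp") to nodes _ID1, _ID2, ..., _IDn, each with an IOF edge and values x1, x2, ..., xN.]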
High Order Queries
Find all the fields from Tables that contain the name John.
_ID:(Name=_X) :-
_ID1:(IOF=Tables),
_ID2:(IOF=_ID1, _X=“John”).
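What makes this query higher-order is that the variable _X stands for an edge name rather than a value. A minimal sketch of that idea, assuming table nodes are themselves marked with IOF = Tables; the function name and the sample tables T1 and T2 are invented for illustration.

```python
def fields_containing(db, target):
    """Return the edge names that hold `target` in any node that is an
    instance of a node which is itself an instance of Tables."""
    tables = {i for i, n in db.items() if n.get("IOF") == "Tables"}
    hits = set()
    for node in db.values():
        if node.get("IOF") in tables:
            hits.update(name for name, value in node.items()
                        if name != "IOF" and value == target)
    return hits


# Hypothetical data: two table nodes and one row in each.
db = {
    "T1": {"IOF": "Tables"},
    "T2": {"IOF": "Tables"},
    "ID1": {"IOF": "T1", "sname": "John", "rating": 7},
    "ID2": {"IOF": "T2", "ename": "John", "manager": "Mary"},
}

print(sorted(fields_containing(db, "John")))
```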
GDB vs. OO, XML, OR, CG, …
• GDB are close in spirit to OO but not the same
(GDB : no encapsulation + more IDs).
• Close to Datalog, but with IDs (links) instead of foreign
keys
• The same holds for ORDBMSs and, to some extent, XML
• Close to Conceptual Graphs (CG), but CGs do not have
IDs
• We can also use foreign keys:
_ID:[ _ID:(IOF = Sailors, sname = lubber)].
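Read this way, an inline query acts like a foreign-key lookup: it finds the identifier of a node from its attribute values. A minimal sketch, where `resolve` is a hypothetical helper name and the node is taken from the Sailors example:

```python
def resolve(db, **pattern):
    """Return the IDs of all nodes whose edges include every name=value pair given."""
    return [node_id for node_id, node in db.items()
            if all(node.get(name) == value for name, value in pattern.items())]


db = {
    "ID1": {"IOF": "Sailors", "sid": 22, "sname": "dustin", "rating": 7},
    "ID2": {"IOF": "Sailors", "sid": 31, "sname": "lubber", "rating": 8},
}

# Counterpart of _ID:[ _ID:(IOF = Sailors, sname = lubber) ]
print(resolve(db, IOF="Sailors", sname="lubber"))
```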
OO vs. GDB
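[Figure: the same data in both models.
OO: an object of Class: Person (age: 42, name: john, car) that refers to an object of Class: Car (color: red).
GDB: node ID1 with edges IOF Person, age 42, name John, car ID2; node ID2 with edges IOF Car, color red.]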
Conclusions and Further work
• Advantages & Disadvantages of GDB
• Conclusions
– Proposed a new database model, GDB, that copes with
problems existing in previous approaches
– Showed how to link existing DB with GDB
– Showed advantages of GDB over existing database
approaches
• Further Work
– Use GDB for learning