Graph Database - List of Top Vendors
Technology Assessment Services
June 2020
Understanding Graph Database
Graph Database
Based on the concept of a mathematical graph, a graph database contains a collection of nodes and edges
A node represents an object, and an edge represents the connection or relationship between two objects
Each node in a graph database is identified by a unique identifier and carries a set of key-value pairs (its properties)
Similarly, each edge is defined by a unique identifier, its starting and ending nodes, and a set of properties
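The node and edge model just described can be sketched in a few lines of Python. This is a minimal illustration under assumptions: the `Node` and `Edge` classes, the global id counter and the sample data are hypothetical and do not correspond to any vendor's storage format.

```python
from dataclasses import dataclass, field
from itertools import count

# Hypothetical id generator shared by nodes and edges, so every id is unique
_ids = count(1)

@dataclass
class Node:
    label: str
    properties: dict = field(default_factory=dict)  # key-value pairs
    id: int = field(default_factory=lambda: next(_ids))

@dataclass
class Edge:
    start: Node  # starting node of the relationship
    end: Node    # ending node of the relationship
    type: str
    properties: dict = field(default_factory=dict)  # edges carry properties too
    id: int = field(default_factory=lambda: next(_ids))

alice = Node("Person", {"name": "Alice"})
bob = Node("Person", {"name": "Bob"})
knows = Edge(alice, bob, "KNOWS", {"since": 2015})
print(knows.start.properties["name"], knows.type, knows.end.properties["name"])
```

The edge holds references to its two endpoint nodes, so the relationship is stored as data rather than reconstructed later.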
Traditional databases store facts efficiently, but relationships must be rebuilt at query time with JOINs and other indirect techniques
Graph databases store both the facts and the relationships between them, making certain types of analysis more intuitive
[Figure: Traditional Database vs Graph Database]
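The difference between rebuilding relationships with JOINs and storing them directly can be shown with a "friends of friends" lookup. This is a sketch under assumptions: the tables, names and functions are hypothetical, and plain Python collections stand in for a relational engine and a graph store.

```python
# Relational view: facts in one table, relationships in a separate join table
people = {1: "Alice", 2: "Bob", 3: "Carol", 4: "Dave"}
friendships = [(1, 2), (2, 3), (3, 4)]  # (person_id, friend_id) rows

def friends_of_friends_sql_style(person_id):
    # Equivalent of a self-join:
    #   SELECT f2.friend_id FROM friendships f1
    #   JOIN friendships f2 ON f1.friend_id = f2.person_id
    #   WHERE f1.person_id = ?
    return {f2[1] for f1 in friendships if f1[0] == person_id
            for f2 in friendships if f2[0] == f1[1]}

# Graph view: each node keeps its outgoing relationships directly
adjacency = {}
for src, dst in friendships:
    adjacency.setdefault(src, []).append(dst)

def friends_of_friends_graph_style(person_id):
    # Follow stored edges hop by hop instead of matching rows in a join table
    return {fof for friend in adjacency.get(person_id, [])
            for fof in adjacency.get(friend, [])}

print(sorted(people[p] for p in friends_of_friends_graph_style(1)))  # ['Carol']
```

Both functions return the same answer; the graph version only visits the edges attached to the nodes it passes through.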
Graph vs Relational Database?
Relational databases store highly structured data in tables with predetermined columns and rows, whereas graph databases can map multiple types of interrelated and complex data
Thus, graph databases are flexible in their organization and structure, whereas relational databases are rigid
Graph Database | Technology Assessment Services | June 2020 Source: CBR online, Neo4j, © Capgemini 2020. All rights reserved | 2
Graph Query Languages
A graph query language is a concrete mechanism for creating, manipulating and querying graph data in a graph database
Graph query languages are the graph DBMS equivalent of SQL
Despite its name, GraphQL is an API query language, while Gremlin, Cypher, SPARQL and the emerging GQL standard are query languages for graph databases
1. Gremlin
Most common and widely used graph query language
It is the query language of the Apache TinkerPop graph computing framework
Gremlin is a functional, data-flow language that enables users to succinctly express complex traversals on (or queries of) their application's property graph
Widely adopted and supported by nearly all graph databases that support Property Graphs (PG)
2. Cypher
Originally developed by Neo4j as a graph query language that allows users to store and retrieve data from the graph database
Open source since 2015: the openCypher project provides an open language specification, technical compatibility kit, and reference implementation of the parser, planner, and runtime for Cypher
openCypher has industry support, most prominently from SAP
3. SPARQL
Originally developed by the W3C to query data stored in the Resource Description Framework (RDF) format for metadata
SPARQL (SPARQL Protocol And RDF Query Language) is a W3C standard designed to meet the use cases identified by the RDF Data Access Working Group
Even though it is also a protocol, for most use cases SPARQL's greatest value is as a query language for RDF graphs (another W3C standard)
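Gremlin's functional, data-flow style, where each step consumes a stream of vertices and emits a new one, can be imitated in a few lines of Python. This is a sketch under assumptions: the toy graph and the `Traversal` and `GraphSource` classes are hypothetical and are not the TinkerPop or gremlinpython API; only the query quoted in the comment is Gremlin syntax.

```python
# Hypothetical toy property graph: vertex id -> properties, plus labelled edges
VERTICES = {
    1: {"name": "marko", "age": 29},
    2: {"name": "vadas", "age": 27},
    4: {"name": "josh", "age": 32},
    3: {"name": "lop", "lang": "java"},
}
EDGES = [(1, "knows", 2), (1, "knows", 4), (1, "created", 3), (4, "created", 3)]

class Traversal:
    """Each step filters or maps the current stream and returns a new Traversal."""

    def __init__(self, stream):
        self.stream = list(stream)

    def has(self, key, value):
        # Keep only vertices whose property `key` equals `value`
        return Traversal(v for v in self.stream if VERTICES[v].get(key) == value)

    def out(self, label):
        # Move along outgoing edges with the given label
        return Traversal(dst for v in self.stream
                         for src, lbl, dst in EDGES if src == v and lbl == label)

    def values(self, key):
        # Terminal step: project one property from each vertex
        return [VERTICES[v][key] for v in self.stream if key in VERTICES[v]]

class GraphSource:
    def V(self):
        # Start a traversal over all vertices
        return Traversal(VERTICES)

g = GraphSource()
# Mirrors the Gremlin query: g.V().has('name', 'marko').out('knows').values('name')
print(g.V().has("name", "marko").out("knows").values("name"))  # ['vadas', 'josh']
```

Chaining small steps like this is what lets Gremlin express multi-hop traversals succinctly; a real engine would also optimise and distribute the steps.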
Use Cases of Knowledge Graphs
It powers Google’s search engine: the original PageRank algorithm ranks pages using the graph of links between web pages, and later additions to its search technology rely on a knowledge graph
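PageRank scores a page by the rank of the pages linking to it, which is computed by repeatedly passing rank along the edges of the link graph. A minimal power-iteration sketch, assuming the commonly used damping factor of 0.85, a fixed number of iterations and a hypothetical four-page link graph:

```python
def pagerank(links, damping=0.85, iterations=50):
    """Power-iteration PageRank over a dict {page: [pages it links to]}."""
    pages = list(links)
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}  # start from a uniform distribution
    for _ in range(iterations):
        # Every page gets the "random jump" share
        new_rank = {p: (1.0 - damping) / n for p in pages}
        for page, outlinks in links.items():
            if outlinks:
                # Split this page's rank evenly across its outgoing links
                share = damping * rank[page] / len(outlinks)
                for target in outlinks:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages
                for target in pages:
                    new_rank[target] += damping * rank[page] / n
        rank = new_rank
    return rank

web = {"A": ["B", "C"], "B": ["C"], "C": ["A"], "D": ["C"]}
ranks = pagerank(web)
print(max(ranks, key=ranks.get))  # C
```

Page C ranks highest because three of the four pages link to it, while D, which nobody links to, keeps only the random-jump share.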
Relies on this form of information organization to keep track of networks of people and the connections between them, as well as every other data point it uses to build a picture of its users, such as favorite artists and movies, events attended and geographical locations
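Tracking the connections between people makes suggestions such as "people you may know" a short graph traversal. A sketch under assumptions: the friendship graph and the `suggest_friends` function are hypothetical, and ranking by mutual-friend count is one simple heuristic, not any specific company's method.

```python
from collections import Counter

# Hypothetical undirected friendship graph: person -> set of friends
friends = {
    "ann": {"ben", "cat"},
    "ben": {"ann", "dan"},
    "cat": {"ann", "dan", "eve"},
    "dan": {"ben", "cat"},
    "eve": {"cat"},
}

def suggest_friends(person):
    """Rank people two hops away (not already friends) by mutual-friend count."""
    mutual = Counter()
    for friend in friends[person]:          # hop 1: direct friends
        for candidate in friends[friend]:   # hop 2: friends of friends
            if candidate != person and candidate not in friends[person]:
                mutual[candidate] += 1
    return mutual.most_common()

print(suggest_friends("ann"))  # [('dan', 2), ('eve', 1)]
```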
Uses knowledge graph technology to organize information on its vast catalog of content, drawing connections between movies and TV shows and the actors, directors or producers who make them
This helps it predict what customers might like to watch next and foster the "binge-watching" model of consumption it has built its business around
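Connections between titles and the people who make them can drive a simple content-based recommendation: walk from a watched title to its cast and crew, then on to their other titles. A sketch under assumptions: the titles, people and scoring rule (count of shared people) are hypothetical and do not describe any provider's actual recommender.

```python
from collections import Counter

# Hypothetical bipartite graph: title -> people who worked on it
credits = {
    "Title A": {"actor1", "director1"},
    "Title B": {"actor1", "actor2"},
    "Title C": {"director1", "actor2"},
    "Title D": {"actor3"},
}

# Reverse edges: person -> titles they worked on
people_to_titles = {}
for title, crew in credits.items():
    for person in crew:
        people_to_titles.setdefault(person, set()).add(title)

def recommend(watched):
    """Score unwatched titles by how many people they share with watched titles."""
    scores = Counter()
    for title in watched:                          # hop 1: title -> people
        for person in credits[title]:
            for other in people_to_titles[person]:  # hop 2: person -> titles
                if other not in watched:
                    scores[other] += 1
    return [t for t, _ in scores.most_common()]

print(recommend({"Title A", "Title B"}))  # ['Title C']
```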
Uses knowledge graphs to build accessible models of all the data it generates and stores, and uses them for risk management, process monitoring and building “digital twins”: simulated versions of real-world systems that can be used for design, prototyping and training
List of Top 5 Graph Databases
Neo4j - Leading native graph database and graph platform
Open source graph database
Neo4j is a native graph database platform that is built to store, query, analyze and manage highly connected data more efficiently than other databases
Competitive Advantages
The database combines everything needed for performance and trust in applications that bring data relationships to the fore
Native graph storage, native graph processing, graph scalability, high availability, graph clustering, graphs in the cloud, graphs on Spark, built-in ETL, and integration support, plus Cypher, a powerful and expressive query language that needs vastly less code than SQL
Scalability
Highly scalable both vertically and horizontally, without introducing data integrity or consistency issues, using its Causal Clustering architecture
Also supports multi-clustering
Applications / Use cases
Real-Time Recommendations
Master Data Management
Identity and Access Mgmt
Network and IT Operations
Fraud Detection
Tax Evasion / AML
Graph-Based Search
Graph Analytics & Algorithms
Graph-powered AI
Smart Homes, IoT
Implementation Language: Java, Scala
Server operating systems: Linux, OS X, Solaris, Windows
APIs and other access methods: Bolt protocol, Cypher query language, Java API, Neo4j-OGM, Spring Data Neo4j, TinkerPop 3
Supported programming languages: .Net, Clojure, Elixir, Go, Groovy, Haskell, Java, JavaScript, Perl, PHP, Python, Ruby, Scala
Initial Release: 2007
Current Release: 4.0.5, June 2020
Major Customers: 300 commercial customers and over 750 startups, including eBay, Walmart, Cisco, Citibank, ING, UBS, HP, CenturyLink, Telenor, TomTom, Telia, Comcast, The National Geographic Society, Airbus, Orange, AT&T, Verizon, DHS, US Army, Pitney Bowes, Vanguard, Microsoft, IBM, Thomson Reuters, Amadeus Travel, Caterpillar, Volvo
OrientDB - First Multi-Model Distributed DBMS with a True Graph Engine
Open source, multi-model DBMS (Document, Graph, Key/Value)
Multi-model means a second-generation NoSQL database able to manage complex domains with high performance
OrientDB is the first Multi-Model Distributed DBMS with a True Graph Engine
Competitive Advantages
It is touted to be the fastest graph database, and OrientDB's query language is built on SQL
Can be used as a pure graph database or as a multi-model database, avoiding the need for multiple DBMS products in the same application
Supports the creation of schemas around graphs
Scalability
Supports a Multi-Master + Sharded architecture: all the servers are masters
Manages relationships without JOINs, using direct pointers instead; this gives constant performance when traversing relationships, no matter the database size
Applications / Use cases
Fraud detection
Fighting Crime
Investigation, Fraud Detection and prevention
Data Governance, Master Data Management
Traffic Management
Implementation Language: Java
Server operating systems: All OS with a Java JDK (>= JDK 6)
APIs and other access methods: TinkerPop technology stack with Blueprints, Gremlin, Pipes; Java API; RESTful HTTP/JSON API
Supported programming languages: .Net, C, C#, C++, Clojure, Java, JavaScript, JavaScript (Node.js), PHP, Python, Ruby, Scala
Initial Release: 2010
Current Release: 3.1.0, June 2020
Major Customers: Verizon, KPMG, AT&T, Expedia, Dell, Comcast, JPMorgan Chase, Schneider Electric, Accenture, CenturyLink, Cisco, SAP, Informatica, Juniper Networks, United Nations, AXA Equitable, Warner Music, Sky, Kaiser Permanente, Pitney Bowes, Vodafone, Orange
Several clients have moved from Neo4j to OrientDB
An independent benchmark study by IBM and the Tokyo Institute of Technology showed that OrientDB is 10x faster than Neo4j on graph operations across all workloads
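Why direct pointers keep traversal cost independent of database size can be illustrated with a sketch of index-free adjacency, the general technique rather than OrientDB's actual storage format. The `Vertex` class and `neighbours_within` function are hypothetical; each vertex holds references to its neighbours, so a traversal only touches records reachable from its start point.

```python
class Vertex:
    """Each vertex keeps direct references to its neighbours (no index lookup)."""

    def __init__(self, name):
        self.name = name
        self.out_edges = []  # direct pointers to neighbouring Vertex objects

    def link(self, other):
        self.out_edges.append(other)

def neighbours_within(start, hops):
    """Breadth-first traversal that only visits vertices reachable from start."""
    frontier, seen = [start], {start}
    for _ in range(hops):
        next_frontier = []
        for vertex in frontier:
            for nbr in vertex.out_edges:  # pointer chase: cost O(degree)
                if nbr not in seen:
                    seen.add(nbr)
                    next_frontier.append(nbr)
        frontier = next_frontier
    return {v.name for v in seen if v is not start}

a, b, c = Vertex("a"), Vertex("b"), Vertex("c")
a.link(b)
b.link(c)

# A large, unrelated chain of vertices does not change the work done from `a`
others = [Vertex(f"x{i}") for i in range(10_000)]
for left, right in zip(others, others[1:]):
    left.link(right)

print(sorted(neighbours_within(a, 2)))  # ['b', 'c']
```

A JOIN-based lookup, by contrast, searches a relationship table (or its index) whose size grows with the whole database.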
ArangoDB - Fast growing native multi-model NoSQL database
Open source, native multi-model DBMS for graph, document, key/value and search
All in one engine and accessible with one query language
Designed to store data natively as key-value pairs, graphs and JSON documents that can be accessed with one declarative query language, AQL
Competitive Advantages
As a native multi-model database, it can be used as a full-blown document store, graph database, search engine or any combination of these technologies
Strong Data Consistency and Simplified Performance Scaling
Deployment is very easy with the ArangoDB Starter, as well as on Kubernetes with the ArangoDB Operator
Scalability
Scales both vertically and horizontally
If performance needs decrease, the backend system can easily be scaled down to save on hardware and operational requirements
Applications / Use cases
Single View of everything
Cybersecurity
Simulations in manufacturing
Identity & Access Mgmt
Fraud detection
Recommendation Engines
Feature Engineering in ML & AI
Network Mgmt & Surveillance
Implementation Language: C++
Server operating systems: Linux, OS X, Windows
APIs and other access methods: AQL, Foxx Framework, Graph API (Gremlin), GraphQL query language, HTTP API, Java & SpringData, JSON style queries, VelocyPack/VelocyStream
Supported programming languages: C#, C++, Clojure, Elixir, Go, Java, JavaScript (Node.js), PHP, Python, R, Rust
Initial Release: 2012
Current Release: 3.6.0, January 2020
Major Customers: Cisco, Barclays, Refinitiv, Siemens Mentor, Kabbage, Liaison, Douglas, MakeMyTrip, Kaseware, Demonware, Brainhub, Oxford University, IC Manage, Actify
Gartner Peer Insights recognizes ArangoDB as one of the highest rated operational databases
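The multi-model idea, one set of stored JSON documents reachable as key-value pairs, as filterable documents and as a graph, can be sketched with plain Python data. This is an illustration under assumptions: the collections and the `outbound` helper are hypothetical; the `_key`, `_from` and `_to` attribute names follow ArangoDB's document and edge conventions, but this is not ArangoDB's driver API or AQL.

```python
# Hypothetical "users" collection of JSON-like documents, keyed by _key
users = {
    "alice": {"_key": "alice", "city": "Paris", "age": 31},
    "bob": {"_key": "bob", "city": "Berlin", "age": 27},
    "carol": {"_key": "carol", "city": "Paris", "age": 45},
}
# Hypothetical edge collection: edges are documents too, pointing by "collection/key"
follows = [
    {"_from": "users/alice", "_to": "users/bob"},
    {"_from": "users/bob", "_to": "users/carol"},
]

# Key/value access: direct lookup by key
bob_city = users["bob"]["city"]

# Document access: filter documents by attribute value
parisians = [doc["_key"] for doc in users.values() if doc["city"] == "Paris"]

# Graph access: follow outbound edges from one vertex
def outbound(vertex_id):
    return [e["_to"].split("/", 1)[1] for e in follows if e["_from"] == vertex_id]

# Mixed query: cities of the people Alice follows (graph hop, then document read)
followed_cities = [users[k]["city"] for k in outbound("users/alice")]
print(bob_city, parisians, followed_cities)
```

Because vertices are ordinary documents, a single query can move from a graph hop straight to a document filter without copying data between two products.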
Microsoft Azure Cosmos DB - Native support for NoSQL choices
Globally distributed, horizontally scalable, multi-model database service
Azure Cosmos DB provides native support for NoSQL choices
Competitive Advantages
Offers multiple well-defined consistency models
Guarantees single-digit-millisecond latencies at the 99th percentile, and guarantees high availability with multi-homing capabilities and low latencies anywhere in the world
Scalability
Indexing, scaling, and geo-replication are handled automatically in the Azure cloud, without any knob-twiddling on the user's end
Applications / Use cases
Management of customer-generated data, such as blog posts, ratings and comments
Store catalogs and manage event data
Supports Microsoft Store and Xbox Live
Internet of Things
Implementation Language: C++
Server operating systems: Hosted
APIs and other access methods: DocumentDB API, Graph API (Gremlin), MongoDB API, RESTful HTTP API, Table API
Supported programming languages: .Net, C#, Java, JavaScript, JavaScript (Node.js), Python, plus MongoDB client drivers written for various programming languages
Initial Release: 2014
Current Release: NA
Major Customers: ABB, Coca-Cola, Citrix, Caltex, Symantec, Liberty Mutual, ServiceLink, Zeiss, Diply, Archive360, Allscripts, Johnson Controls, Quest, Swedavia Airports, New Zealand Trade & Enterprise, BMI, Siemens Healthineers, ExxonMobil, Aveva, Skype, Rolls-Royce, Kohler, Albertsons-Safeway, SitePro, Bentley, Kognitiv Spark, Cincinnati Children's Hospital Medical Center, Finastra
Amazon Neptune - Fully-managed graph database service
Fast, reliable graph database built for the cloud
Competitive Advantages
Fast, reliable, fully managed graph database service that makes it easy to build and run applications that work with highly connected datasets
The core of Amazon Neptune is a purpose-built, high-performance graph database engine optimized for storing billions of relationships and querying the graph with millisecond latency
Supports the popular graph models Property Graph and W3C's RDF, and their respective query languages, Apache TinkerPop Gremlin and SPARQL
Scalability
Indexing, scaling, and replication are handled automatically in the AWS cloud, without any knob-twiddling on the user's end
Applications / Use cases
Fraud detection
Recommendation engines
Social networking
Regulatory compliance
Knowledge graphs
Supply chain transparency
Network/IT operations, including identity and access management and detection of malicious file paths
Implementation Language: Java, Scala
Server operating systems: Hosted
APIs and other access methods: RDF 1.1 / SPARQL 1.1, TinkerPop Gremlin 3.3
Supported programming languages: C#, Go, Java, JavaScript, PHP, Python, Ruby, Scala
Initial Release: 2017
Current Release: NA
Major Customers: Siemens, AstraZeneca, Samsung, Pearson, Intuit, Amazon Alexa, Thomson Reuters, FINRA, IgnitionOne, Blackfynn, PaySense, LifeOmic
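On the RDF side, a SPARQL query is built from triple patterns whose variables are matched against (subject, predicate, object) triples and joined on shared variables. A minimal sketch of that matching, assuming hypothetical `ex:` data and the hypothetical helpers `match` and `query`; a real SPARQL engine adds indexes, query planning and much more of the language.

```python
# Hypothetical RDF-style data as (subject, predicate, object) triples
triples = {
    ("ex:alice", "ex:worksFor", "ex:acme"),
    ("ex:bob", "ex:worksFor", "ex:acme"),
    ("ex:acme", "ex:locatedIn", "ex:paris"),
    ("ex:carol", "ex:worksFor", "ex:globex"),
}

def match(pattern, binding):
    """Yield extended bindings for one triple pattern; terms starting with '?' are variables."""
    for triple in triples:
        new = dict(binding)
        for term, value in zip(pattern, triple):
            if term.startswith("?"):
                # Bind the variable, or check it agrees with an earlier binding
                if new.setdefault(term, value) != value:
                    break
            elif term != value:
                break
        else:
            yield new

def query(patterns):
    """Join the patterns left to right, like a SPARQL basic graph pattern."""
    bindings = [{}]
    for pattern in patterns:
        bindings = [b2 for b in bindings for b2 in match(pattern, b)]
    return bindings

# Roughly: SELECT ?person WHERE { ?person ex:worksFor ?org . ?org ex:locatedIn ex:paris }
rows = query([("?person", "ex:worksFor", "?org"), ("?org", "ex:locatedIn", "ex:paris")])
print(sorted(r["?person"] for r in rows))  # ['ex:alice', 'ex:bob']
```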
Other Popular Graph Databases
Stardog
Enterprise Knowledge Graph platform and graph DBMS with high availability, high-performance reasoning, and virtualization
Scalable, secure, and standards-based
Virtual data connectors to all major SQL servers, Cassandra, MongoDB and more make it easy to access data silos
The NLP pipeline, BITES, lets users incorporate unstructured data into the knowledge graph in addition to SQL and NoSQL data
A BI/SQL Server translates the knowledge graph back into SQL; supported platforms include Tableau, Power BI, Cognos, and more
Implementation Language: Java
Customers: Bosch, Dow Jones, Elsevier, Ericsson, Morgan Stanley, NASA, NIH, Nokia, Salesforce, Siemens, Springer, Raytheon
Grakn
Distributed, hyper-relational database for managing complex data that serves as a knowledge base for cognitive/AI systems
It stores data in a way that allows machines to understand the meaning of information in the complete context of its relationships
Consequently, Grakn allows computers to process complex information more intelligently with less human intervention
Graql is a declarative, knowledge-oriented graph query language that uses machine reasoning to retrieve explicitly stored and implicitly derived knowledge from Grakn
Implementation Language: Java
Initial Release: 2016
Current Release: 1.8.0, June 2020
GraphDB (Ontotext)
Enterprise RDF and graph database with efficient reasoning, cluster and external index synchronization support
SPARQL is used as the query language
High-performance semantic repository created by Ontotext
Implemented in Java and packaged as a Storage and Inference Layer (SAIL) for the RDF4J framework
Loading, reasoning and query evaluation proceed fast even against huge ontologies and knowledge bases
MongoDB integration for large-scale metadata management
Most utilized semantic triplestore for mission-critical enterprise deployments
Implementation Language: Java
Initial Release: 2000
Current Release: 8.8, January 2019
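Retrieving "implicitly derived knowledge" means a reasoner applies rules to the stored facts until no new facts appear. A minimal forward-chaining sketch of one such rule (transitivity), assuming hypothetical triples and a hypothetical `infer_transitive` helper; Graql rules and the rulesets of RDF repositories are written in their own languages and support far more than this.

```python
def infer_transitive(facts, relation):
    """Forward-chain the rule (a rel b) and (b rel c) => (a rel c) until a fixpoint."""
    inferred = set(facts)
    while True:
        new = {(a, relation, d)
               for (a, r1, b) in inferred if r1 == relation
               for (c, r2, d) in inferred if r2 == relation and c == b}
        if new <= inferred:  # nothing new derived: stop
            return inferred
        inferred |= new

# Explicitly stored facts
explicit = {
    ("Lyon", "locatedIn", "France"),
    ("France", "locatedIn", "Europe"),
    ("Europe", "locatedIn", "Earth"),
}
all_facts = infer_transitive(explicit, "locatedIn")
print(("Lyon", "locatedIn", "Earth") in all_facts)  # True
```

The derived fact ("Lyon", "locatedIn", "Earth") was never stored, yet a query over `all_facts` returns it alongside the explicit ones.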
About Capgemini
A global leader in consulting, technology services and digital transformation,
Capgemini is at the forefront of innovation to address the entire breadth of clients’
opportunities in the evolving world of cloud, digital and platforms. Building on its
strong 50-year heritage and deep industry-specific expertise, Capgemini enables
organizations to realize their business ambitions through an array of services from
strategy to operations. Capgemini is driven by the conviction that the business
value of technology comes from and through people. It is a multicultural company
of almost 220,000 team members in more than 40 countries. The Group reported
2019 global revenues of EUR 14.1 billion.
Learn more about us at
www.capgemini.com
This presentation contains information that may be privileged or confidential
and is the property of the Capgemini Group.
Copyright © 2020 Capgemini. All rights reserved.