A Big Data Reference Architecture Needs to Consider the Following Domains

1. Data Management
Petabyte scale: Acquire all data in its original format and store it in one place, cost-effectively and for very long time periods.

2. Data Access
Unprecedented insights: Allow simultaneous access by, and timely insights for, all approved users across the entire data lake, using different processing engines and schema on read.

3. Data Integration & Governance
Integrate with existing systems. Move data into, within and out of the environment while minimizing duplication and data movement.

6. Environment & Deployment Model
Run within your environment and in a public cloud.

7. Core Innovation

Presentation & Application Enablers
Enable existing and new applications in an enterprise environment.
Copyright
[email protected]

Security
Provide a layered approach to security that differentiates between internal and external users.

Operations
Deploy and manage a multi-tenant environment easily, using existing tools where possible.
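
A "layered approach to security" can be read as several independent checks applied in sequence, with external users receiving a more restricted view than internal ones. The sketch below is a minimal illustration under stated assumptions, not the architecture's prescribed design: the three layers (authentication, per-dataset authorization, masking of sensitive fields for external users), the `User` model, and the `SENSITIVE_FIELDS` policy are all hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, FrozenSet

# Hypothetical masking policy: which fields count as sensitive is an assumption.
SENSITIVE_FIELDS: FrozenSet[str] = frozenset({"email", "ssn"})


@dataclass(frozen=True)
class User:
    """Hypothetical user model; 'internal' separates employees from external parties."""
    name: str
    authenticated: bool
    internal: bool
    datasets: FrozenSet[str] = frozenset()  # datasets this user may read


def mask(value: str) -> str:
    """Replace all but the last two characters with '*'."""
    return "*" * max(len(value) - 2, 0) + value[-2:]


def read_record(user: User, dataset: str, record: Dict[str, object]) -> Dict[str, object]:
    # Layer 1: authentication. Unauthenticated callers are rejected outright.
    if not user.authenticated:
        raise PermissionError(f"{user.name} is not authenticated")
    # Layer 2: authorization. The user must be granted access to this dataset.
    if dataset not in user.datasets:
        raise PermissionError(f"{user.name} may not read {dataset}")
    # Layer 3: data masking. Internal users see raw values;
    # external users see sensitive fields masked.
    if user.internal:
        return dict(record)
    return {k: (mask(str(v)) if k in SENSITIVE_FIELDS else v) for k, v in record.items()}
```

With `record = {"id": 7, "email": "ann@example.com"}`, an internal user with access to `"customers"` gets the record unchanged, while an external user with the same grant sees the email as `"*************om"`. Keeping the layers separate means a gap in one check (for example, an over-broad dataset grant) still leaves masking in place for external users.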

Core Big Data Capabilities Required
Areas: Core innovations; Presentation; Data Access; Data Integration & Governance; Data Management; Security & Privacy; Operations.

Capabilities:
- Reports & Dashboards
- Clients: applications (existing or new)
- Extract, Transform, Load
- Identity & Access Management
- Real-Time Monitoring
- OLAP
- Web & Social Media
- Video & Audio
- Geo-location
- Machine Learning & Prediction
- SQL
- Streaming & Complex Event Processing
- Batch Processing
- Data Connectors
- Data Isolation & Multi-tenancy
- Search & Discovery
- Graph Processing
- Data Masking
- Relational Database (MPP)
- Data Warehouse
- Distributed Storage
- Data Encryption
- Text & Semantics
- Real-Time & Batch Ingestion
- Life Cycle Management
- Advanced Visualization
- NoSQL Database*
- In-memory Computing
- Custodian Gateways
- Physical Infrastructure

* Includes key-value, document, graph and object databases.

Callouts:
- Store first, ask questions later (HDFS)
- Parallel processing (MapReduce)
- Commodity hardware, cheap storage
- Any data type, including unstructured
- Real-time reasoning on new data
- Google for Big Data
- Friends & family social network analysis
- Predictions enable prescriptions
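
The "Parallel processing (MapReduce)" callout refers to the model Hadoop uses for batch processing: map tasks run over input splits and emit key/value pairs, the framework shuffles those pairs so all values for a key end up together, and reduce tasks aggregate each key. The minimal word-count sketch below mimics that dataflow in plain Python; it is an assumption-laden illustration, not Hadoop's API. The function names are hypothetical, and a thread pool merely stands in for the cluster nodes that would run the tasks in a real deployment.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from typing import Dict, Iterable, List, Tuple


def map_phase(split: str) -> List[Tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(mapped: Iterable[List[Tuple[str, int]]]) -> Dict[str, List[int]]:
    """Shuffle: group intermediate values by key, as the framework does between phases."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for pairs in mapped:
        for key, value in pairs:
            groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: aggregate all values collected for one key."""
    return key, sum(values)


def word_count(splits: List[str], workers: int = 4) -> Dict[str, int]:
    # Threads stand in for cluster nodes; each split is mapped independently.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = list(pool.map(map_phase, splits))
        grouped = shuffle(mapped)
        reduced = pool.map(lambda kv: reduce_phase(*kv), grouped.items())
        return dict(reduced)
```

For example, `word_count(["big data big", "data lake"])` returns `{"big": 2, "data": 2, "lake": 1}`. Because each split is mapped independently and each key is reduced independently, both phases can be spread across as many machines as there are splits and keys, which is what makes the model fit commodity hardware.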

Hadoop and Spark Deliver Many of the Core Innovative Capabilities Required

Spark Provides a Modern Development Environment on Top of Hadoop
- In-memory, high-speed analytics engine
- Advanced machine learning libraries
- Unified programming model across all processing engines

Hadoop Provides the Enterprise-Wide Data Lake
- Lets you acquire all data in its original format and store it in one place, cost-effectively and for very long time periods
- Allows different processing engines and schema on read
- Mature multi-tenancy, operations, security and integration
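
"Schema on read" means the data lake stores records exactly as they arrive, and each consumer applies its own structure only when it reads them. In a Hadoop data lake this is typically done by engines such as Hive or Spark SQL reading files from HDFS; the sketch below only mimics the idea in plain Python, assuming raw JSON lines as the stored format. The records, the `read_with_schema` helper, and the `BILLING` and `MARKETING` schemas are hypothetical.

```python
import json
from typing import Any, Callable, Dict, Iterator, List, Optional

# Raw records as ingested: JSON lines stored unchanged, no schema enforced on write.
# The records and field names are hypothetical.
RAW_LAKE: List[str] = [
    '{"user": "ann", "amount": "12.50"}',
    '{"user": "bob", "amount": "7", "coupon": "X1"}',
    '{"user": "cy", "note": "free text, no amount"}',
]

# A schema maps field names to converters; it is applied only when data is read.
Schema = Dict[str, Callable[[Any], Any]]


def read_with_schema(lines: List[str], schema: Schema) -> Iterator[Dict[str, Optional[Any]]]:
    """Project each raw record onto `schema`, converting types; missing fields become None."""
    for line in lines:
        raw = json.loads(line)
        yield {name: (convert(raw[name]) if name in raw else None)
               for name, convert in schema.items()}


# Two consumers read the same raw data through different schemas.
BILLING: Schema = {"user": str, "amount": float}
MARKETING: Schema = {"user": str, "coupon": str}
```

A billing job could compute `sum(r["amount"] or 0.0 for r in read_with_schema(RAW_LAKE, BILLING))`, which gives `19.5`, while a marketing job selects users with coupons from the very same lines. Neither schema had to be decided at ingestion time, so a third consumer can later read fields (such as `note`) that nobody planned for.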

Note: Both are open-source technologies supported and embedded by a wide range of software and services vendors.