IBM’s Data Fabric Perspective
What Defines a Data Fabric
A Data Fabric is an architecture, a set of services, and a platform that standardizes and integrates data
across the enterprise regardless of where the data resides (on-premises, cloud, multi-cloud, or hybrid cloud). A Data
Fabric must have these primary capabilities:
• Manage: A Data Fabric provides a single, integrated means of management across the data ecosystem. It
must support data at rest, data in motion, and data integration, provide the ability to virtualize data across the
enterprise, and bring AI/ML to the data.
• Govern: A Data Fabric must map all enterprise assets to a single canonical model representing a common
enterprise vernacular. In addition, it must provide data quality, stewardship, metadata and MDM/RDM capabilities.
• Secure: Classifications of information should serve as the single means of defining security policy and entitlements.
All data must be secured against outside attacks, and the fabric must provide the ability to identify insider threats.
• Intelligently automate
• Leave data where it lives
• Bring AI to your data
A data fabric is “an emerging data management design concept for attaining flexible, reusable and augmented
data integration pipelines, services and semantics, in support of various operational and
analytics use cases delivered across multiple deployment and orchestration platforms. Data
fabrics support a combination of different data integration styles and utilize active metadata,
knowledge graphs, semantics and ML to augment data integration design and delivery.”
Source: Top Trends in Data and Analytics: Data Fabric is the Foundation, February 2021
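The Govern capability above calls for mapping every enterprise asset to a single canonical model. The following is a minimal Python sketch of that idea, not a description of any IBM product. The canonical vocabulary, the source systems `crm` and `billing`, and all field names are hypothetical and chosen only for illustration.

```python
from typing import Any

# Hypothetical canonical vocabulary: the common enterprise terms.
CANONICAL_FIELDS = {"customer_id", "full_name", "email"}

# Hypothetical per-source mappings from local column names to canonical terms.
SOURCE_MAPPINGS: dict[str, dict[str, str]] = {
    "crm": {"cust_no": "customer_id", "name": "full_name", "mail": "email"},
    "billing": {"CustomerID": "customer_id", "CustomerName": "full_name"},
}


def to_canonical(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename a source record's fields to canonical terms.

    Fields without a mapping are dropped; canonical fields the source
    does not provide are set to None so every output has the same shape.
    """
    mapping = SOURCE_MAPPINGS[source]
    out: dict[str, Any] = {field: None for field in CANONICAL_FIELDS}
    for local_name, value in record.items():
        canonical_name = mapping.get(local_name)
        if canonical_name is not None:
            out[canonical_name] = value
    return out
```

Because every source is translated into the same shape, downstream quality rules and security policies can be written once against the canonical terms rather than once per source.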
Operationalizing a Data Fabric
[Diagram: Operationalizing a Data Fabric. Labels recovered from the figure; grouping is approximate.]
• Onboarding: Source Metadata; Automated Onboarding; User & New Source; Bolt-on Source CDC Cartridge; Sources
• Metadata: Knowledge Catalog; Metadata; Data Lineage; Meta Model Drives Framework Automation (Metadata-Driven/Schema on Read Data Pipelines)
• Processing: Real-Time; Event Based Processing; Batch; Active DQ/Profiling; MDM/RDM; Business Events
• Data Marketplace: Streaming; Conformed; Self-Service; Subscription; Data at Rest; Self-Publish; Virtualization; Data Sharing
• Consumption: API/Micro-Services; Business Applications; AI, ML & Optimization; BI Reporting, Dashboards; Compliance Reporting
• Enterprise Analytics (Deployable Throughout the Architecture): Discovery, Exploration, Self-Service
• Cross-cutting: Data Quality & Data Governance; Security
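The diagram above names "Metadata-Driven/Schema on Read Data Pipelines" as the automation framework. As a rough illustration of what schema on read means, the Python sketch below stores raw records untouched and applies a registered schema only when the data is read. The `orders` source, its fields, and the `SOURCE_METADATA` registry are assumptions made for this example.

```python
import json
from typing import Any, Callable

# Hypothetical source metadata, as an onboarding step might register it in a catalog.
SOURCE_METADATA: dict[str, dict[str, str]] = {
    "orders": {"order_id": "int", "amount": "float", "placed_on": "str"},
}

# Type names used in the metadata, mapped to Python casting functions.
CASTERS: dict[str, Callable[[Any], Any]] = {"int": int, "float": float, "str": str}


def read_with_schema(source: str, raw_lines: list[str]) -> list[dict[str, Any]]:
    """Apply the registered schema when the data is read, not when it is stored."""
    schema = SOURCE_METADATA[source]
    rows = []
    for line in raw_lines:
        raw = json.loads(line)
        row = {}
        for name, type_name in schema.items():
            if name in raw:
                # Cast each known field; fields absent from the schema are ignored.
                row[name] = CASTERS[type_name](raw[name])
        rows.append(row)
    return rows
```

Changing the pipeline's behavior for a source then means editing its metadata entry rather than rewriting pipeline code, which is the sense in which the pipeline is metadata-driven.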
IBM Data Fabric
Create Enterprise Capabilities through the Data Fabric
Any Data, Any Cloud, Anywhere
Scale the value of ALL your data and accelerate your journey to AI with greater trust and productivity.
Data Sources, Types and Domains: Business Apps, Custom Apps, Point of Sale, User Behavior, IoT, Devices
• Unlock new insights to more people: Democratize data discovery, understanding and relevancy
• Deliver trusted, business-ready data: Dynamically govern data quality, privacy and usage policies
• Unify data across hybrid clouds: Access and manage diverse data minimizing data movement
• Unify and speed data-AI lifecycles: Empower collaborative, persona-centric workflows
• Unleash data science productivity: Automate complex tasks, improve quality, lower skill barriers
Data and AI Outcomes: Customer Centricity, Operational Agility, Total Quality Management, Continuous Improvement, Support Critical Services
IBM’s Data Fabric Data Governance Perspective
Governance needs to be built into the function of the Data Fabric and supported through the underlying
architecture. The Data Fabric must be Self-Governing and facilitate easy adoption of data quality and data
stewardship functions through automation.
• Easy & Light Touch Interaction: Ease of adoption and access to governance functions drives improved quality and enhances security. Enabling lightweight governance channels via SMS / Chatbot creates an easier-to-manage end-user experience.
• Automated Classification: Classification of data enables the application of enterprise policies for data privacy and data security throughout the Data Fabric.
• The Enterprise Knowledge Catalog builds on foundational definitions, classification and lineage through the Data Fabric.
• Regulatory Compliance: The ability to quickly identify and classify data in support of regulations governing PII, PCI, CPRA, & GDPR is an essential feature of a modern Data Fabric that dictates accessibility of data.
• High Quality Data: Use real-time and interactive monitoring capabilities to rapidly identify developing data quality issues and improve overall quality with minimal impact to data operations.
[Diagram labels: Ownership, Knowledge, Accessibility, Quality, Security & Privacy]
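The Automated Classification and Regulatory Compliance points above depend on detecting sensitive data without manual tagging. The sketch below shows one simple way this could work, using regular expressions over sample values instead of a trained model. The labels `PII.email` and `PCI.card_number`, the patterns, and the 0.8 threshold are illustrative assumptions, not IBM's classifiers.

```python
import re
from typing import Optional

# Hypothetical detection rules: classification label -> pattern for a whole value.
RULES = {
    "PII.email": re.compile(r"[^@\s]+@[^@\s]+\.[^@\s]+"),
    "PCI.card_number": re.compile(r"(?:\d[ -]?){13,19}"),
}


def classify_column(values: list[str], threshold: float = 0.8) -> Optional[str]:
    """Return the first label whose pattern matches at least `threshold` of the values."""
    if not values:
        return None
    for label, pattern in RULES.items():
        hits = sum(1 for v in values if pattern.fullmatch(v.strip()))
        if hits / len(values) >= threshold:
            return label
    return None


def classify_table(columns: dict[str, list[str]]) -> dict[str, Optional[str]]:
    """Classify every column of a table given as column name -> sample values."""
    return {name: classify_column(vals) for name, vals in columns.items()}
```

The resulting labels are what a policy engine would key on, so a rule such as "mask anything labeled PII" applies to newly discovered columns without further configuration.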
Group Name / DOC ID / Month XX, 2019 / © 2019 IBM Corporation
IBM’s Data Fabric Platform Capabilities
(1) Data Catalog discovers, catalogs and enriches existing data across sources, creating a knowledge graph of linked information.
(2) Quickly onboard data from new SaaS apps, stored in an inter-cloud data warehouse in open, agile formats, understood and cataloged with the Data Catalog.
(3) Automated data privacy creates and enforces privacy and usage policies for any catalog data asset.
(4) Virtual Data Access enables real-time virtual data access to distributed data discovered in the Data Catalog, with privacy controls via Automated data privacy.
(5) Users access real-time, trusted data via Virtual Data Access in their analytics tool of choice.
(6) AutoAI consumes distributed, time-variant data retrieved by Virtual Data Access to fuel AI lifecycles in real-time.
[Diagram labels: Existing Data; New Data; New SaaS App; Amazon S3 Object Data; Inter-Cloud; Db2 Warehouse; Snowflake; AutoSQL; Data Virtualization; AutoCatalog; AutoPrivacy; AutoAI; AI Model Lifecycle; Watson Knowledge Catalog; Users who wish to analyze real-time, unified views of data; Data Scientists who create, train and manage AI model accuracy]
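Steps (3) to (5) above describe catalog-driven privacy policies being enforced when data is accessed virtually. The Python sketch below is a toy model of that flow: it resolves an asset through a catalog, reads it from where it lives, and masks columns according to the caller's role. The catalog layout, the `warehouse_a` source, the role names, and the `***` mask are all hypothetical.

```python
from typing import Any

# Hypothetical catalog entry: where the asset lives and how its columns are classified.
CATALOG = {
    "customers": {
        "location": "warehouse_a",
        "classifications": {"email": "PII", "country": None},
    },
}

# Stand-ins for remote systems; a real fabric would query them in place.
SOURCES = {
    "warehouse_a": {
        "customers": [{"email": "ada@example.com", "country": "UK"}],
    },
}

# Hypothetical privacy policy: roles allowed to see each classification unmasked.
UNMASKED_ROLES = {"PII": {"data_steward"}}


def virtual_read(asset: str, role: str) -> list[dict[str, Any]]:
    """Resolve an asset through the catalog and mask columns the role may not see."""
    entry = CATALOG[asset]
    rows = SOURCES[entry["location"]][asset]
    masked_rows = []
    for row in rows:
        out = {}
        for column, value in row.items():
            label = entry["classifications"].get(column)
            allowed = label is None or role in UNMASKED_ROLES.get(label, set())
            out[column] = value if allowed else "***"
        masked_rows.append(out)
    return masked_rows
```

The source rows are never copied or altered; masking happens on the way out, so the same asset can serve an analyst and a data steward with different views.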
Data Fabric ML Driven Automation
[Diagram; labels recovered from the figure, grouping approximate]
• Data Sources: Systems of Record, IoT, Systems of Insights, Cloud, Hadoop, Social Media, Unstructured, External Industry Data & Applications
• Users: CDO, Data Governance Officers, Data Quality Analysts, Data Stewards, Data Scientists, Business Users, Data Engineers
• Spanning services: Data Quality, Data Governance, Data Privacy and Protection; APIs and External Metadata Exchange
• Automated Curation Services: Auto Discover Data, Auto Classify Data, Auto Detect Sensitive Data, Auto Analyze Data Quality, Auto Assign Business Terms
• Automated Metadata Management: Machine Learning & Automation, Data Connectors, Knowledge Catalog
• Self-Service Interaction (Governed Data Access to Business Ready Data): Search & Find Relevant Data; Tagging, Annotations, Comments; Self-Service Data Access; Self-Service Data Preparation; Projects & Collaboration
• Governance functions: Data Lineage, Policy Management & Enforcement, Data Privacy & Protection, Business Glossary, Data Quality Management, Workflow Management
• Other Integrated Data Management Capabilities: Data Integration & Replication, Entity Management, Reference Data Management, Data Virtualization, Advanced Data Preparation, Industry Knowledge Accelerators
• Deployment targets: IBM Cloud | Amazon Web Services | Microsoft Azure | Google Cloud | Hyperconverged system
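One of the curation services named above is "Auto Assign Business Terms". The slide attributes this to machine learning; the sketch below deliberately substitutes plain string similarity from Python's standard `difflib` module to show the shape of the task, which is matching technical column names to glossary terms. The glossary entries and the 0.6 cutoff are assumptions for illustration.

```python
import difflib
from typing import Optional

# Hypothetical business glossary terms.
GLOSSARY = ["customer email", "order amount", "postal code"]


def normalize(column_name: str) -> str:
    """Lower-case a column name and turn separators into spaces."""
    return column_name.lower().replace("_", " ").replace("-", " ").strip()


def suggest_term(column_name: str, cutoff: float = 0.6) -> Optional[str]:
    """Return the closest glossary term, or None if nothing is similar enough."""
    matches = difflib.get_close_matches(
        normalize(column_name), GLOSSARY, n=1, cutoff=cutoff
    )
    return matches[0] if matches else None
```

Returning None for weak matches matters in practice: an unassigned column can be routed to a data steward, whereas a wrong term silently attaches the wrong policies.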
IBM’s Data Fabric Service Capabilities
• Workload Analysis & Modernization: Automation that creates detailed inventories of data ecosystems and their dependencies, translates legacy data processing code, and automates testing; essential for creating actionable migration roadmaps.
• Real-Time & Batch Monitoring: Monitoring of Governance and Data Quality is enabled across all topics in the Data Fabric using customizable, open-source Grafana dashboards and portlets.
• “No-code” Real-Time & Batch Frameworks for Data Onboarding: The frameworks create Data Pipelines for managing the ingestion, organization and publication of data. They are built as an open, lightweight and flexible code base that is simple to deploy in any ecosystem. The data onboarding process learns and adapts using Machine Learning models as new or changed source data arrives.
• Data Marketplace: The Data Marketplace is the central provisioning point for data consumption in the Data Fabric. Data is persisted in original and curated forms with purpose-fit storage, allowing for publication and subscription of data.
• AI-driven Governance, Data Quality & Profiling: Automates the classification and organization of data against Enterprise Canonical Models and provides real-time insight into data quality. Automated data tagging provides classification for the application of Security Policy across the Data Fabric.
[Diagram: Data Marketplace. Labels: Streaming, Conformed, Self-Service, Subscription, Data at Rest, Self-Publish, Virtualization, Data Sharing]
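The Data Marketplace is described above as persisting data in original and curated forms while supporting publication and subscription. A minimal in-memory sketch of that contract follows. The `DataMarketplace` class, its method names, and the `sales` dataset are hypothetical and stand in for whatever storage and messaging a real deployment would use.

```python
from collections import defaultdict
from typing import Any, Callable

Record = dict[str, Any]


class DataMarketplace:
    """Toy in-memory marketplace: stores original and curated data, notifies subscribers."""

    def __init__(self) -> None:
        self.original: dict[str, list[Record]] = defaultdict(list)
        self.curated: dict[str, list[Record]] = defaultdict(list)
        self._subscribers: dict[str, list[Callable[[Record], None]]] = defaultdict(list)

    def subscribe(self, dataset: str, callback: Callable[[Record], None]) -> None:
        """Register a consumer to be called with each curated record."""
        self._subscribers[dataset].append(callback)

    def publish(self, dataset: str, record: Record,
                curate: Callable[[Record], Record]) -> None:
        """Persist the record as received, persist its curated form, then notify."""
        self.original[dataset].append(record)
        curated = curate(record)
        self.curated[dataset].append(curated)
        for callback in self._subscribers[dataset]:
            callback(curated)
```

Keeping the original alongside the curated form means a curation rule can be corrected later and replayed, without going back to the source system.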
What are the business benefits of a Data Fabric?
For technical teams and CTOs
• Decreased effort to maintain data quality standards due to fewer data versions
• Reduced infrastructure and storage cost (consolidated data management tools and reduction in data copies)
• Faster and simplified data delivery processes due to fewer targets and advanced optimization of data flows
• Reduction in efforts for data access management as it gets automated by global data policy enforcement
For business teams & CDOs
• Gaining faster and more accurate insights due to easy access to high quality data
• Ability to focus time on analyzing rather than finding and preparing data
• Frustration-free full self-service data shopping experience
• Avoidance of biased analysis due to data restrictions
• Increased compliance and security despite full analytics utilization
Backup
IBM Data Fabric ML Driven Automation
[Diagram; labels recovered from the figure, grouping approximate]
• Data Sources, Types and Domains: Business Apps, Custom Apps, Point of Sale, User Behavior, IoT, Devices
• Data and AI Outcomes: Customer Centricity, Operational Agility, Total Quality Management, Continuous Improvement, Support Critical Services
• Spanning services: Data Quality, Data Governance, Data Privacy and Protection; APIs and External Metadata Exchange
• Automated Curation Services: Auto Discover Data, Auto Classify Data, Auto Detect Sensitive Data, Auto Analyze Data Quality, Auto Assign Business Terms
• Automated Metadata Management: Machine Learning & Automation, Data Connectors, Knowledge Catalog
• Self-Service Interaction (Governed Data Access to Business Ready Data): Search & Find Relevant Data; Tagging, Annotations, Comments; Self-Service Data Access; Self-Service Data Preparation; Projects & Collaboration
• Governance functions: Data Lineage, Policy Management & Enforcement, Data Privacy & Protection, Business Glossary, Data Quality Management, Workflow Management
• Integrated Data Management Capabilities: Data Integration & Replication, Entity Management, Reference Data Management, Data Virtualization, Advanced Data Preparation, Industry Knowledge Accelerators
• Deployment targets: IBM Cloud | Amazon Web Services | Microsoft Azure | Google Cloud | Hyperconverged system