Data Fabric
by Nasdaq Data Link
Enabling Rapid Data Deployment
Nasdaq Data Link powers data-driven decision-making for over
700K users around the world. Our robust, universal API is tailor-made
for financial institutions and a preferred method of data
discovery and ingestion for tens of thousands of organizations.
For the first time, we're giving our clients access to the same
technology and team that powers Nasdaq Data Link, enabling them
to ingest and deploy data within their organizations with greater
speed and efficiency so they can focus on their firm's core
value drivers.
Mapping the data deployment odyssey
According to a survey initiated by Nasdaq and conducted
by Wakefield Research, 93% of portfolio managers are
not fully satisfied with some aspect of their organization’s
data management capabilities.
Given how challenging the path from wanting data to deploying data
can be, this is an unsurprising state of affairs. From ingestion to
cleansing to productizing and finally to deployment, the journey of
data onboarding is perilous.
[Diagram: the data deployment journey]
Ingest: Source data → Connect to source & transfer → Parse & reformat data → Clean & QA → Normalize to standard format/symbology
Onboard: Document & catalogue → Load into data warehouse
Productize: Set up compute environment → Research & develop analytics → Build final ETL pipeline → Monitor & maintain delivery and access
Deploy: Portfolio managers, analysis reports, trading systems, applications
With Data Fabric by Nasdaq Data Link, our goal is to make the middle
component invisible to our clients, eliminating the burden of the intervening
steps between data selection and deployment. Like so:
[Diagram: Select data source → Data Fabric → Deploy to portfolio managers, analysis reports, trading systems and applications]
This document will zoom in on and break down the three super-categories
that define data deployment, Ingest, Onboard and Productize, and show
how the Data Fabric technology and team work within your organization
to make the entire process painless.
Finally, we'll review Share & Manage: the critical, easy-to-use,
proprietary tooling we've developed to make the ongoing administration
of data access, audit, governance, reporting and more a breeze.
Ingest & Unify
Investors and analysts require data that comes from myriad sources
in highly variable formats: streaming datasets, batches arriving at
different intervals, flat files downloaded from websites or on-premise
servers, public cloud solutions, Hadoop clusters and more. Each source
and format is unique, and handling it properly requires analysts with
experience across diverse data sources.
[Diagram of data sources: websites, cloud storage, Hadoop clusters, local servers]
Data Fabric's data ingestion team will unify the collection of tables
that make up the dataset. They then set up pipelines to blob storage,
SQL queries and automated scripts to collect the data into a staging
environment. Proprietary Data Fabric monitoring technology ensures
that data is captured accurately as it flows in from each source.
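To make the shape of this step concrete, here is a minimal sketch in Python. It is purely illustrative: the source names, fetch functions and staging structure are invented for the example and are not the actual Data Fabric pipeline.

```python
# Hypothetical sketch: each source gets a small fetch function, and the
# staging step records what arrived so monitoring can flag any source
# that delivered nothing.

def fetch_from_website():
    return [{"symbol": "AAPL", "close": 160.0}]  # placeholder rows

def fetch_from_cloud_storage():
    return [{"symbol": "MSFT", "close": 290.0}]  # placeholder rows

SOURCES = {
    "website": fetch_from_website,
    "cloud_storage": fetch_from_cloud_storage,
}

def ingest_to_staging():
    """Collect every source into one staging area and log row counts
    so a monitor can verify data is being captured from each source."""
    staging, counts = {}, {}
    for name, fetch in SOURCES.items():
        rows = fetch()
        staging[name] = rows
        counts[name] = len(rows)
    return staging, counts
```

The per-source row counts are what a monitoring layer would compare against expectations to catch a feed that silently stopped delivering.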
The Data Fabric team understands that financial data science depends on
comparable and consistent data. In addition to checking data for missing
values, our team will standardize schema mapping, entity identifiers and
time scales. The mess of sources becomes a time-series file ready for
further processing.
[Diagram: Unify. Raw data is often messy, inconsistent and filled with gaps. Our team of data scientists transforms raw data into an organized time series.]
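A toy example of what this unification amounts to, sketched with pandas. The two vendor feeds, their column names and symbology are invented for illustration; the real standardization rules are far more extensive.

```python
import pandas as pd

# Hypothetical sketch: two invented vendor feeds with different
# schemas, symbologies and date conventions.
feed_a = pd.DataFrame({
    "dt": ["2022-08-15", "2022-08-17"],
    "sym": ["AAPL US", "AAPL US"],
    "px_close": [155.0, 160.0],
})
feed_b = pd.DataFrame({
    "TradeDate": ["20220816"],
    "Ticker": ["AAPL"],
    "Close": [157.5],
})

# Map each vendor schema onto one standard schema and symbology.
std_a = feed_a.rename(columns={"dt": "date", "sym": "symbol",
                               "px_close": "close"})
std_a["symbol"] = std_a["symbol"].str.split().str[0]  # "AAPL US" -> "AAPL"
std_a["date"] = pd.to_datetime(std_a["date"])
std_b = feed_b.rename(columns={"TradeDate": "date", "Ticker": "symbol",
                               "Close": "close"})
std_b["date"] = pd.to_datetime(std_b["date"], format="%Y%m%d")

# One consistent time series, ready for gap checks and further QA.
unified = (pd.concat([std_a, std_b], ignore_index=True)
             .sort_values("date")
             .set_index("date"))
```

Once every source speaks the same schema, identifiers and time scale, missing-value checks and downstream analysis can treat the data as a single series.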
Onboard
Nasdaq Data Link has deployed thousands of datasets since its inception.
That kind of publishing volume is only possible with machine
intelligence that automates much of the process that would otherwise
require human intervention.
Our team of data scientists will deploy machine learning models to tag
relevant information in the dataset you wish to onboard. This can be content
to drive analysis, or metadata to help understand data lineage. Combined,
these create searchable and, most importantly, understandable data at the
end of the pipeline.
[Diagram: Tag. Data Fabric deploys advanced data science techniques to identify the components of a data table at scale. Example tags: 8/17/22 → Date; AAPL → Company; $160 → Price; Close price → Internal metadata of source; Hold → NLP text]
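The production models behind this tagging are proprietary; as a stand-in, a toy heuristic tagger below shows the idea of classifying raw cells into coarse content types. The patterns and labels are illustrative only.

```python
import re

def tag_value(value: str) -> str:
    """Toy stand-in for ML-based tagging: classify a raw cell into a
    coarse content type using simple patterns (illustrative only)."""
    if re.fullmatch(r"\d{1,2}/\d{1,2}/\d{2,4}", value):
        return "Date"
    if re.fullmatch(r"\$\d+(\.\d+)?", value):
        return "Price"
    if re.fullmatch(r"[A-Z]{1,5}", value):
        return "Company"
    return "NLP text"  # free text falls through to NLP handling

tags = {v: tag_value(v) for v in ["8/17/22", "AAPL", "$160", "Hold"]}
```

A real system would replace these regexes with learned models, but the output is the same kind of artifact: a tag per component that makes the table searchable and understandable downstream.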
Once tagging is complete, a catalogue, or index, of the data becomes
possible. Its purpose is to help current and new users understand what
data exists and how they can use it.
[Diagram: Catalogue. A factored catalogue allows users to understand what data exists and how they can use it.]

Field                      Description
ISIN                       Code to identify a security
Free cash flow             Cash available to the company
Interest coverage ratio    Ratio of ease of debt payment
Retail footfall tracking   Alternative data for activity
Data classification is an important step at this stage in our process,
especially as it pertains to auditing and permissioning potentially
sensitive components of a dataset. Data can be classified to help
control compliance, governance and privacy. By switching to field-level
classification instead of user-level classification, more data can be
shared with more people while ensuring sensitive information remains
secure.
[Diagram: Access. Public and open fields are available to all users; sensitive fields are granted per-user access; private fields are granted per-department access.]
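A minimal sketch of what field-level classification looks like in practice, assuming an invented set of fields and labels (none of these names come from Data Fabric): each column carries a sensitivity label, and a per-requester view is built instead of granting or denying the whole dataset.

```python
# Hypothetical field-level classification: columns, not whole
# datasets, carry sensitivity labels.
FIELD_CLASS = {
    "isin": "public",
    "free_cash_flow": "public",
    "client_positions": "sensitive",
    "desk_pnl": "private",
}

def visible_fields(user_clearances: set[str]) -> list[str]:
    """Return the columns a user may see given the classification
    levels they are cleared for ('public' is open to everyone)."""
    cleared = {"public"} | user_clearances
    return [f for f, c in FIELD_CLASS.items() if c in cleared]
```

Because access is decided per field, a user with no special clearance still sees the public columns of a dataset that also contains sensitive ones, which is exactly how more data reaches more people without exposing the sensitive parts.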
Productize
The traditional analysis pipeline has a number of components, each
requiring key decisions. When multiplied by the tens of thousands of
packages available, even small teams can end up with a set of fractured
services that don't support each other.
Set up compute environment
Fabric supports the most popular and proven tools for financial data
analysis, such as Python, R and Scala. With the dozens of IDEs and
compilers available for these languages, data scientists will
inevitably choose different setups than their colleagues. If libraries
or functions exist in one environment but not another, complications
follow: code that won't run in a different environment because of
missing functions, or erroneous outputs.
With Fabric, the IDE serves as an input, so you have the flexibility
to use the IDE with the best core functionality for you, and we'll
provide the relevant financial analysis packages and libraries.
Research and develop analytics
With tens of thousands of libraries available for analysis, from NumPy
and scikit-learn to Plotly and pandas, it can be difficult to choose
the most appropriate ones and build consensus around them. Rather than
dealing with separate libraries from separate authors (which can become
outdated and hard to maintain), our platform has preconfigured tools
for backtesting, calculating financial ratios, and charting stock
prices and fundamentals.
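The platform's preconfigured tooling is proprietary, but the ratios involved are standard finance formulas. Two examples, matching the catalogue fields mentioned earlier, sketched as plain functions:

```python
def interest_coverage_ratio(ebit: float, interest_expense: float) -> float:
    """Interest coverage ratio: how easily a company services its
    debt from operating earnings (EBIT / interest expense)."""
    return ebit / interest_expense

def free_cash_flow(operating_cash_flow: float, capex: float) -> float:
    """Free cash flow: cash available to the company after
    capital expenditures."""
    return operating_cash_flow - capex
```

Having one vetted implementation of each ratio, rather than every analyst's own version, is what keeps results comparable across a team.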
Manage Git repositories and compute services
Managing code changes and setting up compute services can be
time-consuming and costly. Your data scientists shouldn't have to worry
about job or workflow scheduling, configuring clusters, or how many
nodes, processor cores and how much RAM they need.
With Data Fabric, we've set up efficient compute services for dataframe
analysis, machine learning with TensorFlow and other financial
workloads. Your workflows scale efficiently thanks to Nasdaq's
partnership with Databricks.
Monitor & maintain delivery and access
Just like data coming in from many sources, good analysis will flow out to
different use cases. Building a delivery monitoring system to ensure data is
getting where it needs to be can be complex. Fabric has built an efficient
monitoring system with alerts, SLI metrics and dashboards to monitor
trends, identify issues and feed analysis to the right products and users.
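The core of such a monitoring system is comparing a service-level indicator against an objective and alerting on the gap. A minimal sketch (the metric, the 99% objective and the function names are assumptions for illustration, not Data Fabric's actual SLIs):

```python
def delivery_sli(delivered: int, expected: int) -> float:
    """Service-level indicator: the fraction of expected data
    deliveries that actually arrived in the measurement window."""
    return delivered / expected if expected else 1.0

def should_alert(sli: float, objective: float = 0.99) -> bool:
    """Fire an alert when the indicator drops below the objective."""
    return sli < objective
```

Dashboards then plot the SLI over time so trends are visible before the objective is breached.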
Share & Manage
Beyond data ingestion and deployment, Data Fabric comprises a multitude
of features that span the full data lifecycle, from lineage to usage,
and ultimately allow for the efficient maintenance of data delivery
within an organization.
It starts with data discovery after a dataset has already been
deployed: a virtual catalogue (with an interface not unlike Nasdaq
Data Link's) ensures that data is easily found. For a new investment
manager or researcher joining an organization, knowing where to go to
see all of its data saves countless hours otherwise spent simply
finding out what's available. The alternative is building yet another
system that requires its own maintenance.
With all of the data now centralized on a single platform,
administering access to it is another key strength of Data Fabric: the
team responsible for deciding who has access to which dataset can
easily provide and control the amount and depth of permission.
Centralizing administration also allows the organization to
automatically track and record granular usage of a dataset for audit,
compliance and governance purposes.
Get value from your data
faster with Data Fabric
data.nasdaq.com/datafabric
REQUEST A CUSTOM DEMONSTRATION
Nasdaq Data Link
[email protected]