Internal Data Marketplace Guide
CONTENTS
What are data marketplaces for?
How do you know you need an internal data marketplace?
Consequences of not having an internal data marketplace
Practical guide to launching an internal data marketplace
Data marketplaces are all about solving the ‘last
mile’ problem for data. This means closing the gap
between data producers and consumers.
This is true whether you’re trying to access data
directly in a source system, through an interface,
or in a target location. This guide sets out practical
considerations for launching your own internal
data marketplace — that is, a marketplace that
empowers data access across your entire
organization.
Data marketplaces solve data access at scale. Solving the data access problem is critical for Chief Data Officers and their data teams to:

• Accelerate data-driven outcomes
• Provide architectural flexibility and avoid vendor lock-in
• Foster an improved data culture
• Deliver and demonstrate return on investment (ROI)
• Improve the experience for business stakeholders

Data silos are a ubiquitous phenomenon in the modern enterprise. While there are practical reasons for them to exist, they can result in inhibited value creation, duplication of effort, wasted resources, and frustration. The primary reason for a data marketplace is to enable data access, which means overcoming data silos.

Multitude and diversity of data consumers
Users will have different technical skills, use cases, and preferred tools. Additionally, a significant amount of collaboration will be required between users to deliver successful outcomes from data. Internal data marketplaces are purpose-built to serve the full range of data consumers, and the best ones will include collaborative and asynchronous workflows.

How do you know you need an internal data marketplace?
Ideally, you've already identified the need for an internal data marketplace, but if not, certain factors indicate a pressing need.

Business users frustrated with time-to-access
The unfortunate reality is that business users share a seemingly universal complaint: it simply takes too long to access the data they need. Even when data requests are highly specific, it's not uncommon for them to take weeks or even months to fulfill. Friction and delays lead to frustration and, in turn, a sense of futility. An internal data marketplace can short-circuit this issue by providing rapid access to data and tooling.

Low trust between data users and owners
It's all too common for data users to mistrust or have low confidence in the data. Similarly, data owners don't trust users with the data, and to mitigate risk, they put barriers in place. A data marketplace with proper governance capabilities enables data owners to grant access to users while tightly managing risk. Data users also gain clarity about the data, what it can be used for, and its source. This gives them greater confidence that they're using the right data for their needs.

Marketplace or catalog?
A data catalog is a technology that creates an inventory of (typically) internal data. It's a way to start understanding what data is available, but catalogs tend not to provide a view of the permitted use cases of the data, the multiple ways of accessing it, or the ability to manage data access across technical and organizational boundaries. Most firms already have at least one data catalog in place, which is a helpful foundation for a data marketplace.

Consequences of not having an internal data marketplace
Now that we can recognize some of the indicators of needing a data marketplace, let's briefly look at what happens when you don't have one.

1 Technology, data, and AI projects take months or even years to deliver.
2 Data initiatives show relatively poor ROI, if ROI is measured at all.
3 Data access cannot be scaled, whether through self-service or automation.
Connectors
Manage the creation and maintenance of connectors to read, write, and copy data to and from a range of sources such as cloud databases, data lakes/warehouses, on-premises systems, and desktops.

Data assets
Full lifecycle management of data assets, including ownership, entitlements, and lineage. Data assets should include tables, notebooks, images, visualizations, text, etc.

Data products
Full lifecycle management of data products, including asynchronous processes to create, publish, manage, and delete them.

Subscriptions
Record and enforce the entitlements set out in data product subscriptions, covering duration of access, type of access, number of users, etc.

Storefront
A capable and intuitive storefront for data products that allows a range of personas to easily discover and understand what's available to them.

Interfaces
Host a range of interfaces to balance user experience and risk. These may include cleanrooms, sandboxes, workbenches, query engines, BI tools, etc.

Data transformation
Manage the creation and maintenance of the code used to create custom data assets, with the ability to automate code execution on a scheduled or event-driven basis.

Export
Manage the creation and maintenance of data pipelines, file transfers, and downloads on a one-off, scheduled, or event-driven basis.

Identity and access management (IAM)
Control all user and system access to the platform, services, and tools.

Miscellaneous
A range of services covering functional and non-functional requirements such as entity reference, deletion, monitoring, lineage, authentication, security, etc.
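To make the Subscriptions capability more concrete, here is a minimal sketch of recording and enforcing subscription entitlements. It assumes that a subscription stores an expiry date (duration of access), a set of permitted access types, and a seat limit (number of users). The `Subscription` class, the `is_access_permitted` function, and the access-type names are hypothetical illustrations and are not part of any particular marketplace product.

```python
from dataclasses import dataclass, field
from datetime import date


@dataclass
class Subscription:
    """Hypothetical record of the entitlements attached to one data product subscription."""
    data_product: str
    expires_on: date                  # duration of access (inclusive end date)
    access_types: frozenset[str]      # type of access, e.g. {"read", "export"} (assumed names)
    max_users: int                    # number of users (seat limit)
    users: set[str] = field(default_factory=set)

    def add_user(self, user: str) -> bool:
        """Enrol a user; refuse if the seat limit has already been reached."""
        if user in self.users:
            return True
        if len(self.users) >= self.max_users:
            return False
        self.users.add(user)
        return True


def is_access_permitted(sub: Subscription, user: str, access_type: str, today: date) -> bool:
    """Enforce the recorded entitlements: enrolled user, permitted access type, not expired."""
    return (
        user in sub.users
        and access_type in sub.access_types
        and today <= sub.expires_on
    )


if __name__ == "__main__":
    sub = Subscription(
        data_product="sales-daily",   # hypothetical data product name
        expires_on=date(2025, 12, 31),
        access_types=frozenset({"read"}),
        max_users=2,
    )
    print(sub.add_user("alice"), sub.add_user("bob"), sub.add_user("carol"))  # True True False
    print(is_access_permitted(sub, "alice", "read", date(2025, 6, 1)))        # True
    print(is_access_permitted(sub, "alice", "export", date(2025, 6, 1)))      # False
    print(is_access_permitted(sub, "alice", "read", date(2026, 1, 1)))        # False
```

In this sketch, a seat is consumed when a user is enrolled, not on each access check. A platform that meters concurrent sessions instead would need a different design.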
User personas
Understanding and serving users is key to a successful internal data marketplace. There are three broad user personas: platform operators, data producers, and data consumers. These are often groups of people, not single individuals, working together to achieve specific goals. Individual users may take on multiple personas at different times. For example, it's very common for the platform operator to act as both a data producer and a data consumer. Meeting user needs and building momentum is critical at all stages, especially at the start.

Platform operators
Platform operators are responsible for running the data marketplace. They will make decisions in several key areas:

• Operating model: Is your marketplace strictly for internal users? Which teams will have access to the marketplace? How are those teams defined? Will data be centralized or left at source?
• Governance: How do you plan to set and enforce rules around data governance, movement, access, and usage?
• Access and control: Who will have access to the marketplace? How will you delineate user roles and entitlements? Does the platform operator or the data producer dictate how data is accessed?
• Reporting: How will you monitor the performance and usage of the platform? What metrics will you collect? Who will you report these metrics to? How will your reporting drive your business objectives?
• Cloud provider: What cloud provider(s) will you use? Where will you deploy your platform? How will you manage and apportion costs?
Data producers
Data producers are those who configure and manage data assets and products. When establishing your internal data marketplace, data producers should think about:

• Data assets: What types of data assets are available? Which data assets need to be prioritized in the early stages of your data marketplace? Will they be left at source or copied to the platform? Who will they be shared with?
• Data product management: How will the lifecycle of data products be managed? Is there a coherent data product management philosophy or culture in your organization? Are you comfortable with the balance between risk and reward?
• Technology: Are source systems the optimal storage layer for the target use cases? What tools do the various users need to work with data? What data pipelines will need to be created and maintained?
• User experience: What user experiences do you want to enable? Who are those experiences for?

Data consumers
Data consumers are the end users of your data assets and products. Consider how the following will affect and empower your organization's data consumers:

• Roles: Who are the data consumers? What roles need to be created? Who will be invited to use the data marketplace? How will you remove users? How will you support users?
• User experience: How will user roles affect what users can do? What limits will you put on user behavior? What are the rules around data access, usage, and distribution? Which tools will be available to users in the cloud workspaces? What usage patterns do you want to enable?
• Collaboration: Are users able and encouraged to collaborate within the marketplace? Which tools will you need to provide to enable effective collaboration?
Data manufacturing
Data manufacturing tools, such as lakes and
warehouses, focus on centralizing, cleansing,
standardizing, and maintaining data. The ultimate
goal is to reduce or remove complexity and
minimize the cost of maintaining usable data. The
data manufacturing process will likely generate a
variety of data assets, including tables, notebooks,
text files, images, and visualizations.
Data catalogs
Data catalogs provide an inventory of the data in your organization. It should be possible to import the metadata from your catalog into your data marketplace. Likewise, any assets connected to, or created within, your data marketplace should be referenceable within the catalog(s).

Tools and technology on the data value chain: Your data marketplace should integrate with the various source, manufacturing, and consumption technologies that your users are familiar with.
Buy, build, or outsource?
Different options will have different impacts on the various factors that need to be taken into consideration: cost to build, cost to maintain, time to deploy, and customizability.