DATA MARKETPLACE GUIDE
Practical guide to launching
an internal data marketplace
CONTENTS
What are data marketplaces for?
How do you know you need an internal data marketplace?
Consequences of not having an internal data marketplace
What should data marketplaces enable?
What capabilities does your internal data marketplace need?
User personas
Platform operators
Data producers
Data consumers
Integration with existing tech stack
Getting your marketplace
Options explained
Purpose-built, market-proven solution
Self-build
Outsourced build
Build a strong foundation
Launch, grow, and scale your data marketplace
Take the next step
Data marketplaces are all about solving the ‘last
mile’ problem for data. This means closing the gap
between data producers and consumers.
This is true whether you’re trying to access data
directly in a source system, through an interface,
or in a target location. This guide sets out practical
considerations for launching your own internal
data marketplace — that is, a marketplace that
empowers data access across your entire
organization.
This guide focuses on internal data
marketplaces, so it doesn’t cover the
distribution of commercial data products
and services via a private or public
marketplace. If you’re interested in that,
check out our case studies of how Moody’s
Analytics and CoreLogic are using Harbr.
What are data marketplaces for?
Data marketplaces solve data access at scale. Solving the data access problem is critical for Chief Data Officers and their data teams to:
• Accelerate data-driven outcomes
• Provide architectural flexibility and avoid vendor lock-in
• Foster an improved data culture
• Deliver and demonstrate return on investment (ROI)
• Improve the experience for business stakeholders
How do you know you need an internal data marketplace?
Ideally, you’ve already identified the need for an internal data marketplace, but if not, there are certain factors that indicate a pressing need.
Business users frustrated with time-to-access
The unfortunate reality is that business users share a seemingly universal complaint: it simply takes too long to access the data they need. Even when data requests are highly specific, it’s not uncommon for them to take weeks or even months to fulfill. Friction and delays lead to frustration, and in turn, a sense of futility. An internal data marketplace can short-circuit this issue by providing rapid access to data and tooling.
Low trust between data users and owners
It’s all too common for data users to mistrust or have low confidence in the data. Similarly, data owners don’t trust users with the data, and to mitigate risk, they put barriers in place. A data marketplace with proper governance capabilities enables data owners to grant access to users while tightly managing risk. Data users also gain clarity about the data, what it can be used for, and the source. This gives them greater confidence that they’re using the right data for their needs.
Existence of data silos
Data silos are a ubiquitous phenomenon in the modern enterprise. While there are practical reasons for them to exist, they can result in inhibited value creation, duplication of effort, wasted resources, and frustration. The primary reason for a data marketplace is to enable data access, which means overcoming data silos.
Multitude and diversity of data consumers
Users will have different technical skills, use cases, and preferred tools. Additionally, a significant amount of collaboration will be required between users to deliver successful outcomes from data. Internal data marketplaces are purpose-built to serve the full range of data consumers, and the best ones will include collaborative and asynchronous workflows.
Marketplace or catalog?
A data catalog is a technology that creates an inventory of (typically) internal data. It’s a way to start understanding what data is available, but catalogs tend not to provide a view of the permitted use cases of the data, multiple ways of accessing it, or the ability to manage data access across technical and organizational boundaries. Most firms already have at least one data catalog in place, which is a helpful foundation for a data marketplace.
Consequences of not having an internal data marketplace
Now that we can recognize some of the indicators of needing a data marketplace, let’s briefly look at what happens when you don’t have one.
1. Technology, data, and AI projects take months and years to deliver.
2. Data initiatives show relatively poor ROI (if they are even measured).
3. Inability to scale data access, whether through self-service or automation.
4. Low levels of data reusability; you want to build once and use often.
The reality is that until you enable flexible, governed data access, you will struggle to deliver value at any scale.
What should data
marketplaces enable?
When done well, internal data marketplaces
are transformative for organizations. But what’s
required for success? Your data marketplace
should enable:
Flexible data access
• At source: Data owners may want their data to be accessed directly at source. The data may be too large to move, they might not want copies to be made, they may need to minimize latency, or they may need to optimize query performance.
• Via an interface: Data owners may want an interface to control the amount of data that’s accessible, provide specific functionality to user personas, or create a streamlined experience. For maximum flexibility, the marketplace should enable interfaces to query data wherever it’s stored.
• In a target location: Data owners may be comfortable with copies of their data being made. Users may also need to use the data in a range of systems, including on-premises and proprietary tooling.
Any type of digital asset
Data sometimes means rows and columns, but often it means files, images, reports, and visualizations. To be successful, a data marketplace will need to support every digital asset that data owners provide, as well as every format.
Different user personas
To be successful as a single interface for data access, your data marketplace needs to work for a wide range of user personas, from data engineers and data scientists through to data analysts, business users, and executives. Serving these different users will have implications for everything from the look and feel of each user journey to the level of collaboration and asynchronous working that is required.
A range of operating models
Your data marketplace will need to adapt to your business environment, with scalable controls around how data, organizations, and users are managed. It must enable one or more operating models, which may include:
• Internal sharing and distribution: Distribute your data products to internal business users at scale. This model can support a data mesh architecture.
• Internal collaboration: Share and collaborate on data across teams, divisions, and business units.
• Data acquisition/integrating external data: Evaluate and adapt data from suppliers and redistribute internally.
If sharing data outside of your organization,
additional operating models include:
• Data commerce: Monetize data products,
models, and data-related services.
• External data sharing: Securely share data
and models with external parties and service
providers.
• Partner collaboration: Collaborate with
partners without losing custody of data or
models.
What capabilities does your
internal data marketplace need?
To effectively serve your business and the range of users and use cases,
your internal data marketplace will need certain features and capabilities.
Connectors
Manage the creation and maintenance of connectors to read, write, and copy data to and from a range of sources such as cloud databases, data lakes/warehouses, on-premises systems, and desktops.
Interfaces
Host a range of interfaces to balance user experience and risk. These may include cleanrooms, sandboxes, workbenches, query engines, BI tools, etc.
Data assets
Full lifecycle management of data assets. This includes ownership, entitlements, and lineage. Data assets should include tables, notebooks, images, visualizations, text, etc.
Data transformation
Manage the creation and maintenance of code used to create custom data assets, and automate code execution on a scheduled or event-driven basis.
Data products
Full lifecycle management of data products, including asynchronous processes to create, publish, manage, and delete them.
Export
Manage the creation and maintenance of data pipelines, file transfers, and downloads on a one-off, scheduled, or event-driven basis.
Subscriptions
Record and enforce the entitlements set out in data product subscriptions, covering duration of access, type of access, number of users, etc.
Identity and access management (IAM)
Control all user and system access to the platform, services, and tools.
Storefront
A capable and intuitive storefront for data products, allowing a range of personas to easily discover and understand what’s available to them.
Miscellaneous
A range of services covering functional and non-functional requirements such as entity reference, deletion, monitoring, lineage, authentication, security, etc.
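The Subscriptions capability records and enforces entitlements such as duration of access, type of access, and number of users. As a minimal sketch of what that enforcement could look like, assuming a hypothetical `Subscription` record with a date range, a set of granted access types, and a seat limit (these names are illustrative and not taken from any particular product), each access request reduces to three checks:

```python
from dataclasses import dataclass, field
from datetime import date
from enum import Enum


class AccessType(Enum):
    # Hypothetical access types; a real marketplace would define its own.
    QUERY = "query"
    EXPORT = "export"


@dataclass
class Subscription:
    """Hypothetical record of the entitlements in one data product subscription."""
    product_id: str
    start: date
    end: date  # inclusive last day of access
    access_types: frozenset[AccessType]
    max_users: int
    users: set[str] = field(default_factory=set)

    def add_user(self, user_id: str) -> bool:
        """Add a user if the seat limit allows; return whether the user is entitled."""
        if user_id in self.users:
            return True
        if len(self.users) >= self.max_users:
            return False
        self.users.add(user_id)
        return True

    def is_allowed(self, user_id: str, access: AccessType, on: date) -> bool:
        """Check duration of access, type of access, and user entitlement."""
        return (
            self.start <= on <= self.end
            and access in self.access_types
            and user_id in self.users
        )


# Usage: a query-only subscription limited to two users for 2024.
sub = Subscription("sales-daily", date(2024, 1, 1), date(2024, 12, 31),
                   frozenset({AccessType.QUERY}), max_users=2)
sub.add_user("alice")  # True
sub.add_user("bob")    # True
sub.add_user("carol")  # False: seat limit reached
sub.is_allowed("alice", AccessType.QUERY, date(2024, 6, 1))   # True
sub.is_allowed("alice", AccessType.EXPORT, date(2024, 6, 1))  # False: not granted
sub.is_allowed("bob", AccessType.QUERY, date(2025, 1, 2))     # False: expired
```

Keeping the seat limit inside `add_user` means the number-of-users entitlement is enforced once, when a user joins, rather than on every request.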
User personas
Understanding and serving users is key to a successful internal data marketplace. There are three broad user personas: platform operators, data producers, and data consumers. These are often groups of people, not single individuals, working together to achieve specific goals. Individual users may take on multiple personas at different times. For example, it’s very common for the platform operator to also act as both a data producer and a data consumer. Meeting user needs and building momentum are critical at all stages, especially at the start.
Platform operators
Platform operators are responsible for running the data marketplace. They will make decisions around several key areas:
• Operating model: Is your marketplace strictly for internal users? Which teams will have access to the marketplace? How are those teams defined? Will data be centralized or left at source?
• Governance: How do you plan to set and enforce rules around data governance, movement, access, and usage?
• Access and control: Who will have access to the marketplace? How do you plan on delineating user roles and entitlements? Does the platform operator or the data producer dictate how data is accessed?
• Reporting: How will you monitor the performance and usage of the platform? What metrics will you collect? Who will you report these metrics to? How will your reporting drive your business objectives?
• Cloud provider: What cloud provider(s) will you use? Where will you deploy your platform? How will you manage and apportion costs?
Data producers
Data producers are those who configure and manage data assets and products. When establishing your internal data marketplace, data producers should think about:
• Data assets: What types of data assets are available? Which data assets need to be prioritized in the early stages of your data marketplace? Will they be left at source or copied to the platform? Who will they be shared with?
• Data product management: How will the lifecycle of data products be managed? Is there a coherent data product management philosophy or culture in your organization? Are you comfortable with the balance between risk and reward?
• Technology: Are source systems the optimal storage layer for the target use cases? What tools do the various users need to work with data? What data pipelines will need to be created and maintained?
• User experience: What user experiences do you want to enable? Who are those experiences for?
Data consumers
Data consumers are the end users of your data assets and products. Consider how the following will affect and empower your organization’s data consumers:
• Roles: Who are the data consumers? What roles need to be created? Who will be invited to use the data marketplace? How will you delete and remove users? How will you support users?
• User experience: How will your user roles affect what users can do? What limits will you put on user behavior? What are the rules around data access, usage, and distribution? Which tools will be available to users in the cloud workspaces? What usage patterns do you want to enable?
• Collaboration: Are users able and encouraged to collaborate within the marketplace? Which tools will you need to provide to enable effective collaboration?
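For data producers weighing how the lifecycle of data products will be managed, one option is to write that lifecycle down as explicit states and allowed transitions. The following is a hypothetical sketch in Python; the state names (`DRAFT`, `PUBLISHED`, `DEPRECATED`, `DELETED`) and the `TRANSITIONS` table are assumptions chosen for illustration, not a standard or any vendor’s model:

```python
from enum import Enum


class ProductState(Enum):
    # Hypothetical lifecycle states for a data product.
    DRAFT = "draft"
    PUBLISHED = "published"
    DEPRECATED = "deprecated"
    DELETED = "deleted"


# Allowed transitions (an illustrative assumption): published products must be
# deprecated before deletion, so subscribers get notice before access disappears.
TRANSITIONS: dict[ProductState, set[ProductState]] = {
    ProductState.DRAFT: {ProductState.PUBLISHED, ProductState.DELETED},
    ProductState.PUBLISHED: {ProductState.DEPRECATED},
    ProductState.DEPRECATED: {ProductState.PUBLISHED, ProductState.DELETED},
    ProductState.DELETED: set(),
}


class DataProduct:
    """Hypothetical data product that tracks its lifecycle state and history."""

    def __init__(self, name: str) -> None:
        self.name = name
        self.state = ProductState.DRAFT
        self.history: list[ProductState] = [self.state]

    def transition(self, target: ProductState) -> None:
        """Move to the target state, or raise ValueError if it is not allowed."""
        if target not in TRANSITIONS[self.state]:
            raise ValueError(
                f"{self.name}: cannot go from {self.state.value} to {target.value}"
            )
        self.state = target
        self.history.append(target)


# Usage with a hypothetical product name.
product = DataProduct("customer-churn")
product.transition(ProductState.PUBLISHED)
product.transition(ProductState.DEPRECATED)
product.transition(ProductState.DELETED)
```

Keeping the rules in a single table makes the organization’s data product management philosophy visible and easy to change; the recorded `history` also gives a simple audit trail.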
Integration with existing tech stack
An internal data marketplace will, by its nature, need to interact with many
parts of your existing technology stack, which may be quite diverse.
Data manufacturing
Data manufacturing tools, such as lakes and warehouses, focus on centralizing, cleansing, standardizing, and maintaining data. The ultimate goal is to reduce or remove complexity and minimize the cost of maintaining usable data. The data manufacturing process will likely generate a variety of data assets, including tables, notebooks, text files, images, and visualizations.
Data catalogs
Data catalogs provide an inventory of the data in your organization. It should be possible to import the metadata from your catalog into your data marketplace. Additionally, any assets connected to, or created within, your data marketplace should be referenceable within the catalog(s).
Identity and access management (IAM)
A data marketplace should integrate with your identity and access management systems. In the event that these systems are not appropriate for managing access to data, the data marketplace should be able to manage this independently. If your marketplace is also used for external parties, it’s important that it can integrate with multiple identity and access management systems.
Tools
Your data marketplace will need to cater to a range of users, and they will expect it to work well with their preferred tools. For example, in analytics use cases, these may include Excel, Tableau, Zeppelin, and many others.
Tools and technology on the data value chain: Your data marketplace should integrate with the various source, manufacturing, and consumption technologies that your users are familiar with.
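To make the two-way catalog integration concrete, here is a minimal sketch of importing catalog metadata into marketplace asset records while keeping a reference back to each catalog entry. It assumes a hypothetical JSON export with `id`, `name`, `description`, and `owner` fields; real catalogs expose their own schemas and APIs, so the field names, the `MarketplaceAsset` record, and the `mkt-` ID prefix are all illustrative:

```python
import json
from dataclasses import dataclass


@dataclass
class MarketplaceAsset:
    # Hypothetical marketplace-side record of an imported catalog entry.
    asset_id: str
    name: str
    description: str
    owner: str
    catalog_ref: str  # ID of the originating catalog entry, kept for back-references


def import_catalog_entries(raw: str) -> list[MarketplaceAsset]:
    """Map a JSON array of catalog entries (hypothetical schema) to marketplace assets."""
    assets = []
    for entry in json.loads(raw):
        assets.append(
            MarketplaceAsset(
                asset_id=f"mkt-{entry['id']}",
                name=entry["name"],
                description=entry.get("description", ""),
                owner=entry.get("owner", "unassigned"),
                catalog_ref=entry["id"],
            )
        )
    return assets


def catalog_backlinks(assets: list[MarketplaceAsset]) -> dict[str, str]:
    """Return catalog entry ID -> marketplace asset ID so the catalog can reference each asset."""
    return {a.catalog_ref: a.asset_id for a in assets}


# Usage with a small hypothetical export.
export = '[{"id": "c1", "name": "orders", "owner": "sales"}]'
backlinks = catalog_backlinks(import_catalog_entries(export))  # {"c1": "mkt-c1"}
```

Storing `catalog_ref` on every imported asset is what lets the link run in both directions: metadata flows into the marketplace, and the catalog can be updated to point at the corresponding marketplace asset.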
Getting your marketplace
Acquiring the set of capabilities you need is challenging. Here is a quick guide
of the various options you have and the factors you should consider.
Factors to consider
• Cost to build: Software projects are notoriously difficult to estimate accurately in terms of time and money. The cost difference between options can run to millions of dollars.
• Cost to maintain: What costs will be associated with supporting, maintaining, and upgrading your data marketplace over time?
• Time to deploy: To prove value, will you go down the proof-of-concept (POC) or proof-of-value (POV) route? You’ll need ongoing support and patience from leadership and stakeholders throughout this process, so there is a premium on demonstrating value quickly.
• Customizability: To what extent is customizability important? This is particularly relevant if you’re considering buying or outsourcing the build. But it also applies to those building their own platforms, as design and engineering talent will play a major part. How important are your custom requirements compared to what is available through other options?
• User adoption risk: What can you do to encourage and incentivize platform adoption? Think about push and pull factors, as well as how the quality of the user experience will affect usage and adoption.
• Methodology and know-how: Do you (or your hired help) have experience building, deploying, and running data marketplaces? What experiences can you/they draw upon to improve the chances of a successful initiative?
Any decision here will involve opportunity cost. Given that you’ll be using time and resources for this, what other work will you de-prioritize in order to launch a data marketplace? To what extent do you want your people building software vs. creating data products?
Let’s look at the relative appeal of the three broad options: buy, build, and outsource.
Options explained
Purpose-built, market-proven solution
The most straightforward option is to buy a purpose-built data marketplace platform and deploy it in your own environment. Look for a vendor that has the core capabilities listed earlier and a proven track record of deploying within large organizations. Customizability, methodology, and speed to deployment are areas where vendors should differentiate themselves.
Self-build
Within this track, there is a spectrum from assembling components with custom integration to a fully self-built platform. A self-built solution will be unique, and the design and implementation will be under your full control. To do this, you’ll rely on internal expertise, ideally from people who have built, deployed, and managed similar systems before.
This route can cost millions of dollars, take a long time to deploy, and present high user adoption risk. Therefore, you should ask yourself how important your unique requirements are in relation to what else is out there.
Outsourced build
The final option is to outsource the build to another organization, typically an IT consulting firm. In order to maintain their margins, consultancies are best served by adapting frameworks and solutions developed for other clients to your project. Whether this solution will be fit for your purposes depends on whether they’ve already developed a similarly specified data marketplace for someone else.
This method can turn out to be the most expensive, slowest, and least likely to meet needs.
In addition, managing a third party and keeping everything in alignment, both during and after a project, can be challenging and frustrating. Clear communication and stakeholder management are absolutely essential.
Buy, build, or outsource? Different options will have different impacts on the various factors that need to be taken into consideration: cost to build, cost to maintain, time to deploy, user adoption risk, customizability, and methodology.
Build a strong foundation
To maximize your chances of success, it’s crucial to have a strong foundation. You should be able to answer all the following questions before implementing a data marketplace.
Do you know who your users are and do you understand their use cases? To start out, you’ll need a small number of target personas and use cases for which you have high conviction that users will adopt your data marketplace and get value from it.
Do you know what data is available and where the demand is? This is crucial to understand the full scope of what your marketplace will need to support and enable. It will also help you to focus and prioritize in the early stages.
Do you have a legal framework in place that clarifies what can be shared, with whom, and for what purposes? This is essential for adding precision to what data, users, and use cases are within scope, as well as the level of risk management that will be required.
Do you have a plan to encourage and incentivize user adoption? This is key to overcoming the ‘cold start problem’.
Any kind of marketplace is subject to the cold start problem, a term popularized by Andrew Chen, a General Partner at Andreessen Horowitz. The cold start problem refers to the challenges of fostering network effects when you have a two-sided marketplace. With a data marketplace, you need to balance the needs and numbers of data producers on one side and data consumers on the other, even when they’re in the same organization. Depending on your operating model, you will need to apply the right strategy to solve this.
Launch, grow, and scale your
data marketplace
Once you have your data marketplace capability,
the next step is to launch, grow, and scale. Key to
this process is the idea of continuous monitoring
and measurement. You should check progress
against key milestones at regular intervals to
keep things on track, and adjust course where
necessary based on learnings.
To ensure adoption and overcome the cold start
problem, you’ll need to create network effects
so that your data marketplace is adopted at the
correct pace. To do so, you’ll need to understand
the various incentives that will make data producers
and consumers enthusiastic about getting to know,
using, and getting value from the data marketplace.
Another key part of the process is identifying and
then monitoring the primary user journeys on your
data marketplace. These will need to be tested
and improved over time as you identify pain points
for your users. An experienced vendor who has
done this before will be able to talk you through
these processes.
Take the next step
Congratulations! By reading this guide, you
are already ahead of the curve. Internal data
marketplaces are a relatively new concept that,
according to Gartner, will become ubiquitous over
the next few years. They’ve already been adopted
by many organizations for a wide range of use
cases.
The organizations that have taken data access
seriously and invested in marketplace capabilities
are quickly outpacing those that haven’t. They are
bridging data silos, serving a diverse set of users,
improving trust, and eliminating friction. As a result,
they innovate faster, accelerate value realization,
and ultimately get a better return on investment.
You can see data marketplace success stories at
harbrdata.com.
If you’re ready to get started, chat with Harbr today.