Businesses are increasingly reliant on new technologies to use data in new and innovative ways. However, most data leaders are finding that technology alone does not enable the organization to deliver new and valuable insights fast enough.
Fundamentally, we need an approach that holistically supports the infrastructure, technology, and processes that convert raw data into something valuable and accessible. This approach should improve the full data lifecycle, including knowing what data exists, where it is, how to access it, and how to understand it. Most importantly, though, it should focus on building value with that data.
In this article, let’s explore how “Data as a Product” thinking can deliver more value from data to end users in more useful and approachable ways. This paradigm is informed by the product management concepts that have revolutionized consumer products in recent decades. In short, experience shows that businesses that adopt it, along with the idea of “data products,” become more data-driven, achieve better business outcomes, and become more agile.
What Are Data Products?
First, let’s start with data products and do some basic research. Industry leaders have tried to define this concept before, and each has added nuances of its own. Here are some examples:
The former United States Chief Data Scientist, DJ Patil, defined them in his book Data Jujitsu: The Art of Turning Data into Product as “a product that facilitates an end goal through the use of data”.
Tableau, a well-known business intelligence tool vendor, says, “a data product is an application or tool that uses data to help businesses improve their decisions and processes.”
Sanjeev Mohan, writing for Forbes, describes them as “a self-contained data ‘container’ that directly solves a business problem or is monetized.”
And McKinsey explains that “a data product delivers a high-quality, ready-to-use set of data that people across an organization can easily access and apply to different business challenges.”
Synthesizing these views with what we have learned from talking to our customers, we have arrived at a definition that lends itself to action. Here is how we think about them:
Data products are data assets specifically created to help businesses and consumers make better decisions, improve processes, gain insight, and generate value. They can be managed similarly to consumer products. They can be shared as internal building blocks, but also delivered externally. At its essence, a data product unlocks the potential of data in ways that provide quantifiable economic benefits to both external and internal customers.
With this kind of product mindset, you can view data not as a set of columns and rows, but as the foundation for value delivery. Consequently, data teams can focus on business value, and holistically design, build, and curate data into usable and consumable artifacts.
“Data as a product is very different from data as an asset. What do you do with an asset? You collect and hoard it. With a product, it’s the other way around. You share it and make the experience of that data more delightful.”
Zhamak Dehghani, Data strategies to drive business value at scale
Anatomy of a Data Product
So what do they look like? The physical data product takes the form of some type of data set. Some examples are:
- A CSV file that contains customer orders.
- A Parquet partition that contains an hour of website clickstream data.
- A database table that contains a product catalog.
This physical instantiation is necessary but not sufficient to make the data useful. Let’s look at an example to bring the gaps to life.
Data Products in Real Life
Let’s say an airline has seats available between New York and San Francisco. The airline wants internal customer-facing teams to fill those seats, as well as an entire ecosystem of booking sites such as Kayak, Skyscanner, and Booking.com. The airline operations data team can be the producer of a physical data set that contains a listing of all the empty seats, but can anyone use it?
Meanwhile, the booking sites need listings of empty seats not just from this one airline, but from hundreds of airlines that can carry their customers around the planet. As potential consumers of these data sets, they have specific needs that are unlikely to occur to each of the many airline operations data teams. Can they economically collect, adapt, and augment data from all these airlines to compose timely and reliable itineraries?
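To make that cost concrete, here is a minimal Python sketch of what a booking site ends up writing when every airline publishes its empty seats in a different shape. The two airlines, their field names (`from`, `open_seats`, `route`, `dep_epoch`, `free`), and the `SeatListing` record are all hypothetical. The point is that the consumer needs one hand-written adapter per producer, and each adapter breaks silently if that producer changes its format.

```python
from dataclasses import dataclass
from datetime import datetime, timezone


@dataclass(frozen=True)
class SeatListing:
    """The shape the (hypothetical) booking site actually wants."""
    airline: str
    origin: str
    destination: str
    departure_utc: datetime
    seats_available: int


def adapt_airline_a(row: dict) -> SeatListing:
    # Hypothetical airline A: separate origin/destination fields,
    # ISO-8601 local times with offsets, seat counts as strings.
    return SeatListing(
        airline="A",
        origin=row["from"],
        destination=row["to"],
        departure_utc=datetime.fromisoformat(row["departs"]).astimezone(timezone.utc),
        seats_available=int(row["open_seats"]),
    )


def adapt_airline_b(row: dict) -> SeatListing:
    # Hypothetical airline B: a "JFK-SFO" route string and epoch seconds.
    origin, destination = row["route"].split("-")
    return SeatListing(
        airline="B",
        origin=origin,
        destination=destination,
        departure_utc=datetime.fromtimestamp(row["dep_epoch"], tz=timezone.utc),
        seats_available=row["free"],
    )


# One adapter per producer: this table grows with every airline onboarded.
ADAPTERS = {"A": adapt_airline_a, "B": adapt_airline_b}


def collect(feeds: dict) -> list:
    """Normalize every airline's feed and sort the combined listings by departure."""
    listings = []
    for airline, rows in feeds.items():
        adapt = ADAPTERS[airline]
        listings.extend(adapt(row) for row in rows)
    return sorted(listings, key=lambda s: s.departure_utc)
```

With two airlines this is manageable. With hundreds, each publishing without any agreement on format, units, or update timing, the adapter table becomes the expensive, fragile part of the consumer's business.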
The Role of Contracts
Using contracts to define data products is instrumental in closing this massive gap in our example. Contracts are agreements that describe the details of data products that move businesses forward. Let’s take a look at how this works.
The example highlighted two key roles: producers and consumers. Producers can’t just take a data set, dump it into some physical file, and expect consumers to know how to use it to create business value.
The contract provides the context for producers to take on responsibility for the data products they create, and be confident that their products will deliver value for the business.
Some elements that good contracts can cover include:
- What physical format does each data set take?
- What technology is used to access them?
- How often are they updated, and what signal is used to let data consumers know?
- What technology is used to secure them?
- What data structure is in each data set?
- Does money change hands for using the product (pay or get paid)?
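One way to picture these elements is as a single machine-readable record that producers publish and consumers can inspect. The Python sketch below is an illustrative assumption rather than any standard contract format: the `DataContract` class, its field names, and every value in the example contract for an “empty seats” product are hypothetical, with one field per question in the list above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataContract:
    """Hypothetical contract record; one field per question a contract should answer."""
    product: str
    version: str
    physical_format: str           # What physical format does the data set take?
    access_technology: str         # What technology is used to access it?
    update_frequency_minutes: int  # How often is it updated?
    update_signal: str             # What signal tells consumers it was updated?
    security: str                  # What technology is used to secure it?
    schema: dict                   # What data structure is in it? (column -> Python type)
    price_per_month_usd: float = 0.0  # Does money change hands? (0.0 = free)


# Illustrative contract for the airline's empty-seat data product.
EMPTY_SEATS_V1 = DataContract(
    product="empty-seats",
    version="1.0.0",
    physical_format="Parquet, partitioned by departure date",
    access_technology="REST API",
    update_frequency_minutes=15,
    update_signal="event published on a 'seats.updated' topic",
    security="OAuth 2.0 client credentials",
    schema={
        "origin": str,
        "destination": str,
        "departure_utc": str,
        "seats_available": int,
    },
    price_per_month_usd=0.0,
)
```

Making the record immutable (`frozen=True`) reflects the idea that a published contract is an agreement: changing terms means publishing a new version rather than editing the old one in place.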
Contracts allow stakeholders to do business with each other much more efficiently. Producers can measure the quality of the product they create and provide guarantees about the accuracy, timeliness, and reliability of the data products that consumers have agreed to.
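Measuring quality against a contract can start very simply. The sketch below assumes three hypothetical contract terms for the empty-seats product: a required schema, a 15-minute freshness window, and a rule that seat counts are never negative. It checks a batch of records against those terms and returns a list of violations that a producer could report before publishing.

```python
from datetime import datetime, timedelta, timezone

# Hypothetical contract terms for an "empty seats" data product.
SCHEMA = {"origin": str, "destination": str, "seats_available": int}
MAX_STALENESS = timedelta(minutes=15)


def check_contract(rows, published_at, now=None):
    """Return human-readable contract violations for a batch (empty list if none)."""
    if now is None:
        now = datetime.now(timezone.utc)
    violations = []

    # Timeliness: the batch must be no older than the agreed window.
    age = now - published_at
    if age > MAX_STALENESS:
        violations.append(f"stale: published {age} ago")

    for i, row in enumerate(rows):
        # Structure: every agreed column is present with the agreed type.
        for column, expected in SCHEMA.items():
            if column not in row:
                violations.append(f"row {i}: missing column '{column}'")
            elif not isinstance(row[column], expected):
                actual = type(row[column]).__name__
                violations.append(
                    f"row {i}: '{column}' is {actual}, expected {expected.__name__}"
                )
        # Accuracy: a simple business rule, seat counts cannot be negative.
        seats = row.get("seats_available")
        if isinstance(seats, int) and seats < 0:
            violations.append(f"row {i}: negative seats_available")

    return violations
```

Returning a list instead of raising on the first problem lets the producer track violation counts over time, which is one way to turn “quality” into a number that both sides can watch.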
Treating Data as a Product
Now that we have a handle on what a data product consists of, let’s look at the implications for their production, handling, and use in the business.
The parallel to consumer products is instructive here. On one hand, a consumer product like an airline flight is described by attributes such as origin and destination, scheduled departure and arrival, seat number and location, baggage allowance, cost, and included meals. On the other hand are all the operational aspects that make that product happen, including pilot schedules, terminal slots, gate staffing, airline IT systems, the airplane model used, and snack carts.
By shifting our focus to treating data as a product, we can look at what it takes to:
- Produce data products that fulfill their contracts
- Consume data products to create business value
Production
Treating data as a product includes how to create and sustain data products. Good practices shape the way they are used, who supports them, and the governance needed to ensure they fulfill their contracts.
When we treat data as a product, we consider the entire production lifecycle, including data pipelines, infrastructure, security, compliance, and other tooling. This holistic approach considers how all resources align with both internal and external stakeholders in the process. To maximize the benefits, organizations should:
- Adopt data management practices with reusability and adaptability in mind
- Adopt automation of common data production activity to raise process reliability
- Measure and manage actionable data quality across the lifecycle
- Enable secure, self-governed access in alignment with the product contract
- Maintain catalogs of published products for consumption and reuse
- Maintain detailed documentation including lineage, processing logic, data types and business terms
- Measure the cost of resources used in the production of each product
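Several of these practices come down to keeping a few facts next to each product. The following Python sketch is a deliberately small, hypothetical in-memory catalog: each `CatalogEntry` records an owner, a description, upstream lineage, business terms, an attributed monthly cost, and the latest measured completeness. A `completeness` helper computes one simple, actionable quality metric (the share of non-null cells). None of these names come from a real catalog product; a production catalog would persist and govern this information.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class CatalogEntry:
    """Documentation published alongside a data product (illustrative fields)."""
    name: str
    owner: str
    description: str
    upstream: List[str] = field(default_factory=list)             # lineage: inputs
    business_terms: Dict[str, str] = field(default_factory=dict)  # glossary
    monthly_cost_usd: float = 0.0                                 # attributed resource cost
    last_completeness: Optional[float] = None                     # latest quality measurement


def completeness(rows: List[dict], columns: List[str]) -> float:
    """Share of non-null cells across the given columns (1.0 = fully populated)."""
    total = len(rows) * len(columns)
    if total == 0:
        return 1.0
    filled = sum(1 for row in rows for col in columns if row.get(col) is not None)
    return filled / total


class Catalog:
    """Minimal in-memory catalog of published data products."""

    def __init__(self) -> None:
        self._entries: Dict[str, CatalogEntry] = {}

    def publish(self, entry: CatalogEntry) -> None:
        # Re-publishing under the same name replaces the previous entry.
        self._entries[entry.name] = entry

    def search(self, term: str) -> List[str]:
        """Case-insensitive search over names, descriptions, and business terms."""
        term = term.lower()
        hits = []
        for entry in self._entries.values():
            text = " ".join([entry.name, entry.description, *entry.business_terms]).lower()
            if term in text:
                hits.append(entry.name)
        return sorted(hits)

    def total_cost(self) -> float:
        """Sum of attributed monthly costs across all published products."""
        return sum(entry.monthly_cost_usd for entry in self._entries.values())
```

For example, a producer might compute `completeness(rows, ["origin", "seats_available"])` after each load, store it in `last_completeness`, and publish the entry so consumers can find the product with `catalog.search("seat")`.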
Consumption
Treating data as a product includes understanding how to consume data products. Common examples include:
- Internal teams, possibly in a mesh or fabric architecture, can combine different data products. For example, a customer preference engine can merge internal domain data into segmentation data to enable rapid insights and improve customer satisfaction.
- Business dashboards provide easy-to-digest visuals and analytics. For instance, a dashboard might combine the daily revenue of consumer products with their production costs to track profitability and ROI. Each widget on a dashboard can display a specific data product.
- AI or ML models consume data products, and in turn, automate systems or assist decision-making. The model results themselves can be treated as products as well, before being used by automation systems.
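The dashboard example above is essentially a join of two data products followed by a little arithmetic. This Python sketch assumes each product arrives as a hypothetical mapping from ISO date string to amount, and uses the common definition ROI = (revenue − cost) / cost. It only reports dates present in both products (an inner join), so a missing revenue figure never shows up as a spurious loss.

```python
def profitability(revenue_by_day: dict, cost_by_day: dict) -> dict:
    """Join daily revenue and daily cost data products on date.

    Returns {date: (profit, roi)} for dates present in both inputs.
    ROI is profit / cost, or None when cost is zero (undefined).
    """
    result = {}
    # Inner join: only dates that both data products cover.
    for day in sorted(revenue_by_day.keys() & cost_by_day.keys()):
        revenue, cost = revenue_by_day[day], cost_by_day[day]
        profit = revenue - cost
        roi = profit / cost if cost else None
        result[day] = (profit, roi)
    return result


revenue = {"2024-05-01": 12000.0, "2024-05-02": 9000.0}
costs = {"2024-05-01": 8000.0, "2024-05-02": 10000.0, "2024-05-03": 7000.0}
print(profitability(revenue, costs))
```

Here 2024-05-03 is left out because no revenue figure exists for it yet. Whether to drop, flag, or zero-fill such gaps is exactly the kind of behavior a contract between the two producing teams and the dashboard team should settle.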
A data-as-a-product mindset organizes the data life cycle along the lines of a production process. It creates visibility into the resources these processes use, and the reliability, security, and accessibility of these processes. An end-to-end production workflow along this life cycle looks like this:
A note on the role of APIs:
APIs have become one of the most common ways to deliver data products, which makes them a natural point to test for and enforce the products’ contracts. APIs cleanly decouple the production processes from the consumption processes, so each side can keep its own distinct technology stack. The transformational value of APIs in the digitization of business ecosystems has been documented extensively elsewhere.
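As a sketch of what enforcing a contract at the API boundary can mean, the hypothetical `get_empty_seats` function below stands in for a consumer-facing endpoint. It returns only the fields the contract promises, hiding internal producer columns such as the made-up `fare_bucket`, and refuses to serve a record that breaks the contract instead of passing the problem downstream. The field names, the in-memory `store`, and `ContractViolation` are illustrative assumptions; a real service would sit behind an HTTP framework.

```python
from collections.abc import Iterable

# Hypothetical contract: the only fields consumers are promised, with their types.
CONTRACT_FIELDS = {"origin": str, "destination": str, "seats_available": int}


class ContractViolation(Exception):
    """Raised when the producer's data would break the published contract."""


def _conforms(record: dict) -> bool:
    # A missing field yields None, which fails every isinstance check above.
    return all(isinstance(record.get(k), t) for k, t in CONTRACT_FIELDS.items())


def get_empty_seats(store: Iterable, origin: str, destination: str) -> list:
    """Consumer-facing API: filter by route and expose only contract fields."""
    results = []
    for record in store:
        if record.get("origin") == origin and record.get("destination") == destination:
            if not _conforms(record):
                raise ContractViolation(f"record breaks contract: {record!r}")
            # Project onto contract fields so internal columns never leak out.
            results.append({k: record[k] for k in CONTRACT_FIELDS})
    return results
```

Because consumers only ever see the projected fields, the producer can rename or add internal columns freely; only changes to `CONTRACT_FIELDS` affect consumers, and those are the changes a contract version should govern.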
The Difference Between Data Products and Data as a Product
We’ve talked about data products and treating data as a product, but what is the difference?
Data products are deliverables, while data as a product is the framework and mindset we use to build them. Data as a product is the product management approach for delivering value through data.
Shifting the organizational mindset to treat data as a product unlocks the full potential of data for the business. Treating data as a product and using contracts is an intentional approach that helps data teams efficiently produce valuable data products, and easily answer questions like:
- What data exists and where?
- Who needs this data and in what form?
- Where is this data coming from and where is it flowing to?
- How and where did my data change?
- How is this data being used by the consumer?
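Questions like “Where is this data coming from and where is it flowing to?” become straightforward once lineage is recorded as a graph. The sketch below assumes a hypothetical set of edges, each pointing from an input data set to the product built from it. It uses a breadth-first walk to list everything upstream of a product (its sources) and everything downstream (whatever a change to it could affect).

```python
from collections import deque

# Hypothetical lineage edges: (input data set, product built from it).
EDGES = [
    ("bookings_raw", "seat_inventory"),
    ("aircraft_configs", "seat_inventory"),
    ("seat_inventory", "empty_seats"),
    ("empty_seats", "partner_feed"),
    ("empty_seats", "load_factor_dashboard"),
]


def _walk(start, neighbors):
    """Breadth-first traversal returning every node reachable from start."""
    seen, queue = set(), deque([start])
    while queue:
        node = queue.popleft()
        for nxt in neighbors.get(node, ()):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(nxt)
    return seen


def lineage(edges, node):
    """Return the full upstream and downstream sets for one data set."""
    downstream, upstream = {}, {}
    for src, dst in edges:
        downstream.setdefault(src, []).append(dst)
        upstream.setdefault(dst, []).append(src)
    return {"upstream": _walk(node, upstream), "downstream": _walk(node, downstream)}
```

For `empty_seats`, the upstream set is `seat_inventory`, `bookings_raw`, and `aircraft_configs`, while the downstream set is `partner_feed` and `load_factor_dashboard`. That downstream set is the list of consumers to notify before changing the product.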
How the Industry Evolved
The data as a product mindset didn’t happen overnight. It has evolved over the years as data thought leaders have tackled problems like big data, data lakes, accessibility, and other modern data challenges. Therefore, reviewing some of these challenges might help us understand more about this mindset.
The Emergence of the Database
The advent of the relational database system brought us fast and flexible access to our data. This standard paved the way to support more complexity and larger scale in managing data. Over time, the technology advanced further with NoSQL, warehouse engines, big data frameworks, and infrastructure innovation. Ultimately, technology evolved to suit bigger and more complex data needs by decoupling infrastructure from product delivery through abstractions and interfaces.
Evolution of Technology to Support Big Data
Data needs continued to expand, forcing database technology to mature. Infrastructure abstractions added the support needed to scale into what we now call data lakes. These tools allow us to efficiently store tremendous amounts of information. Specifically, a large part of building and supporting data products has been around tools and processes that help extract, load, transform, and organize data for consumption. However, building from the infrastructure point of view created friction for users wanting to consume the data.
Siloed Data and the Problem of Accessibility
As a data consumer, knowing which data is available for use is probably the first question that comes to mind. Unfortunately, most companies have siloed infrastructure and teams with specialized skills to work with the data. This requires coordination and time. Hence, a data scientist who wants to analyze a particular data set could spend weeks or months tracking it down. Operational business teams, in turn, get backed up with ad-hoc requests to validate data sets for them. A multitude of disparate sources further exacerbates data access challenges. While designing for accessibility is a core component of the data as a product mindset, the organization needs to address its culture comprehensively in order to gain the benefits that data products can offer.
Data as a Product Mindset
Data as a Product is the next evolution of data management. Treating data as rows and columns or service endpoints is useful but can’t meet today’s needs. The pressure to deliver data at a faster pace requires a new method. Thankfully, a product mindset gives organizations the freedom to approach data as a holistic and readily consumable deliverable.
Data as a Product Is About Customer Delight
Similar to consumer products, data products will delight customers if they are accessible and consumable in easy and frictionless ways. Internal customers can more easily access internal products to produce larger, more valuable customer-facing offerings. External customers will be delighted with the ways that your data is easy to understand and consume. The data as a product mindset moves us beyond viewing data as rows and columns to a holistic product approach that can serve both internal and external data consumers.