In the first two articles in this series, I compared different approaches to organizational data design. Part I spelled out the difference between top-down (Inmon) and bottom-up (Kimball) methods of data warehousing. Part II covered alternative approaches with data vaults and data lakes. All of this concerns how best to build a centralized repository of data, which starts with the assumption that you want a centralized repository of data.

Should you want one, though?

I mentioned the data vault approach previously, and favoured the way it handles inconsistencies in data. To be clear, its way of handling them is to not try to reconcile them at all. I like that for practical reasons, but an approach I like even more is the data mesh.

A data mesh is a less centralized approach to the ownership of data. No central team owns the data in a mesh. Instead, every domain is responsible for owning its own data and making it available. Advertising teams own and share data about advertising. Email marketers own and share the data they have about email. Finance owns and shares its internally-shareable records. Fulfillment teams own and share their data about order fulfillment. And so on, and so on. Each team does this by making data products available in a self-serve manner: reports, query interfaces, APIs, whatever.

Every one of these examples includes some data about the customers of the business. They’re all facts (or claims, at least) about the same people. In the context of any one of these domains, teams there can operate from their domain-specific data and incorporate context as necessary from the other teams by using their data products.
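To make that concrete, here is a minimal sketch of two domains publishing data products and one of them pulling in context from the other. Everything in it is an assumption for illustration: the `DataProduct` class, the field names, and the in-memory records are hypothetical stand-ins for whatever reports, query interfaces, or APIs a team actually exposes.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List

# A record is a plain dict; every domain is assumed to include "customer_id".
Record = Dict[str, object]


@dataclass
class DataProduct:
    """A hypothetical self-serve interface that one domain team publishes."""
    domain: str
    fetch: Callable[[], List[Record]]  # owned and implemented by the domain team

    def by_customer(self) -> Dict[str, List[Record]]:
        """Index this domain's records by customer so other teams can look them up."""
        index: Dict[str, List[Record]] = {}
        for record in self.fetch():
            index.setdefault(str(record["customer_id"]), []).append(record)
        return index


# Illustrative in-memory data standing in for each team's real storage.
email_product = DataProduct(
    domain="email",
    fetch=lambda: [
        {"customer_id": "c1", "campaign": "spring", "opened": True},
        {"customer_id": "c2", "campaign": "spring", "opened": False},
    ],
)
fulfillment_product = DataProduct(
    domain="fulfillment",
    fetch=lambda: [
        {"customer_id": "c1", "order_id": "o-100", "status": "shipped"},
    ],
)


def orders_with_email_context(orders: DataProduct, email: DataProduct) -> List[Record]:
    """Fulfillment works from its own records and adds email context as needed."""
    email_index = email.by_customer()
    enriched = []
    for order in orders.fetch():
        touches = email_index.get(str(order["customer_id"]), [])
        enriched.append({**order, "email_touches": len(touches)})
    return enriched


result = orders_with_email_context(fulfillment_product, email_product)
```

Note that nothing here copies the email data into a fulfillment-owned store. The fulfillment team reads what the email team publishes, at the moment it needs it.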

Aufbau und Syntax – 2020 – Canon EOS 6D

Philosophers of data

This doesn’t remove the need for central data teams, or for data governance. Indeed, regulatory and security requirements leave businesses with a great responsibility when it comes to the stewardship of data. What is different, however, is the role those teams play. Instead of owning the data or defining dogmatic truths, their job is to empower domain teams with the capabilities they need to hold their own data without tripping over it.

Data teams that serve at the level of an entire organization are like philosophers of science: they’re less concerned with the content of any domain they study than with whether the activities within those domains are done in a way that makes sense.

So these philosophers of data in an organization are there to ensure that data products are being surfaced in ways that make them practically accessible, to meet the business needs from other domains, but also properly constrained, to meet requirements of security and privacy, especially when doing so is mandated by regulatory bodies with the power to make failure a very real liability.
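One way to picture "accessible but properly constrained" is a shared policy that every data product applies before handing records to a consumer. The sketch below is only an assumption about how that might look: the `POLICY` table, the role names, and the field names are all made up, and a real organization would derive them from its own security and regulatory requirements.

```python
from typing import Dict, Iterable, List, Set

Record = Dict[str, object]

# Hypothetical central policy: which fields each consuming role may see.
POLICY: Dict[str, Set[str]] = {
    "analyst": {"customer_id", "campaign", "opened"},
    "finance": {"customer_id", "order_total"},
}


def constrain(records: Iterable[Record], role: str) -> List[Record]:
    """Return copies of the records holding only the fields the role may see.

    A role missing from POLICY is allowed no fields, so every record comes
    back empty: the default is to withhold rather than to expose.
    """
    allowed = POLICY.get(role, set())
    return [{k: v for k, v in r.items() if k in allowed} for r in records]


email_records = [
    {
        "customer_id": "c1",
        "email_address": "a@example.com",
        "campaign": "spring",
        "opened": True,
    },
]

visible = constrain(email_records, "analyst")  # email_address is stripped
hidden = constrain(email_records, "intern")    # unknown role: all fields stripped
```

The point of the design is the division of labour: the domain team still owns the records, while the central team owns only the rules about how they may be shared.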

Moon Through the Palm – 2024 – Fujifilm X-T5

Someone has to think “big picture”

Another role of the central data team is to help executive stakeholders see the big picture. We spoke about all of the individual domains and their data, but the leadership of an organization still needs to understand the business holistically.

For some, this trumps the immediate practical needs of domain teams. They may argue that we need the consolidated data anyway, so we might as well design that data warehouse.

I disagree. This still suffers from the same practical problems, and I’ve never seen a central data team quite pull this off. Even really good ones end up with gaps in the information, or have trouble getting executives to fully buy in to what they’ve built.

Instead, I think the energy is better spent leaving the data where it lies, and producing mechanisms that leverage the domain-specific data products and make the “big picture” view as accessible and clear as possible. That view isn’t typically a huge data set: it’s a highly aggregated view of data that offers decision support.

This is the subset of the data activities in the organization that truly should take a top-down approach, because in that case more than any, it’s the view from the top that practically matters.
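As a rough sketch of that kind of view, suppose each domain publishes a small, already-aggregated summary as one of its data products. The domain names, metric names, and numbers below are placeholders I have invented for illustration, not figures from any real business.

```python
from typing import Callable, Dict

Summary = Dict[str, float]

# Each entry stands in for a call to a domain team's summary data product.
domain_summaries: Dict[str, Callable[[], Summary]] = {
    "advertising": lambda: {"spend": 1200.0, "attributed_revenue": 4800.0},
    "fulfillment": lambda: {"orders_shipped": 300.0, "orders_late": 15.0},
}


def big_picture(summaries: Dict[str, Callable[[], Summary]]) -> Dict[str, float]:
    """Combine per-domain aggregates into a few decision-support figures."""
    ads = summaries["advertising"]()
    ful = summaries["fulfillment"]()
    return {
        "return_on_ad_spend": ads["attributed_revenue"] / ads["spend"],
        "late_shipment_rate": ful["orders_late"] / ful["orders_shipped"],
    }


view = big_picture(domain_summaries)
```

The central team never touches an individual record here. It decides which aggregates matter at the top and how to combine them, and leaves the underlying data where it lies.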

Starting Lineup – 2022 – Fujifilm X-Pro3

It usually just happens

The concept of a “data mesh” with that name is fairly new, but the structure of information in a data mesh is the typical architecture I’ve seen emerge organically in organizations for most of my career. What is missing in almost all cases is a data team that understands their role in this way. Instead, there is almost always someone who wants to take ownership of the data, to consolidate it into a data warehouse, or a data lake, or a data vault, or some other thing that puts all of the data in the same place.

That instinct isn’t wrong, and indeed a data mesh can frequently include data from different domains that comes together in the same physical or conceptual locations. This should happen, though, only when it makes sense practically and when it lowers the cost of maintaining that data. That won’t always be the case. It’s time to let go of control over where the data lives and what truth it tells, and embrace the fact that it’s hopelessly messy.

Direction – 2020 – Lomography Color Negative 800, 35mm film – Olympus OM-1n

There’s a higher calling

So, by instead ensuring that the sharing of data happens in consistent ways governed by the requirements of the business, resources are freed up. With them, central data teams can let that urge to consolidate data loose on that big-picture view. There, they will need to make choices about how to aggregate that data, how to present it coherently and honestly, and how to make it accessible to leadership. This won’t be done at the incredibly tedious level of individual records. It will be done by comparing the aggregated output of all the domains and combining a synthetic view of that data with analysis (that is, storytelling with the data), so that the critical information that drives the business at the highest levels is as accessible and user-friendly as it can be.

That is worth the effort.


Cover Photograph: Foamy – 2024 – Fujifilm X-T5

I’m Head of Product at Napkyn, a provider of digital analytics and media solutions. I’m also a father and a photographer, and I have a background in philosophy. This is my site.