This is the third and final instalment of my blog series on Rethinking Master Data Management (MDM) and our approach to it.
So far, I have examined the current state of MDM and highlighted three main reasons for its historical record of failure. In particular, I discussed how MDM solutions tend to promote the idea of a "single view" of an entity, which is often unrealistic and fails to account for the diversity of use cases. I also explored the issue of organisations treating data itself as the first-class citizen, neglecting the bigger picture of what actually needs to be achieved.
To conclude, I will delve into the third reason for the lack of success in MDM initiatives: the persistence in using outdated methods and inflexible structures for handling data acquisition, manipulation, and distribution by traditional/legacy MDM products. It is time to reevaluate our perspective on MDM products and adopt a more comprehensive approach that goes beyond just record matching and merging.
Traditional techniques & rigid models ought to die
They simply don't work. They didn't work when data was simple, small and structured, so what grounds do we have to believe they will work amid the data and technology revolution we're living through? Traditional MDM products have many shortcomings: they rely on a lot of manual work, impose rigid models, and use obsolete techniques, among other issues. Let's go through the top ones in this blog.
1) Rigid Models
Generally speaking, traditional MDM products can be very time-consuming tools for data engineers and data scientists/analysts. I know that because I've used a few of them. They are inflexible and rigid. For one, some products continue to impose a fixed data model for data ingestion and servicing. In a world where data is ever-changing and growing, why should anyone be locked into, say, 15 attributes, a specific data structure, naming conventions, data models or even rules?
Data democratisation isn't about dictating; it's about providing a flexible platform that fosters intuition and creativity. If we limit and constrain, we're simply impeding creative thinking and growth.
In global research conducted by Quantexa, 40% of business leaders said they faced challenges when building a data foundation due to the inflexibility of traditional data models, especially for complex data and relationships. ⁽¹⁾
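To make the rigidity concrete, here's a minimal Python sketch (all names and attributes are hypothetical, and none of this reflects a specific product's API) of the difference between a fixed-schema ingest, which drops anything it didn't anticipate, and a schema-flexible one that keeps whatever each source provides:

```python
FIXED_SCHEMA = {"first_name", "last_name", "postal_code"}  # the imposed model

def ingest_fixed(record: dict) -> dict:
    # Rigid model: attributes outside the fixed schema are lost at the door.
    return {k: v for k, v in record.items() if k in FIXED_SCHEMA}

def ingest_flexible(record: dict) -> dict:
    # Flexible model: keep everything; let each use case decide what matters.
    return dict(record)

source_record = {
    "first_name": "James",
    "last_name": "Smith",
    "postal_code": "4350",
    "device_id": "a91f",        # an attribute the fixed model never anticipated
    "social_handle": "@jsmith",
}

print(ingest_fixed(source_record))     # device_id and social_handle are gone
print(ingest_flexible(source_record))  # nothing is lost
```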
2) Manual Work
Data needs to meet certain quality standards before being processed for MDM purposes, or for any purposeful consumption for that matter. What's unfathomable to me is the amount of repetitive manual work needed, over and over, for each data source. A drag-and-drop UI is nice and makes one's life easier, but it doesn't make things any less manual. In fact, what it often does is limit how far you can get. This argument doesn't only apply to MDM tools but also to ETL. And let's not forget how important ETL is to any MDM project; it's a huge part of the manual process required.
You would think that with all these advancements in AI/ML and robotics, such products would finally allow you to focus on non-repetitive tasks. Unfortunately, that's still not the case: you are still required to spend a considerable amount of time understanding the issues (by visually inspecting tabular data), cleansing, parsing values, classifying, standardising, and so on. Statistically, that eats up 50-80% of the time of analysts and engineers who could be doing something a tad more creative and productive.
In the same global research conducted by Quantexa, 42% of C-suite leaders reported an increased business workload due to manual data work, wasting resources. ⁽¹⁾
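For a sense of what that repetitive work looks like in practice, here's a hedged Python sketch of the per-source cleanse/parse/standardise cycle described above (field names and formats are hypothetical):

```python
import re

def cleanse_name(raw: str) -> str:
    # Trim, collapse whitespace, normalise case.
    return re.sub(r"\s+", " ", raw).strip().title()

def parse_phone(raw: str) -> str:
    # Keep digits only; assume a local 10-digit format for illustration.
    digits = re.sub(r"\D", "", raw)
    return digits[-10:] if len(digits) >= 10 else digits

def standardise_record(record: dict) -> dict:
    return {
        "name": cleanse_name(record.get("name", "")),
        "phone": parse_phone(record.get("phone", "")),
        "postcode": record.get("postcode", "").strip(),
    }

print(standardise_record(
    {"name": "  james   SMITH ", "phone": "(07) 4612 3456", "postcode": "4350 "}
))
# ...and then a slightly different variant of all of the above for source B,
# source C, and so on: exactly the repetition being criticised here.
```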
3) Obsolete Traditional Techniques
Another shortcoming of traditional MDM tools is the way they link one record to another: by comparing a set of attributes against another set. Regardless of the method used, deterministic or probabilistic, the record-to-record, attribute-to-attribute technique is fundamentally the same, and it often misses potential links or produces false positives. This is because such techniques lack context; they deal with data "blindly", without understanding the entities behind it. Wait, you must be thinking: what does context have to do with MDM? Everything.
Record matching, broadly speaking, depends on rules, and these rules need to be designed carefully so they don't miss potential links or produce false positives. With traditional MDM tools, this process is mostly manual and does not account for data locality or context. For instance, a rule comprising, say, [First Name + Last Name + Postal Code], used to determine whether two records should be declared a match, may not be as effective as you would think. In Australia, one postal code can cover a population of 100K+ (e.g. Toowoomba, postcode 4350), whilst the nationwide average is ~9K. With a common name like "James Smith", it's entirely possible to have several of them in one postal code. Hence, such a rule [First Name + Last Name + Postal Code] could be strong or weak depending on the data at hand. Sometimes you may have more data elements; other times you won't. And this is where we're likely to miss links or raise false positives. That was only a simple example; data can be global rather than country-specific, which adds a further set of challenges.
Not all compounds are created equal.
MDM tools, with their out-of-the-box functionality, don't take these signals into consideration. In fact, they are mostly "unaware" and require human input to define rules. And as you can imagine, it's an extremely complicated job to think of all the possibilities; doing it exhaustively is impossible as a human task. The sketch below illustrates the point.
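Here's a back-of-the-envelope Python sketch of the postal-code example. The population and name-prevalence numbers are illustrative assumptions, not real reference data, but they show how the very same rule swings from weak to strong depending on context:

```python
# Hypothetical figures for illustration only.
POSTCODE_POPULATION = {"4350": 100_000}   # e.g. Toowoomba; national average ~9K
DEFAULT_POPULATION = 9_000
NAME_PREVALENCE = {("james", "smith"): 1 / 20_000}  # assumed share of population
DEFAULT_PREVALENCE = 1 / 100_000

def expected_namesakes(first: str, last: str, postcode: str) -> float:
    """Expected number of distinct people sharing this name in this postcode."""
    population = POSTCODE_POPULATION.get(postcode, DEFAULT_POPULATION)
    prevalence = NAME_PREVALENCE.get((first.lower(), last.lower()), DEFAULT_PREVALENCE)
    return population * prevalence

# The same [First Name + Last Name + Postal Code] rule, very different strength:
print(expected_namesakes("James", "Smith", "4350"))  # ~5.0 -> likely false positives
print(expected_namesakes("James", "Smith", "2577"))  # ~0.45 -> much safer to match
```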
4) Lack of Scalability
What about scalability? Most traditional MDM products don't scale well. Due to the way their algorithms are designed and their clustering techniques, they struggle to handle large amounts of [big] data. I'm not talking about a few hundred thousand records, or even millions; I'm talking about multi-billion-record scalability. Funnily enough, many suffer even within the multi-million-record bracket. And I'm not pointing at "batch" processing only, but also real-time and on-demand/dynamic entity resolution, and, if memory serves me well, no traditional MDM solution out there provides all three modes at the same time.
In an ever-changing world, with massive amounts of data being generated every second, scalability and performance are essential if you want to stay in the game. Some use cases that rely on entity resolution require real-time responses; a proactive approach rather than a reactive one.
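A quick bit of arithmetic shows why naive record-to-record comparison hits a wall: matching every record against every other costs n(n-1)/2 comparisons, which grows quadratically. The throughput figure below is an assumption for illustration:

```python
def pairwise_comparisons(n: int) -> int:
    # Naive matching compares every record against every other: n*(n-1)/2 pairs.
    return n * (n - 1) // 2

for n in (100_000, 1_000_000, 1_000_000_000):
    print(f"{n:>13,} records -> {pairwise_comparisons(n):,} comparisons")

# At one billion records that's ~5 x 10^17 comparisons. Even at an assumed
# (generous) 100 million comparisons per second, that's roughly 160 years of
# compute, which is why blocking/indexing and smarter clustering matter.
```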
5) Limited View
A golden record, single entity view, or even an X 360° view is still limited in terms of visibility. When an MDM solution provides you with that view, whether for an individual or a business, you're still constrained to a [flat] view of that individual or business, which is helpful for some (one?) use cases. But what if that entity is connected to another entity, and some sort of relation can be derived from that connection? A value, or some type of intelligence, that can be generated?
The value this can bring across use cases is limitless, for instance in marketing (i.e. a campaign targeting prospects/customers who are connected to existing customers and may share common interests), fighting financial crime, investigations, and so on.
To get beyond that 360° view, to a wider [720°?] view and even further, you will need an additional set of tools, knowledge and skills.
In fact, you will also need a store for the relationships (networks), something similar to a Graph DB, and potentially multiple of them, one per use case. You need to understand graph theory, and you need a way to build continuous integration.
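As a hedged sketch of what that looks like, here's a small Python example using the networkx library (the entities and relationships are made up). A flat 360° view of "Alice" stops at her own attributes; a contextual view traverses her connections:

```python
import networkx as nx  # assumes networkx is installed

G = nx.Graph()
# Resolved entities and the relationships between them (all fictitious).
G.add_edge("Customer:Alice", "Address:12 High St", type="lives_at")
G.add_edge("Prospect:Bob", "Address:12 High St", type="lives_at")
G.add_edge("Customer:Alice", "Company:Acme Ltd", type="director_of")
G.add_edge("Prospect:Carol", "Company:Acme Ltd", type="director_of")

# "Beyond 360°": which entities sit within two hops of Alice?
nearby = nx.single_source_shortest_path_length(G, "Customer:Alice", cutoff=2)
print(sorted(n for n in nearby if n != "Customer:Alice"))
# Shared addresses and companies surface Bob and Carol as connected prospects,
# the kind of signal a flat golden record can never expose.
```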
Traditional MDM products don't go beyond a single view (and that's when they're properly implemented and succeed). It's like a barrier that cuts you off from a universe of limitless growth opportunities.
6) Singularity & Architectural Misfit
Traditional MDM products are mostly built with one purpose in mind: MDM, a fancy way of saying they match and merge records, functionality you could get from your spreadsheet app (well, to a certain extent, of course). However, there is an important layer that gets forgotten: the serving layer.
With traditional MDM, the "mastered" records are usually stored somewhere, but not necessarily served, or not served properly. This leaves the organisation needing to build its own access layer or data integration pipelines. What's more, this master store usually serves one use case. For each new use case (i.e. each new entity-view requirement), another store would be needed, and with it another serving layer (web services, user management, masking, attribute-level security, etc.), meaning yet another set of components to fill the gaps. If, for example, you use a Graph DB to store resolved entities for one use case, you will need another for new use cases (assuming the entity resolution templates differ).
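To illustrate the duplication, here's a hypothetical Python sketch (pure pseudo-configuration, not any product's API) of how each new use case spawns its own store plus its own serving stack:

```python
from dataclasses import dataclass, field

@dataclass
class ServingStack:
    use_case: str
    store: str  # e.g. a separate Graph DB per use case
    web_services: list = field(default_factory=lambda: ["rest_api", "auth"])
    masking_rules: list = field(default_factory=list)
    attribute_security: list = field(default_factory=list)

# One full stack per entity-view requirement; the components simply multiply.
stacks = [
    ServingStack("marketing_360", store="graphdb_marketing",
                 masking_rules=["mask_email"], attribute_security=["pii_read_role"]),
    ServingStack("fincrime_investigations", store="graphdb_fincrime",
                 masking_rules=["mask_account_no"], attribute_security=["investigator_role"]),
]
for s in stacks:
    print(f"{s.use_case}: store={s.store}, services={s.web_services}")
```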
And finally, we get to that architectural decision, and the need to keep up with the latest and greatest: the likes of Data Fabric or Data Mesh (at the time of writing, at least). Without going into the details or motivations of each of those architectures/frameworks, one thing I can note is that traditional MDM products fit poorly into their vision, or require many other components to compensate for their gaps and the bad practices they impose. It's like trying to fit a square peg into a round hole: the two simply don't match, no amount of force will make them work together seamlessly, and endless adjustments will be needed until, eventually, you decide to use a "round peg".
The reason I'm arguing this is that most MDM solutions have failed to keep up with industry trends, choosing instead to stay in their own bubbled black box, wrapped (in some cases) in a fancier UI, without making fundamental changes to what's under the hood. Having an MDM tool connect to a "Hive" store to collect "structured" data, for instance, doesn't make it big-data ready, deserving of a place in a modern data architecture, or worthy of inclusion in some analyst quadrant just because it "ticks" that box.
So, ought MDM to die?
Absolutely not. But maybe our old perspective of what it is and how it works should.
What we need here is to shift our thinking from “single entity view” to a dynamic “contextual entity view”. From a 360° view to a 720° view. From static data stores to natively enabled and integrated data platforms that are optimised for serving. From rigid, old methods and techniques to more open, flexible and non-restrictive ones.
There has been tremendous advancement in data and technology since the first MDM product came out some 25+ years ago, and organisations that want to be data driven should make use of that. We should focus more on creative tasks, and let machines worry about repetitive ones. We should use the power of ML and graph theory to our advantage to help us challenge the status quo, generate better outcomes, see hidden insights and ultimately gain more value.
I firmly believe that Master Data Management (MDM) needs to progress and mature into a comprehensive platform that tackles the shortcomings mentioned above. You shouldn't have to select multiple components or create multiple stores to satisfy your vision or strategy. A solution should be carefully picked, one that offers an end-to-end approach and eliminates the challenge of re-deciding for every use case. The fundamentals of "data mastering" are critical for nearly all applications in every organisation, and it's crucial that you choose a holistic, complete solution that supports your organisation's strategy. I foresee that as a Contextual Decision Intelligence platform: a solution that eliminates the need for separate components to address individual challenges or use cases, and provides a unified platform for the whole organisation.
This concludes my blog series. I hope you found the content and arguments meaningful. Interested in hearing more, sharing feedback, or asking questions about MDM? Why not join our Data Management Specialist User Group?
References
(1) Quantexa, "Data in Context: Closing the Data Decision Gap" [Global Research]