David Davis: 00:00:09
Hello, and welcome to the Dataverse.ai video podcast powered by ActualTech Media. Thanks so much for joining us. Before we get started, there are just a few things that you should know about today’s event. We’ve got a great list of topics, so here’s what you’ll learn. We’re going to kick it off by talking about the definitions of data architecture. You’ll learn about the business benefits of modern data architectures, the technology benefits of modern data architectures, and how to make a business case for data architectures based on business-driven benefits and use cases. We’ve got a lot of great topics to cover. Before we get started, though, I should mention my name is David Davis of ActualTech Media, and I’ll be serving as your host for the podcast today. I’m a co-founder here at ActualTech Media, and I’m excited to have two distinguished data experts on the event today. They’re data architects and strategy experts. Welcome to Philip Russom and David Loshin. A little bit of background on Philip and David before I hand off to them. Philip has 25 years of experience in the IT industry as an analyst researching user best practices, vendor products, and market trends in data management and analytics. He’s worked at most of the world’s leading IT analyst firms, including Gartner and Forrester Research. He’s now a semi-retired independent industry analyst who applies his long experience with data and analytics to the successful design and execution of modern data management use cases. He’s a speaker, a consultant, and an all-around expert, so we’re really excited to have Philip on the event today. We’re also excited to welcome Mr. David Loshin, who is the president of Knowledge Integrity and a senior lecturer at the College of Information Studies at the University of Maryland. Gentlemen, welcome. It’s great to have you on the podcast today. I’m really looking forward to this discussion.
I wanna learn a lot about the case for data architecture, and I know the audience does as well. So David, I’ll first hand it off to you.
David Loshin: 00:02:35
Thanks, David. Thanks so much for inviting me to chat on the podcast, thanks to Dataverse.ai, and I’m very happy to be reunited with my friend and colleague Philip to chat about data management and data architecture. We always have interesting banter. So I’m gonna start off by talking a little bit about business drivers for data architecture. And Philip, why don’t we go to the next slide? Why do we even care about this? The first reason is that organizations, instead of focusing solely on the applications and systems that operationalize transaction processing, are realizing the value of the underlying information that can be derived from the data processed through those systems. And so data architecture is increasingly important because we’re looking at data as an asset, migrating the data to the cloud, looking at analytics and how that’s deployed in the organization, looking at visibility of data in different data domains that cross different system boundaries or line-of-business boundaries or, in some cases, even organizational boundaries, and looking at how the adoption of data sharing becomes a critical component.
David Loshin: 00:03:45
Our next piece is about the data-driven use cases. They demand a lot of data and the ability to exploit it, but in a way that addresses the needs of the different communities of data consumers. That is becoming much more complex as the volumes of data grow, and you need a well-designed data architecture to establish a mechanism for storing, managing, persisting, archiving, and accessing that data. The next point is that organizations are not really fully aware of the breadth of ways that data can be managed, twisted about, prepared, engineered, and made available. And so we need a new vocabulary for these data architectures, to raise awareness and to train our different data consumers and data engineers to be harmonized in how they talk about data architecture.
David Loshin: 00:04:42
And finally, in this podcast, we’re gonna look at that vocabulary. We’re gonna talk about what the drivers for data architecture are, as well as what words and terms we use. So let’s move to the business motivations. This is actually one of the byproducts that I think became clear during the pandemic: organizations that made best use of their information resources were able to quickly pivot and still maintain viability during stressful times. So let’s walk through some of these things. First, omnichannel customer experience. Customers are interacting with businesses in different ways. Philip, why don’t you give us a click there. Yeah. They’re able to communicate with organizations through a number of different channels: through apps on their phones, through websites, by making telephone calls, by driving to a brick-and-mortar location, or by interacting through third parties. For an organization to be able to have good analytics,
David Loshin: 00:05:48
They have to have visibility into not just a profile of the customer, but accurate and up-to-date information about every way that customer interacts with the business. And so omnichannel customer experience is driven by information coming from a variety of different systems. Our next point is that organizations are communicating with each other a lot more. And if you think about it, most people are not even aware of how embedded these partner channel networks are, because it’s all embedded within the interoperability network. So when you’re operating on an app and you wanna make a payment through Apple Pay or through some third party, you don’t even realize that there are multiple partners interoperating and sharing data across those different networks.
David Loshin: 00:06:38
And if you look at any kind of environment where there are multiple partners, there’s a large amount of data that is being shared, exchanged, sometimes protected, sometimes forwarded in an encrypted manner, and you have to be able to understand how the data architecture supports that. And of course, if you look at that from a different perspective, we’re gonna move to our next point, which is that a lot of organizations are facing threats from emerging new businesses that are, for example, online-only, technology-driven, or data-driven, as opposed to companies, financial institutions, or manufacturing businesses that have been around for dozens or scores or hundreds of years. Those organizations are well established in their monolithic systems, while these newer businesses are thinking about new ways to use technology to drive how they compete.
David Loshin: 00:07:35
And so organizations have to look at their legacy systems and understand how to modernize them into a modern data architecture. On the next bullet point, it’s interesting, Philip: I just got emails from three or four different banks and credit card companies that I do business with, all talking about the different types of fraud that people are attempting. I get emails every day. With the volume of attempts at defrauding people and the insidious ways that people are doing it, it becomes much more complicated to identify fraud, and not just to find it after the fact, but actually to reveal it as it is happening. And so if you can modernize your data architecture, that gives you increased resistance to fraud. At the same time, there’s an increased necessity to institute some auditable mechanisms for complying with laws.
David Loshin: 00:08:33
And why don’t we move to the next image here. Laws, especially data privacy laws, are being enacted not just in different states in the United States; over a hundred countries have data privacy laws. And that’s just talking about protection of individual privacy. There are all sorts of compliance requirements in every single industry that require more visibility into how information flows throughout the organization, again motivating the need for data architecture. Then there’s sharing information. What’s emerging in the European Union with open data regulations (and Philip, let’s go to that next item) is a mandate for financial institutions, for example, to share data to facilitate improved customer experiences across those different organizational boundaries. And that becomes a new opportunity for instituting data architecture.
David Loshin: 00:09:30
But while you’re doing all these things (let’s go to the next bullet point), your system performance needs to maintain or exceed the level that the older systems delivered. If you don’t have a good data architecture, you’ll be introducing bottlenecks and things that slow down interaction. So you need to be aware of how your data architecture is layered to be able to provide that level of performance. And then lastly, organizations are often well established in terms of the way they like to do things. And so there’s very often (let’s move to that next item) resistance to change within the organization. A new data architecture not only provides you with a means for revising the way that information is managed throughout the end-to-end cycles in the organization, but also gives you a language or a vocabulary, as we were talking about before, to discuss how a new or modernized data architecture improves not just the way the business runs, but customer experience as well as staff or employee experience.
David Loshin: 00:10:40
So let’s move to that next slide. So what is data architecture? That’s a good question. The next slide will give us a little bit of that vocabulary.
Philip Russom: 00:10:53
David, I think this is where I get to step in. Hey folks, this is Philip Russom. So let me talk about what data architecture is, what it does, what it does not do, that sort of thing. I think a great way to introduce data architecture, and I’ve used this with a lot of my clients and they seem to react well to it, is to start by talking about the four requirements for data architecture. Now, if you look at the graphic I’m projecting here, in the upper left-hand corner you see the data itself. And don’t forget, data architecture has many, many things built into it, but really at the heart of it is the data. We expect a data architecture to be populated with data. And it’s not just the data.
Philip Russom: 00:11:34
It’s the fact that you have many data sets within a larger data architecture, and there will be relationships among these data sets. Sometimes there’s data shared between two different data sets. You know, a lot of customer data is distributed across diverse data sets, and if you want the full view of the customer, you have to be able to address all those data sets to get all the customer information. So the data architecture is primarily about the data itself, but also relationships. It’s also about data semantics, and data semantics takes many forms, but the most common would be technical metadata. Technical metadata is a way of describing data in terms of its technical characteristics. If you wanna manage data in a data architecture, if you wanna be able to find data in a large, highly diverse data architecture, you have to have very careful descriptions of data through technical metadata.
Philip Russom: 00:12:26
For some applications, you need more business metadata, and there are new ways to describe data; the data catalog and data lineage are two new forms there. But you get the idea. The first requirement is data, and data’s gonna be complex, with highly diverse data semantics and relationships across data sets. Now, number two: if we go over to the upper right-hand corner, you’ll see data platforms and tools. And yes, absolutely, you need software to help you manage data for a data architecture. You’ll need data integration tools to get the data in, data quality tools and so forth to improve the data. You also need databases, so database management systems, that kind of software to help you manage data in storage. But I do find that sometimes people get a little confused, and they’ll tell me, oh yeah, I’ve got a data architecture, it’s Amazon Web Services.
Philip Russom: 00:13:22
And I have to tell ’em, well, that’s a platform for your architecture, but it’s not the full architecture. David, you and I have worked in data warehousing a lot. A lot of times people tell me, oh, I’ve got a data warehouse, it’s on Oracle. And I have to tell ’em, well, Oracle or some other database management system is very important to managing the data, but that platform is not actually your warehouse. So it’s a similar thing. Sometimes people get a little confused and they think their software portfolio is their architecture, but that’s just a piece of the architecture. Let’s go around the wheel some more. Let’s go to the right side, and in the lower corner you see the word interoperability. See, you can have a lot of data platforms and tools, but if those tools do not interoperate, really, you’ve just got a bunch of siloed data.
Philip Russom: 00:14:10
You’ve got a bunch of tools that are very hard to coordinate. And so part of the secret sauce of unifying a data architecture is having tools that can interoperate with each other. A classic example would be your data integration and data quality tools, which have to work together closely because they’re typically working on the same data at the same time. And also, you can’t get to first base without metadata, right? So interoperability has to be a big part of the architecture. Otherwise it’s not a unified architecture, it’s just a bunch of silos. And then finally, in the lower left, you’ll see technical processes. Well, you need the data integration processes I mentioned earlier just to get data into the architecture. Those same processes will also repurpose data within the architecture and then move data to where it gets consumed, maybe through reports or analytics, or where it gets embedded in operational applications. So there are all kinds of processes, technical processes, people processes, and development processes, that work their way across the architecture, and those tend to stitch together the architecture and help to unify it as well. So you get the idea: there’s a lot in a data architecture, but it all boils down to four requirements, four different areas that are typical of all data architectures.
Philip Russom: 00:15:31
Now, as a complement to what I just said, let me tell you what a data architecture is not. I think I kind of made this point on the last slide: it’s not a cloud, a database management system, or some other platform. These are required to manage data, but the architecture is mostly about the data. So be careful you don’t confuse the platforms with the actual architecture. Platforms are just some pieces of the architecture. Likewise, if you’re thinking in terms of a software portfolio, with a lot of the stuff that you would do for data integration, metadata, and quality, which I mentioned, that actually ends up being a very long list of tool types, maybe from multiple vendors, maybe open source, maybe some you’ve built on your own. So the software portfolio is part of the architecture, but it’s not all of the architecture.
Philip Russom: 00:16:17
You know, we have this new term, the modern data stack, and it’s a useful term. It really is. But it’s really a vertical stack representing a software portfolio that incorporates many of the newer tools and practices we’re using in data and analytics, the development of data products, et cetera. And you know, this is important. The data stack is a great way to describe this stuff, but really it’s just looking at one piece of the architecture, typically in a vertical aspect, whereas a true data architecture is gonna include all of the above, but it’s also gonna be horizontal due to the many cross-platform processes I talked about on the previous slide, which are both business and technology processes. And then finally, some people do think of their architecture as an uncontrolled data collection and access point.
Philip Russom: 00:17:07
And that’s what you wanna protect your architecture from. For example, you know, the data lake’s a new thing, and sometimes data lakes are not treated well. They’re not curated, they’re not governed, and they do become the dreaded data swamp. That’s where you’ve got a lot of data, but it was not curated well, so you have a lot of redundant data. It was not governed, so you don’t have adequate metadata and other descriptions of the data. So your architecture should not be an uncontrolled collection of data. It should be one that is very carefully designed and curated. And as you design and curate your data architecture, you need to be sure it aligns with appropriate business goals. Now, David, why don’t you step in. I think you’ve got some other points to make here about a modernized data architecture.
David Loshin: 00:17:55
Yeah. Let me give an example looking at this from the cloud perspective, since a lot of organizations are moving to the cloud. The first thing that you have to look at when you’re talking about modern data architecture, and if you gimme a click here I’ll show you, is who the data consumers are. We’ve got different communities of data consumers that have a variety of different levels of skill when it comes to data access and data manipulation. Sometimes you’ve got business analysts who are expecting to look at dashboards or scorecards. You’ve got data analysts who maybe have got their fingers deep into the data. You’ve got business data consumers who are maybe just looking at visualizations or data stories that have been produced to convey some messaging.
David Loshin: 00:18:37
But sometimes you’ve got data scientists who are much more sophisticated when it comes to applying analytics tools, or coding, or new types of machine learning and AI modeling, et cetera. So they’ve got a variety of different levels of skills, and they’re expecting to be able to get access to some set of data. If we give another click, I think that’ll show a wide variety of different types of data sources. Philip, whenever you get a chance to bring that in there. I’ll talk to it anyway. I mean, we’ve got... there we go. Thank you so much. We’ve got traditional databases, relational database management systems. We’ve even got data warehouses and data marts, but we’ve got an increasing number of origination points for data that’s streaming into an organization.
David Loshin: 00:19:31
And that includes semi-structured data that’s coming from different types of systems or social media networks. We’ve got ERP systems. We’ve got SaaS platforms and marketing systems. A lot of data’s being pulled in via APIs from external sources. We’ve got, on the bottom right there, IoT, the internet of things: a growing number of either machines or sensors that are generating streams of data, or other types of data feeds like the weather or traffic or news. And so these end consumers on the right-hand side wanna be able to get access to that data coming in from the left-hand side, and what we need there is some kind of mechanism, and that is an architecture. So gimme a click there.
David Loshin: 00:20:22
Here’s an example of a high-level architectural layout, which, Philip, I’m sure you’re gonna drill into in greater detail in a little bit. In this environment, there has to be some mechanism for data ingestion and data integration. We’ve gotta have some implementations of architectures for managing, persisting, and allowing accessibility, so there might be a cloud data warehouse or a data lake. We need to have some mechanism for raising data awareness among those end users, so they can find and use the data assets that are most appropriate for the type of business use case that they’ve got. And we’ve gotta have some kind of data fabric that provides pipelines and facilitates data engineering and data preparation, so that data can be fed in the right way and managed in the right way into the end user tools that members of those different data consumer communities will wanna use. And just as you had in your prior slide, right in the middle, a little graphic that said governance, to ensure that we’ve got compliance with all the laws and protection of sensitive data (click one more time), we’ve got governance that needs to be instituted and embedded directly into the data architecture, the same way analytics and data ingestion and data integration are. So Philip, can you drill down a little bit into some of these?
Philip Russom: 00:21:53
Yeah, yeah. I like your drawing. I also like the fact that David Loshin just pointed out to us that cloud is turning out to be a preferred platform for these newer data architectures, and even organizations that have well-developed architectures on premises are migrating to cloud now. It’s because cloud’s a good choice for the large data volumes that these architectures tend to build. Also, clouds have reached maturity so that they can support many different use cases, and an architecture like this will support many use cases, from analytics through operations, through monitoring, through life cycle management for data, with data archiving, you name it. It’s a very long list of things that this modern architecture can do, and cloud’s a good place for it. So folks, I’m gonna draw a picture.
Philip Russom: 00:22:44
I’m just gonna draw it quickly. This is a kind of reference architecture, and you’ll see it’s like most reference architectures. It’s simple. You know, you don’t try to put all the details into a reference architecture. I just wanted to draw the architecture a little differently from David, and a lot of that’s because I did literally hundreds of user inquiry calls when I worked at Gartner. So I’ve talked to literally hundreds of Gartner clients, and the way a lot of ’em draw their reference architecture is using their favorite drawing tools. I think that probably strikes home for a lot of you. For example, in this reference architecture, I’d like to start in the middle with the data fabric, and the data fabric is really a collection of two things.
Philip Russom: 00:23:31
It’s data pipelines, including old-fashioned data integration, ETL, and ELT, as well as newer things like the pipeline itself. These pipelines often interact. Remember, I talked about interoperability. So these integration tools interact with data quality tools, change data capture, and master data management tools. And of course, a lot of companies are trying to get data that’s coming to them in real time through streams and so forth. So data pipelining would be one layer of the fabric. The other layer is the data semantics. I described this earlier, right? Semantics is most typically technical metadata, maybe other forms of metadata, but it could be data virtualization, data cataloging, and so forth. So the two of those go together into the data fabric, and this is how inbound data enters the architecture and also how it exits the architecture.
Philip Russom: 00:24:23
The data fabric also passes data to storage, right? In the modern data architectures that I’m seeing, the data lake has become a common feature. We still have data warehouses, and the lake and the warehouse do slightly different things. So they’re complementary; one does not replace the other. And some people kind of slam them together, because they do have some overlap, to create the so-called “lake house.” So in modern architectures, one of the new features is the data lake or the lake house. And then this particular architecture is not just data. It’s also the analytics layer, right? Now, that’s something you always have to ask yourself if you’re designing a data architecture: what’s the scope of it? Some people will stick to just the data pieces. This one goes beyond the data pieces into the end user tools that you would see for advanced analytics and business intelligence.
Philip Russom: 00:25:22
So I just wanted to share this picture here. It gives you an idea of how most people will illustrate their data architectures. Later, I’ll talk about how you do need to get to the point where you’re drawing the proposed data architecture, just so you can communicate with people about it. So some sort of drawing like this, or whatever fits your style at your organization, is an inevitable part of planning and modernizing a data architecture. Also, from this you can see that, like a lot of architectures, this one has layers, doesn’t it? The three broad general layers would be, in the order I presented them, the data fabric, storage, and end user tools. But there could also be other layers. Some people will have what’s often called the speed layer, a specialized layer for real-time data and that sort of thing. But you get the idea. So this is data and analytics, and it’s got multiple layers to it.
David Loshin: 00:26:25
Hey Phil, can you bounce back to that slide? I just wanna point out one of the things that you’ve captured here, which I think is actually very challenging to capture in many cases: the interoperability across these different layers. One of the things that becomes complex is making sure that those end user tools have a semantically consistent view into that data fabric. You were focusing on data semantics, and I think that’s really important, because you’ve got all these different sources coming in and all these different representations, but as far as the data consumers are concerned, they don’t want to have to know the difference between a customer number in one data set, a customer ID in another data set, and a customer ID in a third data set. They just want to know, what’s the customer number? As far as they’re concerned, there is one data element. And I think what you’ve done here is really good, in that you’ve got that captured inside the data fabric layer. It’s really important. And in fact, that does have influence on how you actually engineer your data architecture when you get down to that next level.
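To make the customer number versus customer ID point concrete, here is a minimal Python sketch of a semantic mapping layer. It is an illustration only, not something shown in the podcast: the source names (`crm`, `billing`, `support`), the field names, and the `SEMANTIC_MAP` structure are all hypothetical.

```python
# Hypothetical mapping from source-specific field names to one canonical element.
# None of these source or field names come from the podcast.
SEMANTIC_MAP = {
    "crm": {"customer_number": "customer_id"},
    "billing": {"cust_id": "customer_id"},
    "support": {"CustomerID": "customer_id"},
}


def to_canonical(source: str, record: dict) -> dict:
    """Rename source-specific fields to canonical names; leave other fields as-is."""
    mapping = SEMANTIC_MAP.get(source, {})
    return {mapping.get(field, field): value for field, value in record.items()}


# Three records about the same customer, each using a different identifier name.
records = [
    ("crm", {"customer_number": "C-100", "name": "Ada"}),
    ("billing", {"cust_id": "C-100", "balance": 42.5}),
    ("support", {"CustomerID": "C-100", "open_tickets": 2}),
]

# Merge the records into one view per customer, keyed by the canonical identifier.
unified = {}
for source, record in records:
    canonical = to_canonical(source, record)
    unified.setdefault(canonical["customer_id"], {}).update(canonical)

print(unified)
```

A data consumer reading `unified` sees a single `customer_id` element and never needs to know which source used which name. In a real data fabric this mapping would more likely live in shared metadata than in application code.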
Philip Russom: 00:27:35
I think you’re making some good points there. You know, the way people used to draw data architectures is they’d kind of ignore metadata and just assume, well, it’s everywhere and you can’t do anything without it, so we’ll just assume it, we won’t even put it in our drawing. Nowadays, we understand that metadata, even though it’s ancient, it was with us at the origins of the database management system, et cetera, right, is more relevant than ever, because with the very large-scale data architectures most companies are moving towards, you need metadata just to keep a record of what’s there. And of course, if you’re gonna interface, most interfaces depend on metadata. So metadata is there not just to document data, but also to be an interface layer, which is part of the interoperability of this thing. So anyway, yeah, one of the things I tried to play up here is that semantics has come into its own, right? And it’s not just metadata; it’s catalog, lineage, graph, data maps, et cetera.
David Loshin: 00:28:33
That definitely feeds into that slide that you just flashed when I asked you to go back, about the benefits of the data architecture from the business perspective. So yeah, if you wanna move to that next slide, and maybe jump through that too. Here’s the big issue: we haven’t traditionally really thought about the challenges associated with data distribution and how that impacts the way people do their jobs. And in fact, that really characterizes what the problem is: the more we look at high volumes of data coming from multiple places, the more this distributed data inhibits the ability to put together applications or business processes supporting high-value business activities that rely on data coming from those different sources.
David Loshin: 00:29:21
And so if you’ve got a data architecture, the way that you started mapping it out in that prior slide, that unifies that diverse data through that semantic perspective, then you are able to derive a bunch of the benefits that we’ve got on the slide. So for example, supporting the data consumers: being able to tell them which of the available data sets best fit their specific needs. So you get this visibility across those different data sets, but you also get an inventory of the available data sets, what’s inside them, and the characteristics of their quality. So you get centralized accessibility and characterization of usability, even if the data sets don’t actually get consolidated and copied multiple times into one monolithic data warehouse. And that’s made easier through data sharing.
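The idea of an inventory that describes data sets without copying them can be sketched as a small catalog lookup. This is a minimal Python illustration under stated assumptions: the `CatalogEntry` fields, the entries, the locations, and the 0-to-1 `quality_score` are all hypothetical, not taken from any catalog product.

```python
from dataclasses import dataclass, field


@dataclass
class CatalogEntry:
    name: str
    location: str                 # where the data lives; the catalog holds no copy
    tags: set = field(default_factory=set)
    quality_score: float = 0.0    # hypothetical 0..1 quality measure


# Hypothetical inventory; the names and locations are made up for illustration.
CATALOG = [
    CatalogEntry("crm_customers", "postgres://crm/customers", {"customer", "pii"}, 0.92),
    CatalogEntry("web_clickstream", "s3://lake/clicks/", {"customer", "behavior"}, 0.71),
    CatalogEntry("plant_sensors", "kafka://iot/sensors", {"iot"}, 0.85),
]


def find_datasets(tag: str, min_quality: float = 0.0) -> list:
    """Return entries carrying `tag` at or above `min_quality`, best quality first."""
    matches = [e for e in CATALOG if tag in e.tags and e.quality_score >= min_quality]
    return sorted(matches, key=lambda e: e.quality_score, reverse=True)


for entry in find_datasets("customer"):
    print(entry.name, entry.location, entry.quality_score)
```

The catalog is centralized while the data sets stay where they are, which is the distinction drawn above between centralized accessibility and physical consolidation.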
David Loshin: 00:30:16
You get easier, broader mechanisms for sharing information from its source, as opposed to multiple iterations that need to be processed, which can take a day or a week in some cases. So you get faster access to more accurate, more up-to-date data, which enables your data consumers to do reporting, do more sophisticated analytics, and build data products. And not only can they build data products, they can actually register the methods by which they’re building those data products in a registry, so that other people can say, hey, that’s a really good visualization, I’d like to do that, but I want to use an additional data set that gets fed into it. And you get not just the data reuse, but the process reuse as well.
David Loshin: 00:31:03
You were talking about lineage, which gives you some kind of horizontal visibility into the way that information flows across the organization. What that does is give you the foundation for looking at how business processes are enabled and empowered through that information flow. So if you’ve got that integrated into your data architecture, you’ve got some visibility into not just how the business process is enabled, but also whether there are any potential opportunities for improving the way that business process is done. And then, I always go back, this is my comfort place, to talking about governance and compliance. The more opportunities you have for configuring a broad landscape of information for instituting data policies, the more predictable and the more auditable your mechanisms are going to be. And that can allow you to breathe a sigh of relief when you are pretty convinced that you’re not exposing sensitive information, for example, or that you’re complying with some banking regulations or some industry standards, et cetera.
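One way to picture the horizontal visibility that lineage provides is as a directed graph of data sets. The Python sketch below walks such a graph. It is illustrative only: the data set names and the `LINEAGE` dictionary are hypothetical, and real lineage tools typically capture far more detail, such as the transformations applied and column-level flows.

```python
from collections import deque

# Hypothetical lineage edges: each data set maps to the data sets derived from it.
LINEAGE = {
    "crm_customers": ["customer_360"],
    "billing_accounts": ["customer_360"],
    "customer_360": ["churn_report", "marketing_dashboard"],
    "churn_report": [],
    "marketing_dashboard": [],
}


def downstream(dataset: str) -> list:
    """Breadth-first walk of everything derived, directly or indirectly, from `dataset`."""
    seen, order, queue = set(), [], deque([dataset])
    while queue:
        current = queue.popleft()
        for child in LINEAGE.get(current, []):
            if child not in seen:
                seen.add(child)
                order.append(child)
                queue.append(child)
    return order


def upstream(dataset: str) -> list:
    """Direct sources that feed `dataset`, sorted by name."""
    return sorted(src for src, children in LINEAGE.items() if dataset in children)


print(downstream("crm_customers"))
print(upstream("customer_360"))
```

A walk like `downstream` answers the governance question of which reports could expose a sensitive source field, while `upstream` shows which sources a business process depends on.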
Philip Russom: 00:32:15
Yeah, very good. And that’s quite a list, isn’t it? David, I found your list very compelling: there really are a lot of business benefits that you can’t achieve without a very solid data architecture behind them. And I would say on the flip side of that, there are also some technical benefits. You know, if you have to compare technical and business benefits, go with the business ones. Later, when we make our case for the data architecture, we’ll talk about how your architecture really does have to address business benefits. Especially if you wanna get funding and so forth from business people, it pays to address use cases and benefits that they’re interested in. But, you know, on the technology side of this, it’s still the same problem or challenge: distributed data makes data management complex, not just business processes.
Philip Russom: 00:33:08
It makes the development of data products slow to operationalize. And so the kind of data architecture we’re talking about, by now the audience has probably figured out, yeah, this is heavily centralized. So the centralization of this kind of architecture just makes it easier for you to have data standards, for you to reuse data products and other development artifacts that are part of the data engineering or data ops processes. Your data engineers and data ops people themselves, data analysts and data scientists, will be more productive because more data has been centralized and described very clearly so that they can find it and so forth. So we have some problems there that are complemented by benefits that help us to overcome them. So again, the architecture we’re talking about does involve a lot of centralization.
Philip Russom: 00:33:58
So it does consolidate multiple data sets, typically. So when people move to a new architecture, quite often it’s not just a migration to a newly designed architecture; there’s a fair amount of data consolidation that will go on so that people can get rid of some redundant data. And that’s a physical reworking of storage. There’s also data virtualization, which is an alternative. So we can sort of restructure and do some consolidation at the virtual level. And we skip on down. I talked about centralization. It just makes everything a lot simpler for data ingestion, to get the data into your environment, and for the inevitable refinement as we repurpose data there. I already talked about standardization. You know, I mentioned migration, and a lot of organizations are migrating from older platforms on premises to newer platforms on cloud, in part so that they can leverage modern platforms and tools.
Philip Russom: 00:34:57
And you know, in my mind, it’s the elasticity of cloud that makes cloud so compelling. It’s elasticity that gives you fairly automatic speed, scale, and functionality increases, right? So migrating to a new architecture on cloud, that has a lot of technical benefit. And then finally, you know, data observability has kind of become a really big deal recently, hasn’t it? And in a way we’ve been doing observability for years, in the way we monitor and re-profile data for, say, data quality purposes. Now we’re doing similar monitoring for pretty much any purpose with data. And so again, the simplicity that comes from a centralized data architecture makes data observability a lot more likely.
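To make the monitor-and-re-profile idea concrete, here is a minimal sketch of one observability check: profile a column, then compare the result against an earlier baseline. The function names, the `amount` column, and the 5% threshold are illustrative assumptions, not anything the speakers described.

```python
def profile(rows, column):
    """Compute a tiny profile (row count and null rate) for one column."""
    total = len(rows)
    nulls = sum(1 for row in rows if row.get(column) is None)
    return {"rows": total, "null_rate": nulls / total if total else 0.0}


def drifted(baseline, current, max_null_increase=0.05):
    """Flag the column when its null rate rose by more than the allowed margin.

    The 0.05 default is an arbitrary illustrative threshold.
    """
    return current["null_rate"] - baseline["null_rate"] > max_null_increase


# Hypothetical snapshots of the same data set taken on two different days.
yesterday = [{"amount": 10.0}, {"amount": 12.5}, {"amount": 9.0}, {"amount": 11.0}]
today = [{"amount": 10.0}, {"amount": None}, {"amount": 9.5}, {"amount": 11.0}]

baseline = profile(yesterday, "amount")
current = profile(today, "amount")
needs_attention = drifted(baseline, current)
```

With data centralized and described in one place, a check like this can run against every registered data set rather than being rebuilt per silo.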
David Loshin: 00:35:44
Yeah, you know, you mentioned building a business case. I think that is important, and what I would love to do is kind of go back to some of the business drivers that we started with and look at how building a good data architecture actually will address them. So, I mean, I think first of all, if we talk about the customer experience: if you’re able to get integration of data about your customers from those different, I’ll call ’em administrative domains (it might be different systems within one line of business, might be across different lines of business, or again, as I kind of suggested earlier, across networks of different organizations), you’re providing that harmonized view. And that’s using that semantic layer that you made reference to a couple times, which leverages the metadata, to provide a consistent and up-to-date customer view, customer profiling.
David Loshin: 00:36:37
I mean, certainly when it comes to data sharing, let’s go to that next bullet item. If you’ve got a good data architecture, it really helps streamline data sharing across those partner or channel networks, because you’ve essentially been able to engineer perhaps data services and APIs that simplify the exposure of the right information to your trading partners. The third bullet item is about being able to implement analytical models directly within your information flow, which allows you to react rapidly to disruptive competitor changes. So as you get new competitors, you can look at what they’re doing. You can go back to your data architecture and see how to rapidly modify it to adapt to the changes in the business environment. Certainly when it comes to fraud, if you wanna go to that next bullet point, you can use the ability that you were describing, the full integration of analytics, for not just detecting fraud and figuring out what to do with it after the fact, but recognizing it in real time. Preventing those fraudulent events from taking place is much more beneficial, both to the customer and to the business, because it reduces their exposure and the risk of that type of drainage of resources. That next bullet item talks about auditability for compliance.
David Loshin: 00:38:08
And again, this is very similar to what I was saying about fraud, which is, if we are able to make use of observability capabilities that, you know, Phil, you were just referring to, it allows us to embed business rules into our information flows as part of the data architecture, to monitor for compliance or non-compliance with a set of data policies that are derived from those externally directed mandates, like data privacy laws and those types of things. Our next bullet item feeds into that as well, which is, if we are going to be forced into sharing information from one organization to another organization, we also want to ensure that we’re not violating any types of rules. So we’ve got protection about sharing open data from one organization to another.
David Loshin: 00:39:00
Next one is talking about performance. If we’re good at developing a data architecture, and, you know, you were praising the use of the cloud because of the elasticity and essentially its scalability, and if we are looking at a data architecture that can be deployed in an environment that has infinite or basically infinite scalability, it allows us to increase the performance of our business processes, because there’s no longer an artificially imposed technical dependency in the environment that is preventing those business processes from performing at the highest level. And finally, if we’ve got a good manifestation of what all the benefits are, we can also better socialize the transition to a modernized data architecture and modernized environment among, you know, any of the individuals in the organization that might raise objections.
Philip Russom: 00:40:03
You know, that’s good. And these are all great points. And, you know, I love the title of your slide here, David, because if you do address the business drivers in the design of the data architecture, that alone will just help to build the business case for you. Now, let’s look at a slightly different way to build a business case. And, you know, the way David and I are talking, you’ve figured out we feel strongly you should base your business case on business-driven benefits and use cases, number one, and then number two, technology-driven ones. We’ve already talked about that, so I’m not gonna repeat that much, but there are other things you should do. So for example, many of you are facing a lot of changes across all of IT, not just in the data world, but across IT.
Philip Russom: 00:40:45
So as changes are made, make sure that they’re not mere changes; make sure that they are also improvements, right? There’s a difference between change and improvement. For example, don’t just move to cloud, improve to cloud. So you want to try and improve data, improve some of your data products and data-driven solutions, like analytics and reporting. You want to improve these things as you go to cloud. And you know, also as you move to cloud, if you’re not on cloud to begin with, that’s what’s often called a re-platform: you’re changing platforms. And so anytime you change platforms, whether the cloud’s involved or not, you do need to rearchitect data and solutions to get the full value of the new platform, right? The lift and shift has its place, right, but it’s not enough if you wanna get full value out of the new platform.
Philip Russom: 00:41:37
And that’s part of the business case, so you need to be prepared to re-engineer for that. And then also just always keep business requirements and business value in mind. Anytime you are, say, designing the future-state data architecture, anytime you are communicating with people to promote this architecture that you’ve designed, be sure that business requirements and business value are made clear in your descriptions of these things. When in doubt, put business value first, technical value second. So I also think something you should do is, when you talk up use cases that a new and modern data architecture can enable, prioritize that list of use cases. And what do you put at the top? Well, look for use cases that really appeal directly to business people, and especially the kind of people who have some influence on budgets, et cetera.
Philip Russom: 00:42:37
And I do see a lot of business people demanding advanced forms of analytics, especially predictive analytics that may be enabled by artificial intelligence or machine learning. And this is because so many business people have been running the company by looking in the rear-view mirror of the car, because that’s what we do when we look at old data and analyze what happened in the past. And they wanna be looking through the windshield, to use analytics to predict what’s about to happen, or just to see things closer. David Loshin mentioned this earlier. You know, David, you and I both have a lot of insurance clients, and fraud is still the number one problem in insurance. And there’s been analytics for years with which, after the fact, sometimes long after the fact, like months or even years after, we can find out, oh, these events were fraud.
Philip Russom: 00:43:28
Instead, people wanna be able to see fraud coming and stop it. So you need a new data architecture that has a lot of real-time functionality and so forth, with predictive analytics built into it. So that kind of thing right there, if your architecture can promise to deliver that, you’ll get a lot of support from the business. Another thing that appeals to business people would be self-service access, because at least in my experience, the average self-service user is a business person who knows the basics of the relational paradigm and SQL, such that they can browse data, even construct simple queries. And once they’ve found data of interest, they can use self-service tools to create a data set and do some visualization with it. So this one is highly appealing to business people because they are the primary beneficiary of self-service data.
Philip Russom: 00:44:19
Exploration is something they often do in a self-service environment. So that’s high on the list. I mentioned real-time data, you know, with the fraud detection example I just gave. There are also lots of operational cases where you’re just trying to monitor, say, a manufacturing facility, or nowadays everybody’s monitoring their supply chains in a more granular fashion, because we all know the global supply chain’s quite messed up at the moment, isn’t it? So real time is something for business people in those situations. And at more and more companies, quite often it’s chief officers, quite often the CEO, who says, you know, we’re gonna share information more. I see innovation in some business units and the other business units don’t even know about it. “We’re gonna share data. We’re gonna share innovations based on data.” And that’s where data sharing is really rising in importance. So it’s quite a list, but you get the idea: prioritize your business use cases according to what you think will help you sell this to the business. Now, David, I put advanced analytics and self-service at the top. If you were looking at this, what would you put near the top of this prioritized list?
David Loshin: 00:45:26
Well, you know, I always default to governance. I think governance is much more sophisticated than what people thought it was 10 years ago. And in fact, you know, every time I look at business processes, there is some component of governance, whether it’s explicit, such as well-defined specifications of operations in some externally defined directive, or whether it’s implicit, such as the need to apply some taxonomy or some classification of sensitivity level to data assets, and even looking at data assets in terms of valuation for business purposes. So, you know, there’s a deep thread that goes into governance that I think goes way beyond what people think of in terms of data lineage and data quality.
David Loshin: 00:46:18
I think governance is, you know, I almost hesitate to call it data governance anymore, but I know that there is a discipline called information governance that I think subsumes what we refer to as data governance. And I think that’s a little more sophisticated and involves all sorts of different aspects of the data life cycle in organizations. So that’s for sure. And definitely semantics: if you really wanna enable self-service access, there’s gotta be some simplified way of letting people know what is in there and what they need to do with it.
Philip Russom: 00:46:57
Well, yeah, great points. Thank you. And while I got you on the hook, why don’t you get us going towards the end here by giving us a summary of why care about data architecture.
David Loshin: 00:47:07
Yeah, you know, I think together we’ve put together a whole bunch of really good points. You know, distributed data continues to be a challenge. It’s becoming a much more acute challenge when we start looking at trying to have a data-driven organization that’s pulling data from a wide variety of sources, and data architecture is something that is going to help to address that challenge. I mean, the next point would be the ability to centralize the perspectives of data that are accessible and available. That simplifies usability, but also allows you to control usage, which is imposing sets of policies on how that information is being used. So by doing so, you’re able to raise the business value of that data. You know, that next item is governance again; that is my sweet spot. And the more we are becoming aware that governance has to be baked directly into the way we share and make use of the data, being able to provide a data architecture that bakes in that governance is definitely a way to simplify the control mechanism. You wanna provide your feedback on some of these?
Philip Russom: 00:48:27
Well, the next bullet to pop up says cloud is a chance to reset your practices and improve your data architecture. And you and I were comparing notes about some clients recently, and boy, do I see this with my clients. My most recent employer was Gartner, Incorporated, and Gartner clients are not just migrating to cloud, they’re taking that as an opportunity to make tremendous improvements to data architectures, but also to a wide variety of other things as well. So cloud is definitely a chance to reset. And also, it’s funny: today we’re talking about different ways to make a business case for a new data architecture. You know, some people have no trouble making a business case because they’re told to do it. In some cases, the company says, you know, we’ve looked at cloud, we’ve studied it. It’s now time to make a deep commitment to the cloud concept in general. And we’re gonna use cloud X for this and cloud B for that. So in some cases the business case has already been made. And so organizations have to scramble to then design the future state of whatever piece of IT they’re in and what it will look like on cloud. And you know, you have to do that, because if you don’t, you don’t know what you’re really migrating to.
David Loshin: 00:49:41
Yeah. You know, just to refer back to something you said before about lift and shift versus improving to the cloud, one of the challenges in doing that migration without doing a modernization is that you spend a lot of time, effort, money, you know, budget, in moving an application to a new platform without improving the way that it actually works. So what have you done? You spent months and money to get something that works the same way it did 20 years ago. So, you know, absolutely correct. If you’re moving to the cloud, it’s a great opportunity to reset, to review how your business processes make use of data, and to modernize the data architecture.
Philip Russom: 00:50:22
For the sake of time, I’m just gonna push on through and look at some of our recommendations, and David, feel free to chime in on these as well. So I think people really should begin by attaining a technical understanding of business goals. Now, this assumes the business knows what its goals are and can communicate them in an articulate way to the rest of the organization. In some cases that can be a big assumption, but assuming that works, you know, before you start planning a data architecture, figure out where the company wants to go and how that is different from what we’re doing today. And that’s where a lot of data architectures come from. There’s some sort of change in opportunity there. Also do what we’ve been saying here, which is to establish management commitment by selling them on high-priority use cases that are highly meaningful from a business manager’s viewpoint.
Philip Russom: 00:51:16
Also stay focused on the primary goals of the technology of a modern data architecture. Remember the data consolidation, centralization, and standardization that we’ve talked about; don’t lose sight of those things. So many of the benefits we’ve talked about depend on those things happening. That’s why you have to stay focused and make sure they’re actually happening. I mentioned this in passing: you have to design a data architecture, you don’t really buy it out of a box. Some cloud providers will tell you you do, but no, not really. You have to design the data architecture, and you should design it as soon as possible, even if it’s gonna evolve over time. And this is so you have something to show people as you promote this idea of a more modern or different architecture. Also, you can’t create a migration plan if you don’t know exactly what you’re migrating to.
Philip Russom: 00:52:10
So you have to design the future. You have to design the endpoint of the migration. So you need to assemble a diverse internal team, and it’s not just techy people, because, you know, when you migrate an architecture to a new platform, and quite often that is cloud, you’re not just moving data, you’re moving applications, all kinds of technical processes. You’re also moving business people and business processes. So you need business people involved in the planning process, to be sure that all their needs are addressed and so that they can make plans for how their work will actually change in the future. And then don’t just move data; also improve data. We talked about that repeatedly. So I think we covered a lot of stuff. David, do you have anything to add before we go into some Q and A?
David Loshin: 00:53:02
I don’t know if I could add any more to what you just said. I think you covered all of it.
Philip Russom: 00:53:06
Yeah. So at this point I’d like to ask David Davis to come back in. David Davis, I know there have to be some questions that have come in. Can you share a couple with us?
David Davis: 00:53:16
Absolutely. Yeah, really excellent discussion, really great recommendations. That was my favorite slide. I took a screenshot of that. I learned so much about, you know, data architectures, distributed data, why governance is just so important, how cloud fits in, and making the business case around data architecture. So really excellent presentation. Thank you so much. I appreciate that. Yes, we do have some questions for you from the audience. Before we start that, I wanna remind everyone who’s here on the live stream: if you have a question you wanna ask live, just raise your hand here in Zoom. I will call on you, and I’ll ask you to unmute, and you can ask your question live here to our expert presenters. Let’s see. First question that we have that’s come in here is from Jordy. Jordy, I see you there in the audience. Go ahead and unmute and ask your question.
Speaker 4: 00:54:08
Yeah, yeah. Thanks for the session, guys. I just wanted to get this answered. So you said that a data architecture must be carefully designed, but who’s supposed to design it?
Philip Russom: 00:54:23
Yeah, that’s a great question. And you know, there are people with the term data architect in their job titles, and there have been a lot of ’em through the years. There is the enterprise data architect, and this person is quite often looking at architecture on a very broad scale, even across business units in a large enterprise. Then a lot of data architects do their work more locally. For example, I’ve known a lot of data warehouse architects, and the scope of their work is just the data warehouse. And yes, a data warehouse does have a data architecture of its own, even though it may be within a larger data architecture. So you typically have people who are experienced as data architects. Quite often these are data engineers who worked their way up. Quite often they’ll have a specialty in data modeling. Data modeling and data architecture…
Philip Russom: 00:55:12
They’re not the same thing, but they do have a lot of very similar design principles. So the short answer is data architectures are typically designed by data architects. Now, a related question would be, do I have ’em on staff as full-time employees, or do I augment my staff with consultants? Some organizations decide that data architecture’s not really a competency they wanna foster in house, and instead they have data architects from some of the leading consulting firms that have data practices serve as their data architects.
David Davis: 00:55:53
Excellent. Thank you so much, Philip. Let’s see. Next question here comes from Jennifer. Jennifer, go ahead and unmute and ask your question.
Speaker 5: 00:56:01
David, thanks for putting this on. I really enjoyed it. Kind of following on what you were just saying, Philip, in talking about who designs it: what about the overall team for creating a data architecture? Who would you wanna have on your team, and what kind of experience, whether it’s in house or out of house, you know, what kind of skills should they have?
Philip Russom: 00:56:21
Yeah, good question. And kind of a short answer is a lot of organizations I’ve been dealing with for years do have a data management team that is a centralized team, but it provides data services to a wide variety of business units, departments, and a wide range of use cases. And quite often that’s organized in what’s called a competency center or a center of excellence. And both of those are where you have shared services and the services are humans. So these are human resources that are pulled into a central team. And a really well defined, a really mature and built out centralized competency center for data management will include data architects. It’s quite often the case. Also, data architects quite often work their way up the food chain into being managers of data teams.
Philip Russom: 00:57:15
So progressively over the years, we’ve seen senior engineers be leaders of these teams. And nowadays more likely it is a data architect who leads a team. Sometimes they lead the whole competency center. Sometimes they’ll just lead portions of it. And this is, this is good because that way you have someone whose job really is design, and architects also take command of standards. So you have someone who’s an expert in design and standards leading teams, to be sure that work that gets done in the data area does comply with their standards for architecture, other data standards, as well as align with business needs.
David Loshin: 00:57:55
Yeah, I’m gonna throw another role into that list that you just had, which is, you know, one of the challenges is being able to effectively solicit requirements from the business people. Like, you can have a data analyst go to a business person and say, what data do you want? And they’ll just say, typically, I want all of the data. But really what you want is somebody who’s gonna go to the business people and say, you know, what are the core business problems that you’re trying to solve? And where are the gaps in the availability of that information, so that we can address that in that data architecture? So I think it’s almost like, you know, a solution engineer or a requirements analyst, somebody who speaks both business and technology.
Philip Russom: 00:58:43
Good, good point. Thank you, David.
David Davis: 00:58:45
Yeah. Excellent point. Finding that person, I know, is always a challenge at some companies, the person who can speak both data and business. Excellent answers all around there. Let’s see. Next question here comes from Natasha. Natasha?
Speaker 6: 00:59:02
Hi. Yeah, thanks guys. So I saw you mention data governance in one of your slides, and I’m just wondering, what’s the role of data governance in a data architecture?
Philip Russom: 00:59:16
David Loshin, I think that’s your area. Go ahead.
David Loshin: 00:59:18
Yeah, sure. When the term data governance first came into vogue, it was really just kind of a fancy way of talking about data quality. I think now what it really means is effective use of information under a set of different controls. And that governance kind of bleeds out into the ability to define policies about information use and availability, and how to embed that directly into the data life cycle. So for example, I might be one type of analyst and I may be restricted from looking at certain types of data, while Philip is a much more sophisticated analyst who has access to all the data. But if all we’re doing is building an API that allows you to run a query against a database, for example, what that does is it allows him to see what he needs to see, but allows me to see way more than I’m allowed to see. Governance is kind of a practice at this point, which involves understanding what the usability expectations are among the different consumers, as well as what the imposed rules are, and how to manage all of that without turning into, you know, a big terrain of spaghetti and meatballs. It becomes really sophisticated.
David Loshin: 01:00:37
And that’s why you see a lot of vendor tools out there that are supposedly, you know, addressing the data governance domain. But really, it’s not a tool, it’s a practice. And it’s something that needs to be engineered into the architecture and continually monitored as part of that practice.
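The contrast between a bare query API and one with policy engineered in can be sketched in a few lines. This is only an illustration under assumed names: the roles, columns, sample rows, and the `governed_query` function are all hypothetical and were not presented by the speakers.

```python
# Hypothetical policy table: which columns each analyst role may read.
POLICIES = {
    "junior_analyst": {"customer_id", "region"},
    "senior_analyst": {"customer_id", "region", "ssn", "balance"},
}

# Made-up sample rows standing in for a real data store.
ROWS = [
    {"customer_id": 1, "region": "east", "ssn": "123-45-6789", "balance": 250.0},
    {"customer_id": 2, "region": "west", "ssn": "987-65-4321", "balance": 75.5},
]


def governed_query(role, columns, rows=ROWS):
    """Return the requested columns only if the role's policy permits all of them."""
    allowed = POLICIES.get(role, set())  # unknown roles get no access at all
    denied = [c for c in columns if c not in allowed]
    if denied:
        raise PermissionError(f"role {role!r} may not read: {', '.join(denied)}")
    return [{c: row[c] for c in columns} for row in rows]


# The senior analyst sees sensitive fields; the junior analyst is refused.
senior_view = governed_query("senior_analyst", ["customer_id", "ssn"])
try:
    governed_query("junior_analyst", ["customer_id", "ssn"])
    junior_blocked = False
except PermissionError:
    junior_blocked = True
```

The policy lives in data rather than in each caller, which is one way to read the point that governance is engineered into the architecture instead of bolted on by a tool.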
David Davis: 01:00:58
Yeah. Great point. I like that governance is not a tool. It’s a practice. Next question comes from Riley. Riley?
Speaker 7: 01:01:07
Hi. I had a question around security around a data architecture. So what are the security issues and what approaches to security are best for a data architecture?
Philip Russom: 01:01:21
Thank you, Riley. That’s a great question. And, you know, the kind of data architectures we’re talking about, it’s like anything we build in IT: it definitely needs security. And in fact it probably will need multiple approaches to security, multiple layers of security. And so it’ll have a lot of the usual stuff of username, password, you know, identification and authorization that goes with that. But I think if you look at the architectures we’re describing, they have a lot of different components and many different use cases. And so when you have different use cases, you have different tools. And so you don’t really want users, whether they’re technical users or business people, to have to log into everything as they sort of move across the architecture. So single sign-on is a very useful thing to have.
Philip Russom: 01:02:10
You can also creatively use role-based security or directory-based security, so that certain roles are allowed access to certain data sets within the architecture, but not others, right? So those are common ways to use security in a multiplatform, multi-solution environment like these large data architectures. Now there’s also stuff where some people say, wait a minute, that’s not really security. There’s this thing that’s often called data protection, and it’s down at the storage level. And it typically manifests itself as data encryption and data masking. And so for these kinds of architectures, it’s sometimes not a matter of if somebody hacks their way into data. It’s more like when. So with masking and encryption, when somebody gets into data sets at the storage level that they’re not supposed to be in, then the data just looks like gibberish to them, because they don’t have the appropriate decryption keys and so forth. So you get the idea. There are definitely multiple roles for security and data protection in a data architecture.
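The two layers described here, role-based access to data sets and masking of sensitive values, can be sketched as follows. The role names, data set names, and helper functions are hypothetical, and the masking shown is simple character substitution for illustration. It is not encryption, which would rely on a vetted cryptography library and managed keys.

```python
# Hypothetical mapping of roles to the data sets they may open.
ROLE_DATASETS = {
    "claims_adjuster": {"claims"},
    "fraud_analyst": {"claims", "payments"},
}


def can_access(role, dataset):
    """Role-based check: is this data set in the role's allowed set?"""
    return dataset in ROLE_DATASETS.get(role, set())


def mask(value, visible=4, fill="*"):
    """Hide all but the last `visible` characters; short values are hidden entirely."""
    if len(value) <= visible:
        return fill * len(value)
    hidden = len(value) - visible
    return fill * hidden + value[hidden:]


def read_record(role, dataset, record, sensitive_fields):
    """Return a record with sensitive fields masked, if the role may open the data set."""
    if not can_access(role, dataset):
        raise PermissionError(f"role {role!r} may not open data set {dataset!r}")
    return {
        key: mask(str(val)) if key in sensitive_fields else val
        for key, val in record.items()
    }


# Made-up payment record for illustration only.
payment = {"payment_id": 42, "card_number": "4111111111111111"}
masked_payment = read_record("fraud_analyst", "payments", payment, {"card_number"})
```

Here the access check and the masking are independent, so a reader who gets past one layer still meets the other, which matches the layered approach described above.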
David Davis: 01:03:12
Excellent. Thank you, Phil. Let’s see. Lots of great questions have been coming in here as well, typed in from the audience. The first one I see on the list, they’re asking: what’s the difference between a data architecture and similar things like IT infrastructure and the so-called modern data stack?
Philip Russom: 01:03:31
Yeah. David Loshin, let me start that one, and then I think you have some ideas on this too. I think when most people say infrastructure, they mean fairly deep IT infrastructure, and that’s what a lot of IT organizations give you nowadays. I’m old enough to remember when IT gave you solutions. Today they give you infrastructure. So they provide all the corporate networks, storage subsystems, they provide hardware, data center, you know, physical real estate facilities and so forth. And you know, a lot of that’s changed because of the evolution of the data center. And a lot of data centers are now outsourced to clouds and so forth. So for a lot of people, infrastructure is very deep and very foundational, whereas the data architecture we’re talking about would be built on top of that kind of infrastructure. So with architecture and infrastructure, you know, everything’s in layers, right? So the comparison is really that they’re two different layers. Also, with the data stack: I mentioned it earlier, and I think of it as kind of a vertical slice of an architecture, whereas an architecture has lots of vertical slices, but also has a lot of horizontal aspects, because there are so many processes that are winding their way as data flows left to right through the architecture. David Loshin, do you have any further thoughts on these comparisons?
David Loshin: 01:04:52
Well, yeah, I mean, I think what I would do is I would kind of cast it in terms of two different, you know, very high-level use cases for data. One is, you know, operational processing or transaction processing, the other’s analytical processing, and your data architecture really is intended to facilitate both of those paradigms. You know, so for example, in the old world of transaction processing, you know, there’d be some mechanisms where transactions would get partially completed, but then the full completion of those transactions would be done in batch overnight or something like that. While today the expectation that customers have is that, you know, when they order something through an eCommerce site, they expect to know right away that the sale has gone through and that the item is in transit, you know, and what’s the tracking number?
David Loshin: 01:05:43
So, an increased expectation of real-time, straight-through processing from an operational standpoint, that’s one perspective. The other is the analytical life cycle, which is being able to suck data out of a variety of sources and provision it as quickly as possible to provide visibility into up-to-date data, to be able to do real-time customer profiling, real-time recommendations and those types of things. So both of those are starting to merge together because the analytics are embedded in the operational processing or the transaction processing. So to be able to do that, you know, we can look at a quote-unquote data stack. Well, that’s just kind of like the way the data’s, you know, stored or managed, you know, what levels or what type of infrastructure is it stored in: is it stored in fast, you know, memory or solid-state disks, or is it, you know, pushed out to archive. But in reality, you know, the data architecture is the overlay on top of that infrastructure that enables the optimal use of data for those emergent use cases.
David Davis: 01:06:58
Yeah. Thank you for clarifying that. I know that can be challenging to visualize for folks who are getting started in this, especially. Here’s another good question from the audience. They want to know what are some of the barriers that we’ll need to overcome when we start a data architecture program?
Philip Russom: 01:07:16
Well, let’s see, off the top of my head, one barrier I think I mentioned earlier is that you need to decide: will we have data architects on payroll, or will we depend on consultants for it? That’s a choice I see people confronting all the time. Also, you know, where do you staff them? Like I said, larger competency centers for data management tend to have architects on it. So that’s an obvious one. If you’re taking more of a federated team structure, and some organizations do, instead of the centralized competency center, they’ll have lots of small teams in each business unit or department. So then the question there is, even if these departments are sharing the same architecture, who’s gonna design it? Probably consultants in a case like that. But you get the idea, you know: where does the architect come from? To whom do they report? Are they a full-time employee or a consultant? Those are basic problems to work out before you get into it. David, do you have any other thoughts about how you start?
David Loshin: 01:08:25
Yeah, well, less about starting, but more about some of the barriers to being successful in doing a data architecture program. One of them is, you know, a religious observance of development methodologies that don’t take data management into account. So, you know, not to cast any aspersions on any of them, but often, you know, our developers are pushed to build something, whether or not they know exactly what it is that they’re building, and there’s some iterative process of saying, okay, well, you know, we’re gonna spend six weeks building something, then see how good it is. You know, there’s some benefits to doing that, but if you do that in the absence of understanding data usability and data requirements and how that levels into a data architecture, you’re gonna end up building that thing multiple times. And then, you know, your iterations are prototypes. They’re not gonna be contributing to an overall solution. So I think that barrier is a severe barrier that needs to be overcome, which is recognizing that data is a core component of any application architecture, the data architecture and how it interoperates with the business processes.
Philip Russom: 01:09:44
That’s a good point. And yeah, some people have trouble sort of believing in data architecture, especially our buddies in applications development. Sometimes they sort of pooh-pooh it and say, nah, that’s just some deep layer of my applications architecture. So, you know, my observation is most solutions have multiple architectures at play, and they layer over each other and they overlap and so forth. So that’s the thing. You know, David Davis, I have a short little anecdote here which might make a good closer. If I go back 20-some-odd years, some of the first data architects I encountered typically had the job title enterprise data architect. And I remember a long time ago interviewing some of these people and saying, so what do you do in that job?
Philip Russom: 01:10:29
What do you do? And one of the most articulate guys I ever met, he said, you know, it’s not really architecture. It’s really more like archeology, because a lot of enterprise data architects are studying systems that already exist. In other words, they never get a greenfield opportunity to design an architecture from the bottom up. They’re having to look at how data is handled in existing applications and so forth, and then suggest certain tweaks, not major changes, but some tweaks to make it easier to do things that the business needs. Like for example, years ago, these architects started showing up when we started sharing customer data across multiple customer-facing business functions. And so the big problem there was how do we share customer data in a reliable and timely fashion across systems that are radically different.
Philip Russom: 01:11:25
So quite often the data architects would step in and say, okay, if we just change the data model under this application a little bit this way, and under this other application change it a different way, then it suddenly becomes easier to synchronize data about customers across these things. So at any rate, this guy’s joke, which was not very funny, was that it’s not really architecture, it’s more archeology. And he felt like he spent his time digging in and understanding what other people built so that he could make very subtle changes. And then this also brings up one of the barriers to data architecture, which is how much authority does a data architect have. So with this enterprise data architect I was just describing, those guys would get frustrated because they would make a lot of suggestions and very few of them were actually followed up on.
Philip Russom: 01:12:19
Today we’ve seen data architectures come a very long way. If you’re gonna make the full commitment to full-time employee data architects, you do typically give them the authority so that they can say, look, we have data standards. What you’re building, its designs will comply with our standards. Also, we have standards for compliance and governance. We have standards for interoperability, et cetera. So quite often the data architects have designed standards that would go across multiple solutions, and they do have the authority to be sure that people actually build solutions that follow those architectural standards.
David Davis: 01:12:58
Excellent. I like that, the data archeologist. That’s a great story, and I think that’s a great place to wrap up this episode of the Dataverse.ai podcast. Thank you so much to our experts, Philip and David. Really great presentation. Thank you so much.
David Loshin: 01:13:14
Yeah. Thanks for having me.
Philip Russom: 01:13:16
Yeah. It’s been a pleasure to talk with my friend David Loshin again. David Davis, nice to meet you. And my thanks to everybody at ActualTech Media for helping set this up.
David Davis: 01:13:26
Excellent. Thank you to you both. I learned so much about this topic today. I look forward to future episodes. I encourage everyone out there in the audience, if you haven’t already subscribed to the Dataverse.ai podcast, wherever you get and listen to your podcasts, make sure that you add it to your list there. And of course, thank you to our experts and thank you to our audience. I hope everyone has a great day and we’ll see you next time.