<v Jordan Cooper>I'm here with Anatoly Postilnik, head of the healthcare practice at First Line Software. Anatoly, thank you for joining us today. How are you doing?</v>
<v Anatoly (Guest)>Good afternoon. Jordan, I'm doing great.</v>
<v Jordan Cooper>Today we'll be discussing good data and bad data; in particular, we'll be discussing clinical data and data governance with Anatoly. Anatoly, I'd like to ask you to elaborate on what data quality means to you, and specifically, can you address how to differentiate good data from bad?</v>
<v Anatoly (Guest)>This is a fascinating topic that comes up all the time in different scenarios, and one of the major questions is: what is good data, what is bad data, and how do we define good data? And how do we deal with situations when the data is not good? I think we need to start with a definition of good data, and from my perspective and from experience, I'd say that good data is data that serves its purpose. That's a very simple, straightforward definition.</v>
<v Anatoly (Guest)>Because data is created by people and created by systems, and it is designed to be used for a particular purpose, in the majority of cases the data is good as long as it serves the purpose it was created for. So when, for example, we look at the data stored in an EHR system and try to extract that data from the EHR system, we'll find lots of things that we don't like. In reality, that is only because we're trying to use the data for a purpose that deviates from the purpose it was originally created for.</v>
<v Anatoly (Guest)>So this is one fundamental misconception. Data is really good for the specific system it was designed for, and as long as it works for that system, as long as it supports the functionality and the purpose of that system, it is good. So let's talk about different scenarios where data comes across as questionable and not sufficiently good. Typically, one of the major reasons data comes across as poor quality is extraction: whenever we do something with the data, in reality we transform it, and we don't think about that.</v>
<v Anatoly (Guest)>For example, look at interfaces, whether it is an HL7 interface or a FHIR interface. When we expose the data from an EHR system, or from any other system, through interfaces, that is not the data that is in the EHR system. That is massaged data, converted data, data transformed for the purpose of the interface. You may store data in a relational or non-relational format in a particular database; when you extract it using FHIR, it's JSON, and it is disconnected from the particular system. There are no internal identifiers, there is no referential integrity, things like that. But at the same time, this data extracted from the system will perfectly serve the purpose it is used for.</v>
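The contrast Anatoly draws between data at rest in a database and data extracted over a FHIR interface can be sketched roughly as follows; the field names and values are hypothetical, for illustration only:

```python
import json

# Sketch: the same observation as stored relationally vs. as a FHIR extract.
# In the EHR database, internal keys enforce referential integrity (all
# identifiers here are made up for illustration).
row = {"obs_id": 9912, "patient_fk": 4471, "loinc_fk": 83, "value": 6.1}

# The FHIR extract is self-contained JSON: the internal keys are gone and
# references are plain strings, so database-level integrity no longer applies,
# yet the resource fully serves the purpose of the interface.
fhir_observation = {
    "resourceType": "Observation",
    "status": "final",
    "code": {"coding": [{"system": "http://loinc.org", "code": "4548-4"}]},
    "subject": {"reference": "Patient/example"},  # string reference, not a foreign key
    "valueQuantity": {"value": 6.1, "unit": "%"},
}
print(json.dumps(fhir_observation, indent=2))
```

The extract is "good data" for the consumer of the interface even though it would look incomplete from the database's point of view.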
<v Jordan Cooper>So I think what I hear you saying is that data is good if it serves the specific purpose it was designed for, and that data quality loss is often due to transformation of the data, which leads to bad data. Is that correct?</v>
<v Anatoly (Guest)>That's correct. But there are other reasons why the data may be questionable and may not be perfect. Please go ahead.</v>
<v Jordan Cooper>Many of our listeners are CIOs, or their counterparts, at large healthcare delivery systems in the United States, and they would be interested in hearing about data trends you've seen across the industry. As head of the healthcare practice at First Line Software, you have many customers across the United States, and you've been able to see how data practices are put into use at those customer sites. I think our listeners would love to hear some anecdotes about what you've seen when data standards have been put into practice: what has led to good quality data that could be replicated by our listeners, and what mistakes leading to bad quality data would best be avoided?</v>
<v Anatoly (Guest)>Sure, absolutely. Let's take a look at a few examples. Imagine we extract data to an external system for research purposes, using some ETL process to pull the data into a common data model, for example to search for patients eligible for clinical trials or studies. The common data models typically used for these purposes are pretty well defined, with pretty well defined rules that say what good quality data is. Those rules may require, for example, the attachment of standard terminologies and standard concept IDs, and very often, when we pull this data from EHR systems, we don't have that, because the data stored in the EHR system may not carry those identifiers; they're not needed for the function of that system. So what we need to do is concept mapping, data normalization, data transformation, and quality analysis in the context of that particular system. But that's not the only case. There are cases where the data inherently has variability, for example when you get a lab result, by the nature of the processes associated with testing.</v>
<v Anatoly (Guest)>Even if you send the same sample to two different labs, you may see some variability in the results, because the labs are using different equipment, at different levels of precision, and so on. Some variability in those results is common, and up until COVID it was standard that all results had to be reviewed by a provider; that's one of the reasons: the results are not perfect, and there is variability in them. And sometimes two different labs will send you data where a result may be incomplete or partial. When the clinician looks at it, it reads as pretty much the same thing, but the system does not understand that it is the same thing. What a human sees and what the system sees are very different things; those two entities understand data differently. When we send data as, let's say, a fax, a human can look at that data, look at the record, and say, well, it's clear what's going on with this patient. But try to fit that data into a system, and it's useless in many scenarios: unstructured data is not very useful for these systems, even though it is completely consumable by humans. Moreover, there are technologies today that can extract meaningful insights from such data, but again, that depends on the technology used in each particular case.</v>
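The concept mapping and normalization step Anatoly described a moment ago can be sketched roughly like this; the local lab codes and standard concept IDs are hypothetical, for illustration only:

```python
# Sketch: attaching standard concept IDs to locally coded records before
# loading them into a common data model. Codes and concept IDs are made up.
LOCAL_TO_STANDARD = {
    "GLU-SER": 3004501,   # assumed local code -> assumed standard concept ID
    "HBA1C":   3004410,
}

def normalize(record):
    """Attach a standard concept ID; flag records that cannot be mapped."""
    concept_id = LOCAL_TO_STANDARD.get(record["local_code"])
    return {**record, "concept_id": concept_id, "mapped": concept_id is not None}

rows = [{"local_code": "HBA1C", "value": 6.1},
        {"local_code": "XYZ-99", "value": 5.0}]   # second code has no mapping
normalized = [normalize(r) for r in rows]
unmapped = [r for r in normalized if not r["mapped"]]
print(len(unmapped))  # -> 1: records needing manual concept mapping
```

Records that fail the mapping are exactly the ones that surface as "bad data" in the research context, even though they were perfectly adequate in the source EHR.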
<v Jordan Cooper>You've also mentioned in previous conversations that another reason for poor data quality is the source system itself: its organic growth and evolution, and the process by which it is configured and maintained. Would you mind elaborating on that within the context of an electronic health record implementation project?</v>
<v Anatoly (Guest)>That's a great point, Jordan. There are several reasons why the data in an EHR system is, by itself, not perfect. Large EHR systems are configured and built by consultants, by groups of people who do not necessarily talk to each other. Consultants may be configuring the EHR system for different departments, often inadvertently creating data elements that are duplicated or conflicting. One example: working with one healthcare institution, we found seven length-of-stay metrics. Think of operational analytics that relies on length of stay, which is a very important metric; you will get all kinds of different results depending on which of those metrics you use.</v>
<v Anatoly (Guest)>In another instance, we found a data warehouse with very substantial variability in the data, because the data comes into that warehouse from multiple interfaces, those interfaces are not necessarily reconciled, and the designers of the warehouse had not put proper data governance in place at the time the data was committed to the warehouse. For example, you might get ethnicity coming in on the interfaces expressed in different terms. It may be "HC", it may be "Hispanic", or it may be something else, a coded value differing from interface to interface. Unless somebody reconciles this data, it will not be good quality data. And why nobody reconciles it could be a separate topic, because when you send an order to a lab and the lab comes back with the result, the ethnicity doesn't make much of a difference to the source system: it already has the patient ID, and all of the demographic data is already stored in the system. Yet the HL7 message contains demographic information. What is the system going to do with that demographic information? Who knows; it depends on the particular system.</v>
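The reconciliation Anatoly says was missing can be sketched as a small normalization map applied as data enters the warehouse. The raw spellings are assumptions; the two numeric codes are the HL7/CDC ethnicity codes:

```python
# Sketch: reconciling ethnicity values arriving from different interfaces
# into one canonical vocabulary. Incoming spellings are illustrative;
# 2135-2 and 2186-5 are the CDC/HL7 ethnicity codes.
CANONICAL_ETHNICITY = {
    "HISPANIC": "Hispanic or Latino",
    "2135-2": "Hispanic or Latino",
    "NOT HISPANIC": "Not Hispanic or Latino",
    "2186-5": "Not Hispanic or Latino",
}

def reconcile_ethnicity(raw):
    """Map a raw interface value to the canonical term, or mark it for review."""
    return CANONICAL_ETHNICITY.get(raw.strip().upper(), "UNMAPPED: " + raw)

print(reconcile_ethnicity("Hispanic"))  # -> Hispanic or Latino
print(reconcile_ethnicity("2135-2"))    # -> Hispanic or Latino
print(reconcile_ethnicity("HC"))        # -> UNMAPPED: HC (routed to review)
```

Anything falling out as unmapped is surfaced to a person rather than silently committed, which is the governance step the warehouse designers had skipped.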
<v Jordan Cooper>Moving on to the third reason for poor data quality: you have previously attributed poor data quality to the aggregation of data from multiple systems without harmonization and reconciliation, which you've also referred to as normalization and deduplication. You just gave one example involving multiple systems. Could you elaborate on a particular concrete example that you've worked on in the past, and what you did to resolve that poor data quality issue with your customers?</v>
<v Anatoly (Guest)>The process of aggregating data from multiple sources is basically a multi-step process. There is the source data, and the source data usually goes through some kind of transformation process, typically done through an integration engine like InterSystems Health Connect. Not a lot of people realize that an integration engine is not just a tool to transform data; it's also a workflow engine. You can analyze the data as it is being converted, transformed, and sent to the destination system, and you can do all kinds of interesting things within that process: you can analyze the quality of the data, you can generate alerts, you can quarantine the data for further manual analysis, reconciliation, and resending to the destination system. There are lots of different things you can do at the time the data is being committed to the destination system, and it's a lot easier to do them then than to try to analyze the data later, inside the destination system. Do it while the data is being committed. One example: we were working with one institution where they had lots of duplicate records.</v>
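The validate-or-quarantine workflow Anatoly describes inside an integration engine can be sketched like this; the checks, field names, and queues are illustrative, not the API of any particular engine:

```python
# Sketch: a validate-or-quarantine step in an integration workflow.
# Clean messages are committed; questionable ones are held for manual
# reconciliation and resending. All names here are illustrative.
QUARANTINE, DESTINATION = [], []

def required_fields_present(msg):
    """A minimal quality check: every required field is non-empty."""
    return all(msg.get(f) for f in ("patient_id", "result_code", "value"))

def route(msg):
    if required_fields_present(msg):
        DESTINATION.append(msg)   # send on to the destination system
    else:
        QUARANTINE.append(msg)    # alert + manual review, then resend

route({"patient_id": "42", "result_code": "4548-4", "value": "6.1"})
route({"patient_id": "42", "result_code": "", "value": "7.0"})  # missing code
print(len(DESTINATION), len(QUARANTINE))  # -> 1 1
```

Doing this check at commit time, as Anatoly notes, is far cheaper than hunting down the bad record inside the destination system afterwards.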
<v Jordan Cooper>I'd like to talk about broader trends now. We've spoken about specific use cases where data quality has been poor or better. What have you seen generally? What data practices have you seen that are widespread across many customers in the United States and that reliably produce good data? And what is a common error that you see many institutions making? In short, what trends in data quality practices have you been seeing across institutions in the US?</v>
<v Anatoly (Guest)>Well, obviously the most important thing an organization can do with respect to data quality is to set up data governance processes, and to define methodologies and best practices to keep data at good quality: performing periodic sanity checks and periodic reviews of the existing data, and of the scenarios where the data may be of poor quality.</v>
<v Anatoly (Guest)>One expression of this poor data quality, and an opportunity to be proactive in improving it, is reporting. Most healthcare organizations generate operational reports used for many different purposes. What we found is that organizations who do not impose rigid and meaningful data quality practices end up with reports that are duplicated and conflicting, and not a lot of people know about that. It is important to create, for example, good reporting catalogs where you can classify the different reports, which are analytical assets, very important assets for the organization. We've helped organizations set up data governance over their reporting assets, organizing reports so that they know what those reports are about. It's not just the report itself; it's the metadata around those reports that is important.</v>
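A report catalog of the kind Anatoly describes can be sketched as little more than structured metadata; the fields and entries below are hypothetical, but they show how cataloging surfaces the conflicting length-of-stay definitions mentioned earlier:

```python
from collections import defaultdict

# Sketch: a minimal report catalog -- the metadata around each report is what
# lets an organization find duplicates and conflicts (all entries are made up).
catalog = [
    {"name": "Daily Census", "owner": "Ops", "source": "ADT feed",
     "metric": "length_of_stay", "definition": "discharge - admit, in days"},
    {"name": "LOS Summary", "owner": "Finance", "source": "billing extract",
     "metric": "length_of_stay", "definition": "billed days"},
]

# Grouping by metric surfaces reports that claim to measure the same thing
# but define it differently -- the conflict Anatoly describes.
by_metric = defaultdict(list)
for entry in catalog:
    by_metric[entry["metric"]].append(entry["definition"])
conflicts = {m: d for m, d in by_metric.items() if len(set(d)) > 1}
print(conflicts)  # the two incompatible length-of-stay definitions
```

Without the metadata, both reports simply exist side by side and quietly disagree.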
<v Jordan Cooper>So when one of our listeners, perhaps the CIO of a large health system in the US, finds that they have data quality issues widespread across the organization, what steps should they take to remediate them? What steps have you seen such customers take in the past, steps that might lead them, for example, to speak to First Line Software, to resolve their data quality issues?</v>
<v Anatoly (Guest)>Well, the first step, of course, is to understand the impact. When people say they have discovered that the data is of poor quality, they obviously discovered that in a particular context. So one of the most important steps is to define the criteria: what is good, what is important, what is not good. As I mentioned, data can happily live in a particular system, and as long as the system functions correctly, that's not a problem at all. What is important is to understand in which cases the quality of the data impacts operations. As I mentioned, in reporting we found a lot of duplicate reports, and a lot of inability within the organization to recognize poor quality data. So setting up practices that say in which contexts the data needs to be of good quality, generating analyses, and doing periodic reviews of certain elements of the data would serve very well.</v>
<v Jordan Cooper>As we approach the end of this podcast episode, I'd like to ask you a somewhat playful question. If you were to rub a magic lamp and a genie came out, willing to grant you one wish to solve any problem confronting a large healthcare organization, what would you wish for?</v>
<v Anatoly (Guest)>Well, that's an existential question.</v>
<v Anatoly (Guest)>I think your question is very loaded, because one topic is data quality, and the other topic is, of course, the system that supports the operations of a particular institution. To me, what has the most impact in today's world is the closed environment. Our EHR systems are monolithic, closed environments, unlike our mobile phones. The beauty of our mobile phones is that you pick the best apps, download them, and they all work together very well; the glue between those apps is the operating system and the data models that connect the apps together. So in an ideal world, those systems would become open, and the data models they operate on would become more standard, more exposed, more governed, more controlled. In that case, the apps will work much better, and we will all benefit from that.</v>
<v Jordan Cooper>Well, I think we've covered a lot of topics today. We've spoken about good data and bad. We've learned that data is contextual: it can be good in one context and bad in another, and that good data is defined as data that serves the functionality of the system it was designed for. We covered three main causes of poor data quality: one was quality loss due to transformation of the data; another was poor quality data because the source system itself was not configured or maintained properly; and the third was poor quality data due to aggregation from multiple systems without harmonization and reconciliation. And finally, we talked about how having the right connective tissue between systems, the right interface engine, for example, may be the best way to preserve and maintain good quality data, since, as you noted in one example, good quality data in an EHR may be poor quality data in some other use case, so translating it and making sure you have a deduplicated, normalized set of data is important to maintaining good quality.</v>
<v Jordan Cooper>This has been Anatoly Postilnik, head of the healthcare practice at First Line Software. Anatoly, I'd like to thank you very much for joining us today.</v>
<v Anatoly (Guest)>Thank you, Jordan. It was a pleasure.</v>