Structured data: a joy to work with
IgniteData’s CTO, Richard Yeatman, talks structured data and its value to clinical trials, as well as approaches to and challenges of re-using unstructured data
In my line of work, I find myself talking about concepts like structured data, data mapping and eSource data all the time. But sometimes it’s worth explaining what we mean by these terms and why they are so valuable to conducting clinical trials. Structured data is something that you might say is worth its weight in gold when it comes to clinical trials and healthcare data. So, what is it?
Structured data is data that is highly organised, with separate fields for specific data elements. Typically, it is data that answers a particular question in a standardised way, for instance a result or measurement that is numeric or uses coded terminology. Examples of healthcare data that are easily organised into a structured format include medications, laboratory data, and vital signs.
Structured data is ideal for allowing the re-use of healthcare data. When data is structured, it makes it easier to know where to look for the information you need, and what you might find when you get there. A prime example is EHR-to-EDC – where the healthcare data captured in a hospital’s Electronic Health Record system (EHR) is transferred into the system used for clinical trial data (EDC). An automated transfer of data in this scenario saves thousands of hours of effort put into gathering the data needed for a clinical trial.
IgniteData’s EHR-to-EDC technology, Archer, loves structured data. Its purpose-built mapping engine allows structured data to be mapped from its field in the EHR, to the relevant field in the EDC in a straightforward process. So, while Archer is built on sophisticated technology, the actual transfer of structured data is made easy.
The growing availability of structured data
For over a decade I’ve been immersed in the world of healthcare software, and have watched the rapid evolution of the way data is captured, stored and used in healthcare. Digital health records, such as those held in EHRs, are relatively new concepts. We have only begun to see wide usage of EHRs within the last 10-20 years, with more sophisticated systems (like those employing HL7 FHIR technology) becoming more widespread in the last 6-7 years.
With this in mind, the capturing of healthcare data in a structured format is somewhat new to clinicians but, thankfully, is increasing and improving rapidly.
Factors driving this increase include:
- Shared care is becoming an increasingly common approach. As it becomes more and more common for a patient’s care to be provided by a number of different providers and agencies, clearer, more easily shareable healthcare data is becoming essential.
- Patient centricity is growing in healthcare provision. As well as helping providers share patient information, the availability of structured data also allows patients themselves to have easier access to their own data.
- Clinical audit, a quality assurance requirement for doctors to review the delivery of health care and assess clinical outcomes, is made much faster and easier where data is already stored in a structured format.
- Clinical informatics is increasingly relied upon to inform improvements to healthcare outcomes. High quality, structured data is crucial to this function, and so, it is helping drive improvements in the way data is recorded by clinicians.
- Clinical trials themselves are an important income stream for hospitals. There has been increasing recognition that the availability of quality, organised data can improve both the efficiency and outcomes of a clinical trial. The value of structured data is increasingly being recognised, as the number of data points included in clinical trials increases.
A high proportion of clinical trial data can be structured
When it comes to clinical trials, data that can easily be held in a structured format typically makes up a large proportion of the body of data needed for the trial. During the course of a clinical trial, repeat visits to clinic result in large quantities of data (such as observations) that is structured by nature.
In fact, as much as 50% of the data in a clinical trial is data that is easily organised in a structured format:
Some examples of the types of commonly required clinical trial data that is highly structured.
Case study: Demonstrating the value of structured data
This year saw the conclusion of a landmark pilot EHR-to-EDC data transfer project, when University College London Hospitals NHS Foundation Trust (UCLH), IgniteData and AstraZeneca evaluated how Electronic Health Records (EHR) to Electronic Data Capture (EDC) data transfer could perform in a clinical trial setting.
Four patients from an AstraZeneca-sponsored phase 3 oncology study were enrolled into a live mirror study; and data from the first 5 of their visits was electronically transferred from UCLH’s EHR to a copy of the study database.
Findings included:
- Archer successfully mapped 100% of the data for Vital Signs and Labs, and was able to successfully transfer 100% of mapped data from these domains.
- Transferring mapped data using Archer clearly showed time savings compared to manual methods – Archer could also significantly reduce Source Data Verification (SDV) and query resolution time in a live study .
- Overall, 96% of the targeted fields for the selected data domains of the Case Report Form (CRF) – Vital Signs, Labs and ConMeds – were mapped.
Additionally, IgniteData ’s mapping analysis found that the 15% of forms mapped by Archer accounted for 45% of all study data.
The insights gained from this pilot study will inform ongoing collaborative plans to optimise CRF design, explore medical coding of EHR medications, and expand the use of EHR-to-EDC technology across various domains.
What about unstructured data?
After making such great strides in the re-use of structured data, it’s natural that attention now turns to unstructured data.
There is still a lot of data that is important for most clinical trials – such as diagnoses, clinical procedures, medical imaging, free text notes and some concomitant medications – that is unstructured and not mapped into common terminologies. In fact, overall, estimates are that only 20% of healthcare is structured, with 80% overall being unstructured.
Where information is stored like this, it is much more difficult to search and pull relevant data from. In a clinical trial, it’s typically the job of a study site representative (often the trial coordinator or ‘data manager’) to scour through this information and pick out the relevant data elements.
This doesn’t only apply to the execution of a trial, but also has a significant impact on the time it takes to get a trial up and running. As much of the information related to the selection criteria for clinical trial candidates is typically held in free text notes, trial coordinators spend huge amounts of time scouring free text notes in patient records to identify potential trial participants .
Acronyms, abbreviations, misspellings and typos all contribute to the many variations in how a clinician might record data within free text notes. These variations mean that one study found anywhere between 5%-60% of encounters result in a patient’s problem list needing to be updated. For this reason, typical straightforward approaches to data, such as keyword searches, are impractical and hugely time-consuming.
Is AI the answer?
The leading emerging potential solution to this challenge is the use of artificial intelligence (AI) tools, such as natural language processing (NLP) and machine learning.
We can foresee a time soon when AI could be trained, not only to recognise particular data elements and phrases, but to find and learn from variations. When an algorithm could transform multiple unstructured documents and text from the EHR into FHIR-compatible structured data to be passed straight into a study database.
This will have a significant impact on the ease and speed of identifying and recruiting patients, getting trials up and running, and completing transfers of all the data elements needed to assess clinical trials. It will make more data available for clinical studies, drastically reducing the burden on sites, while improving speed and quality for sponsors. It will allow for the use of data direct from source, meaning more accurate data, less time spent checking for transcription errors, and less time querying data. And all that is not to mention the potential for this more sophisticated use of data to greatly impact both patient care and patient safety − not only within clinical trials, but in healthcare more widely.
AI is set to have wide-ranging impacts on how much EHR eSource data can be re-used in clinical trials, and a lot of attention is now focused on the next steps. But, it’s right to tread carefully. Using AI to unlock unstructured healthcare data is not as simple as writing an algorithm and letting it run. One of the biggest challenges is not writing an algorithm that can make sense of data, but accessing the right data in the first place, and ensuring that the data is handled correctly once it’s been identified. Held within unstructured notes in healthcare records, are often large amounts of identifiable data. Sophisticated processes need to be in place to ensure that any data transferred is regulatory-grade and is de-identified to be HIPAA- and GDPR-compliant.
So, to the question ‘Is AI the answer?’, we think the answer must be ‘yes’. But, when dealing with large volumes of sensitive data, the answer is never quite that simple. There is still much work to be done. But, as we’ve seen over the last year, AI moves quickly, so watch this space!
About IgniteData
IgniteData is transforming the future of clinical trials through its cloud-based Virtual Research Assistant, Archer. A system-agnostic solution, Archer brings modern interoperability between EHR and key research applications, such as EDC. Providing seamless, secure transfer of clinical data, Archer is the global EHR-to-EDC solution for modern clinical trials.
If you would like to get in touch, please use our contact form.