Electronic source data, or eSource, has become a cornerstone of modern clinical trial execution. By enabling data to be captured digitally at the point of care, eSource reduces transcription burden, improves accuracy, and accelerates timelines. To date, most progress has come from structured workflows that move data from electronic health records (EHR) into electronic data capture (EDC) systems. This EHR-to-EDC approach has delivered clear operational and regulatory benefits at scale.
Yet mounting evidence suggests this progress has reached a natural ceiling. More than 80 percent of healthcare data is estimated to remain unstructured, embedded in clinical notes, diagnostic narratives, pathology reports, and imaging interpretations. In oncology and rare disease trials in particular, many of the most clinically meaningful variables never appear in structured fields at all.
This challenge is the focus of a new multi-stakeholder study, Unlocking Unstructured Health Data: Scaling eSource-Enabled Clinical Trials (Sundgren et al., 2025). Drawing on perspectives from research sites, sponsors, and technology leaders, the authors argue that the next phase of eSource adoption will depend on responsibly integrating unstructured data rather than simply expanding existing structured pipelines.
The study highlights advances in artificial intelligence and natural language processing as critical enablers. These technologies increasingly make it possible to extract, normalize, and contextualize data from free text and other non-tabular sources. However, the authors are clear that technical capability alone is insufficient. Regulatory-grade use requires rigorous validation, traceability back to the original source, and governance models that can scale across institutions with varying levels of digital maturity.
This pragmatic framing closely mirrors what leading research sites have experienced firsthand. As Joe Lengfellner, Chief Product Officer at Ignite Data and former Senior Director of Clinical Research Informatics at Memorial Sloan Kettering, explains:
“We’ve proven we can move structured data at scale. What’s exciting now is expanding that to unstructured data alongside data like labs and vitals.”
At Memorial Sloan Kettering, this next step began with a focused, high-value use case centered on performance status. Eastern Cooperative Oncology Group (ECOG) and Karnofsky Performance Status (KPS) scores are critical variables in oncology trials, yet they often appear only in unstructured clinician notes. By applying large language models, the team showed how these data could be extracted from clinical documentation and delivered directly into EDC systems through Ignite Data Archer.
“One of the exciting projects we are working on with MSK is extracting performance status information—like ECOG and KPS—from clinical notes and pulling it directly into EDC.”
This approach aligns closely with the study’s emphasis on early, defensible use cases such as safety monitoring, medications, and imaging. These are domains where unstructured data can deliver immediate value while remaining compatible with regulatory expectations. Rather than treating unstructured data as an all-or-nothing leap, both the research and real-world implementations point toward incremental expansion grounded in validation and operational reality.
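To make the idea of extraction with traceability back to the source more concrete, here is a minimal Python sketch of the performance status use case. It is an illustration only, not the MSK or Ignite Data implementation: it deliberately swaps in simple regular expressions as a stand-in for the large language models described above, and the names `Extraction`, `extract_performance_status`, and both patterns are hypothetical. The point it shows is that each extracted value carries the character offsets and exact text it came from, so a reviewer could verify it against the original note.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical, simplified patterns. Real clinical notes vary far more than this,
# which is why the work described in the article uses language models instead.
ECOG_RE = re.compile(
    r"\bECOG(?:\s+(?:PS|performance\s+status))?\s*(?:of|is|:|=)?\s*([0-5])\b",
    re.IGNORECASE,
)
KPS_RE = re.compile(
    r"\b(?:KPS|Karnofsky)(?:\s+(?:score|performance\s+status))?"
    r"\s*(?:of|is|:|=)?\s*(100|[1-9]0|0)(?!\d)\s*%?",
    re.IGNORECASE,
)


@dataclass(frozen=True)
class Extraction:
    scale: str    # "ECOG" or "KPS"
    value: int    # numeric score found in the note
    start: int    # character offset where the matched text begins
    end: int      # character offset just past the matched text
    snippet: str  # exact source text, kept for reviewer verification


def extract_performance_status(note: str) -> List[Extraction]:
    """Return every ECOG/KPS mention with a pointer back to its source span."""
    found = []
    for scale, pattern in (("ECOG", ECOG_RE), ("KPS", KPS_RE)):
        for match in pattern.finditer(note):
            found.append(
                Extraction(
                    scale=scale,
                    value=int(match.group(1)),
                    start=match.start(),
                    end=match.end(),
                    snippet=match.group(0),
                )
            )
    # Order results by position in the note so they read like the source.
    return sorted(found, key=lambda item: item.start)


# Made-up example note, not real patient data.
note = "Patient ambulatory, ECOG PS: 1. Prior visit Karnofsky score of 80%."
results = extract_performance_status(note)
for item in results:
    # The stored offsets always reproduce the stored snippet.
    assert note[item.start:item.end] == item.snippet
    print(item.scale, item.value, repr(item.snippet))
```

Whatever method performs the extraction, keeping the source span alongside each value is one plausible way to support the validation and traceability requirements the study emphasizes.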
Crucially, Sundgren et al. position unstructured data not as a distant aspiration, but as a necessary evolution. Without it, eSource risks plateauing. With it, clinical trials move closer to becoming continuous learning systems that better reflect real-world care, reduce site burden, accelerate timelines, and ultimately deliver therapies to patients faster.
As Lengfellner notes, this evolution has implications far beyond technology:
“This is an important step toward increasing the percentage of trial data that can move electronically, without manual abstraction.”
For a detailed examination of the technical, regulatory, and operational pathways required to scale unstructured data in clinical trials, read the full article:
Sundgren M. et al., Unlocking Unstructured Health Data: Scaling eSource-Enabled Clinical Trials, 2025.