U-M researchers now have access to over 1 million deidentified radiology reports
AI & Digital Health Innovation has released deidentified radiology reports that match its repositories of chest X-ray and brain MRI images, free to the U-M research community.
Contact:
Kate Murphy
Marketing Communications Specialist
AI & Digital Health Innovation
ANN ARBOR, MI – In its continued drive to enhance and accelerate research on artificial intelligence (AI) in health at the University of Michigan, AI & Digital Health Innovation (AI&DHI) announces the release of over 1 million deidentified radiology reports, available for free to members of the U-M research community.
The reports contain impressions, narratives and other detailed notes matching the 1.1 million chest X-ray and 100,000 brain MRI studies available through AI&DHI’s secure HIPAA-compliant Turbo environment.
To ensure the security of the information provided in the reports, AI&DHI uses a technique called layered text analysis to remove all details that could be personally identifying. Additionally, the dates of the corresponding medical encounters have been randomly shifted while their sequence and duration have been preserved. This method of deidentification enables researchers to link each report to its respective image(s) as well as to the rest of the Michigan Medicine patient data included in the deidentified Electronic Health Record (EHR) released through AI & Digital Health Innovation’s tools.
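The date-shifting idea described above can be illustrated with a short Python sketch. It is an illustration only, not AI&DHI's actual procedure: it assumes a single random offset is drawn per patient and applied to every encounter date for that patient, which is one common way to keep the order of events and the time between them intact. The function name `shift_dates`, the `max_shift_days` parameter, and the input layout are all hypothetical.

```python
import random
from datetime import date, timedelta


def shift_dates(encounters, max_shift_days=365, seed=None):
    """Shift each patient's encounter dates by one random per-patient offset.

    `encounters` maps a patient ID to a list of dates (hypothetical layout).
    Because every date for a patient moves by the same number of days,
    the order of encounters and the gaps between them are unchanged.
    """
    rng = random.Random(seed)
    shifted = {}
    for patient_id, dates in encounters.items():
        # Assumption: one offset per patient, drawn uniformly from the window.
        offset = timedelta(days=rng.randint(-max_shift_days, max_shift_days))
        shifted[patient_id] = [d + offset for d in dates]
    return shifted


if __name__ == "__main__":
    example = {"patient-1": [date(2020, 1, 1), date(2020, 1, 10)]}
    result = shift_dates(example, seed=42)
    # The nine-day gap between the two encounters survives the shift.
    print(result["patient-1"][1] - result["patient-1"][0])
```

Because the same offset would apply to every record for a given patient, a shifted report date would still line up with the shifted dates of that patient's images and other deidentified records.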
“I’m very excited for the release of the radiology reports, which has been a major initiative for our data team,” said Dr. Michael Sjoding, Associate Director of Research Implementation at AI&DHI and Associate Professor of Internal Medicine. “Being able to link the reports with the actual images will enable researchers to do new and exciting multi-modal research with text and images and significantly advances our data offerings.”
How Researchers are Using the Reports
The deidentified radiology reports are available to all members of the U-M research community, including staff and students. A team including Robert Langefeld, a PhD candidate in Biostatistics at the School of Public Health, is currently using the reports to study how large language models (LLMs) could enhance and streamline the process of gathering research data from clinical notes such as these.
“Unlike most clinical information in electronic health records, notes like this are often in a highly unstructured format, which makes it nearly impossible to read through large numbers of them—like what you would need for a study,” said Robert.
The team, which includes Omkar Nayak (Computer Science and Mathematics), Dr. Xiang Zhou (Biostatistics), and Dr. Matt Zawistowski (Biostatistics), found that LLMs could meticulously read through each note and reason about whether a particular diagnosis—such as a stroke or collapsed lung—had been reached within that note. By combining these results for each patient, the team found they could gather information about many individuals that would not have otherwise been possible. The team is now planning to extend this work further by incorporating examples provided by radiologists regarding how they make these determinations to help better guide the models and also keep the physician in the loop.
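The step of combining note-level results for each patient can be pictured with a small Python sketch. This is not the team's code: it assumes each note has already been labeled (for example, by an LLM) with whether a given diagnosis was reached, and it simply collects the positive findings per patient. The function name `combine_note_findings` and the tuple layout are hypothetical.

```python
from collections import defaultdict


def combine_note_findings(note_results):
    """Roll note-level findings up to one set of diagnoses per patient.

    `note_results` is an iterable of (patient_id, diagnosis, found) tuples,
    where `found` is True if the note was judged to reach that diagnosis.
    """
    by_patient = defaultdict(set)
    for patient_id, diagnosis, found in note_results:
        # Touch the key so patients with no positive notes still appear.
        by_patient[patient_id]
        if found:
            by_patient[patient_id].add(diagnosis)
    return dict(by_patient)


if __name__ == "__main__":
    notes = [
        ("patient-1", "stroke", False),
        ("patient-1", "collapsed lung", True),
        ("patient-2", "stroke", False),
    ]
    print(combine_note_findings(notes))
```

A patient counts as having a diagnosis here if any one of their notes reaches it; a real study might require a stricter rule, such as agreement across several notes.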
“Having access to these resources has been fantastic for us,” said Robert. “Beyond just having a large amount of real data to work with, we've found that the radiology reports help us round out both text data from visits and structured health records. This enables us to provide a more complete view of each individual to researchers and to even pick up on instances where the structured records don't match the narrative from the clinical notes. It's exciting to be able to finally connect these different sources on such a large scale and for so many individuals!”
Accessing the Reports
The deidentified radiology reports are stored on AI&DHI’s Turbo secure storage environment and can be analyzed for free with tools from AI&DHI’s Armis2 and Secure Enclave Services (SES) computing environments. Researchers who are interested in learning more about this data resource and how it can be leveraged are encouraged to visit the AI&DHI Data Documentation website and reach out to the AI&DHI Data Solutions team (AIDHI-Data-Solutions@umich.edu) to schedule a consultation.
The AI&DHI Data Solutions team notes that these reports are only the beginning.
“We are beyond excited about making this data resource available to U-M researchers!” said Cinzia Smothers, Director of Data Solutions and Research Implementation at AI&DHI. “The team, led by our Data Operations Engineer Adrian Weyhing, has worked hard on this project over the past year and a half. We are also looking forward to releasing more complex deidentified clinical notes in the near future.”
About AI & Digital Health Innovation
AI & Digital Health Innovation (formerly Precision Health at U-M) is dedicated to empowering researchers at the University of Michigan to change the future of digital healthcare. They work with multi-disciplinary teams of health providers, basic scientists, engineers, and administrators to tackle the most difficult research problems and help rapidly bring ideas to the bedside. For more information, visit aidhi.umich.edu.