Improving digital healthcare solutions with data interoperability and large language models

Application deadline: 23 October 2024

Programme start date: January 2025

Note: Only students meeting the criteria for a UK home or EU-EEA pre/settled fee status are eligible for this scholarship.

Digital interoperability in healthcare is crucial for ensuring that various tools can seamlessly work with patient data without losing its meaning. This integration allows clinical algorithms to leverage harmonized data, generating new insights that benefit both patients and healthcare professionals. Although progress has been made in developing universal healthcare data languages, the widespread implementation of these standards faces significant challenges. These include the difficulty and high cost of upgrading outdated IT infrastructure and legacy systems.

Large Language Models (LLMs), such as OpenAI’s GPT models, Google’s Gemini, Meta’s Llama series, and medical-domain-specific models like Med-PaLM, are showing great promise in improving health data interoperability. LLMs can translate messy doctors’ notes and reports into standardized formats, unlocking valuable data trapped in unstructured forms. This not only improves research by giving access to a wider pool of information, but also strengthens patient care by giving doctors a more complete picture. LLMs can also be used to identify relationships between different data points within and across patients’ medical records.

More research is needed to fully understand how AI tools like LLMs, combined with ontologies, can improve health data interoperability. This research should also consider how differences between healthcare systems might affect the efficacy and efficiency of using LLMs in clinical decision-making.

This PhD project focuses on enhancing data interoperability among diverse healthcare systems through the integration of ontologies, terminologies, and LLMs. The research aims to leverage artificial intelligence (AI) to transform unstructured clinical notes into a standardized, machine-readable format and link this information to established disease classification systems. By improving the compatibility and understanding of healthcare data across different platforms, this project seeks to contribute to more effective digital healthcare solutions, deepen our understanding of human diseases, and ultimately lead to better patient outcomes. The potential impact of this work includes earlier disease diagnosis and support for the development of new treatments.
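
As a purely illustrative sketch of what "transforming unstructured clinical notes into a standardized, machine-readable format" can mean in practice, the Python snippet below maps free-text mentions in a note to coded concepts. It is not the project's method: a simple dictionary lookup stands in for the LLM and ontology components, and the terminology, the `EX:` codes, and the names `TOY_TERMINOLOGY`, `CodedMention`, and `code_note` are all hypothetical, invented for this example.

```python
import re
from dataclasses import dataclass
from typing import List

# Hypothetical mini-terminology: surface phrase -> (placeholder code, preferred label).
# The "EX:" codes are invented for illustration; they are NOT real SNOMED CT,
# OMOP, or ICD identifiers.
TOY_TERMINOLOGY = {
    "type 2 diabetes": ("EX:0001", "Type 2 diabetes mellitus"),
    "t2dm": ("EX:0001", "Type 2 diabetes mellitus"),
    "high blood pressure": ("EX:0002", "Hypertension"),
    "hypertension": ("EX:0002", "Hypertension"),
}


@dataclass(frozen=True)
class CodedMention:
    text: str   # the phrase exactly as written in the note
    code: str   # placeholder concept code
    label: str  # preferred label for the concept
    start: int  # character offset of the mention in the note


def code_note(note: str) -> List[CodedMention]:
    """Find known phrases in a note and return them as coded mentions."""
    mentions = []
    for phrase, (code, label) in TOY_TERMINOLOGY.items():
        # Whole-word, case-insensitive match of each known phrase.
        pattern = r"\b" + re.escape(phrase) + r"\b"
        for match in re.finditer(pattern, note, flags=re.IGNORECASE):
            mentions.append(CodedMention(match.group(0), code, label, match.start()))
    # Return mentions in the order they appear in the note.
    return sorted(mentions, key=lambda mention: mention.start)


note = "Pt has T2DM and high blood pressure; hypertension poorly controlled."
for mention in code_note(note):
    print(mention)
```

Different surface forms ("high blood pressure", "hypertension") resolve to the same code, which is the property that makes records comparable across systems. A lookup like this misses misspellings, negation ("no history of hypertension"), and context, which is where language models and richer ontologies would be expected to help.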

Project Benefits

This project is conducted in collaboration with Roche and features a strong industry engagement component. You will have an industry supervisor from Roche and academic supervisors from the University of Edinburgh. During your research programme you will have the opportunity to undertake research visits and internships with the company.

The student selected for this project will be part of the UKRI AI Centre for Doctoral Training in Biomedical Innovation and will enjoy all the benefits associated with the CDT programme.

Funding

This is a 4-year studentship, in partnership with Roche, covering tuition fees, a stipend (£19,237 in 2024/25) and an individual budget for travel and research costs. There is also provision for sick leave, parental leave and disability allowance.

Project Supervision

The project will be supervised by Prof Ian Simpson and Prof Honghan Wu. There will also be an industry supervisor from Roche.

Candidate Profile

Essential skills and knowledge

Proficiency in Python
Experience of applying machine learning techniques to real-world data
Experience of using Natural Language Processing (NLP) methods for the processing of unstructured text

Desirable skills and knowledge

Familiarity with biomedical and healthcare concepts
Experience modelling medical knowledge using ontologies and other formal structures
Familiarity with healthcare data standards and ontologies such as SNOMED CT and OMOP