Projects for 2025 entry

Data-Driven Insights for Improving Patient Journeys in Unscheduled Care: A Comprehensive Analysis of Healthcare Services in Scotland

Primary supervisor: Syed Ahmar Shah

Second supervisor: Saturnino Luz Filho

External partner: Public Health Scotland

This project addresses a significant global challenge faced by healthcare systems: the need for enhanced efficiency in managing unscheduled care services. The COVID-19 pandemic has laid bare the vulnerabilities in healthcare infrastructures worldwide, highlighting the urgent necessity for robust mechanisms to balance routine and emergency care demands. Scotland serves as an exemplar in this context, responding to these challenges by leveraging advanced data-driven approaches and collaborative efforts.
In partnership with Public Health Scotland (PHS) and utilizing the Unscheduled Care Data Mart (UCD), this initiative aims to provide comprehensive insights into patient pathways across various care settings. The project focuses on three primary objectives: understanding current care pathways, identifying data gaps, and developing feedback mechanisms for continuous service improvement. By meticulously mapping patient journeys, we aim to uncover bottlenecks that affect system performance, such as waiting times and length of stay. This research will also explore the availability of existing datasets while addressing the need for enhanced data collection to facilitate effective service delivery. Importantly, we will tackle health inequalities to ensure that marginalized populations receive equitable access to care.
The student will participate in a dynamic research environment, dedicating 20% of their time (one day per week) to being embedded within PHS. This model allows the student to gain hands-on experience and insights into the operational challenges faced by the UCD team. The student will utilise descriptive statistics, data visualization, and machine learning techniques to develop predictive models that identify factors influencing patient outcomes and employ clustering methods to analyse common patient pathways.
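As a toy illustration of the predictive-modelling strand described above, the sketch below fits a classifier to synthetic patient-journey features. The feature names and outcome are invented assumptions, not the real UCD schema.

```python
# Sketch: predicting an adverse outcome (e.g., a long stay) from synthetic
# patient-journey features. Features and outcome are illustrative, not UCD data.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split
from sklearn.metrics import roc_auc_score

rng = np.random.default_rng(0)
n = 1000
X = np.column_stack([
    rng.integers(18, 95, n),          # age
    rng.exponential(4.0, n),          # ED waiting time (hours)
    rng.integers(0, 5, n),            # prior unscheduled attendances
])
# Synthetic outcome: long stay more likely with age, waiting time, prior use
logit = 0.03 * X[:, 0] + 0.2 * X[:, 1] + 0.3 * X[:, 2] - 4.0
y = rng.random(n) < 1 / (1 + np.exp(-logit))

X_tr, X_te, y_tr, y_te = train_test_split(X, y, random_state=0)
model = RandomForestClassifier(n_estimators=200, random_state=0).fit(X_tr, y_tr)
auc = roc_auc_score(y_te, model.predict_proba(X_te)[:, 1])
print(f"held-out AUC: {auc:.2f}")
```

In practice the same pattern would be applied to real pathway features derived from the UCD, with careful validation and attention to health inequalities.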
Throughout the project, we will adhere to the principles of Responsible AI, ensuring that our approach is transparent, fair, and ethically sound. All data will be pseudonymised, and participants will receive thorough training in data governance and patient confidentiality to uphold the highest ethical standards.
The expected outcomes of this research include actionable policy recommendations aimed at improving service delivery, enhancing data collection practices, and informing the redesign of urgent care services. Ultimately, this project aspires to contribute to a more resilient and responsive healthcare system, not just in Scotland but as a model for healthcare systems worldwide, significantly benefiting patient care and health outcomes.

Finding the Rhythm: Detection of Metabolic Events in Mental Health Conditions from Time Series Data

Primary supervisor: Karl Burgess

Second supervisor: Diego Oyarzun

External partner: Dynamic Therapeutics

The mental health crisis is a significant global concern; epidemiological data indicate escalating incidences of depression, anxiety, and other disorders that require urgent new approaches to diagnostics and treatment. This crisis is exacerbated by our limited understanding of the molecular processes driving mental health disorders. This project aims to characterise metabolic signatures of mental health disorders, using first-in-class hardware to monitor patient metabolic signals in tandem with machine learning to generate actionable insights for diagnostics and therapy.
A key aim of the project is to determine the impact of psychiatric disorders on circadian rhythms, and more specifically on circadian metabolism. Our project partners (Dynamic Therapeutics) are supplying the unique U-RHYTHM device, and we are among the first users of the technology. U-RHYTHM uses a microdialysis probe to sample interstitial fluid from a patient minimally invasively for 24 hours per sampling cartridge, collecting a sample every 20 minutes, with the option of using two cartridges for 48-hour sampling. The system is small and portable, so users can take light exercise and sleep without significant impact from the sampling.
In this project, you will collect, for the first time, metabolomics data from patient samples, and analyse them with a combination of machine learning and signal processing techniques. No other technology has the capability to deliver quantitative information on hundreds of human metabolites at such high temporal resolution. It therefore represents a unique opportunity to apply advanced machine learning techniques to analyse rhythmic processes (day/night cycle, e.g. testosterone) and periodic events (coffee consumption, meals). The successful candidate will join the recently funded MRC Hub for Metabolic Psychiatry, of which the first supervisor is co-Investigator.
For the analysis of time-resolved datasets, the student will employ supervised learning models such as regression, as well as deep learning approaches including time-series autoencoders, non-negative matrix factorisation and other techniques for dimensionality reduction and representation learning. The student will learn advanced metabolomics techniques, including mass spectrometry, chromatography, bioinformatics and analytical pipeline development, alongside these machine learning skills.
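As a minimal sketch of the representation-learning idea, the example below applies non-negative matrix factorisation to a synthetic metabolite-by-time matrix and recovers two temporal components. The rhythmic patterns are invented; real inputs would come from U-RHYTHM sampling.

```python
# Sketch: NMF on a synthetic metabolite time-series matrix (metabolites x
# time points), recovering circadian and periodic components. Illustrative only.
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(1)
t = np.arange(0, 24, 1 / 3)                      # 24 h sampled every 20 min
circadian = 1 + np.sin(2 * np.pi * t / 24)       # day/night rhythm
postprandial = np.exp(-((t % 8) - 1) ** 2)       # periodic meal-like response

# 50 metabolites, each a non-negative mix of the two temporal patterns
W_true = rng.random((50, 2))
X = W_true @ np.vstack([circadian, postprandial])
X += rng.random(X.shape) * 0.05                  # small non-negative noise

model = NMF(n_components=2, init="nndsvda", max_iter=1000, random_state=1)
W = model.fit_transform(X)                       # metabolite loadings
H = model.components_                            # temporal factors
print(W.shape, H.shape)
```

The recovered temporal factors in `H` can then be inspected for rhythmic structure, e.g. by comparison against known circadian markers.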

Insights into Endometriosis Symptom Trajectories using Longitudinal Multimodal Data and Statistical Machine Learning

Primary supervisor: Thanasis Tsanas

Second supervisor: Andrew Horne

Third supervisor: Philippa Saunders

External partner: NHS & EXPPECT team

Endometriosis is a common gynaecological disease characterised by the presence of hormone-dependent endometrial-like tissue outside the uterus. It is estimated to affect 10% of women and those assigned female at birth, a prevalence that continues to rise as diagnostic approaches improve, with the condition often taking over 8 years to be diagnosed in the UK. Typical symptoms include chronic pelvic pain, menstrual cramps, painful sex and infertility, though it is now known that the condition can have systemic effects. Indeed, almost 95% of affected women experience at least one comorbidity, like migraine, depression, chronic fatigue or irritable bowel syndrome. There is a clear unmet clinical need since currently neither diagnostic nor therapeutic approaches to endometriosis are adequate.
Our team in Edinburgh, in collaboration with other leading centres in the UK and Europe, has been collecting longitudinal data from a large cohort of participants over the years and is involved in ongoing studies collating self-reports, biological samples, and data from digital health technologies (in particular wearable sensors). Some of these patients have undergone different treatments (including surgical interventions), and we have unique data resources along with a wider ongoing data-development programme aimed at objectively monitoring these outcomes.
This project will aim to capitalize on these rich data resources, mining the collected multimodal data to develop effective clinical decision support tools towards facilitating early diagnosis, symptom monitoring, and rehabilitation of endometriosis. The student will engage in time-series analysis, signal processing, and statistical machine learning algorithms to understand the underlying patterns in the datasets, and will also benefit from close collaboration with the EXPPECT team in Edinburgh, and our collaborators.
The project would be of particular interest to students interested in (health) data science, signal processing and statistical machine learning who want to develop decision support tools that could directly inform healthcare decision making, with the potential for direct translation and for seeing these tools embedded within the NHS.

Causal Healthcare Analytics for Real-World Evidence with Targeted Learning: a cross-disciplinary, cross-sector approach

Primary supervisor: Sjoerd Beentjes

Second supervisor: Kenneth Baillie

Third supervisor: Ava Khamseh

External partner: National Institute for Health and Care Excellence (NICE)

The National Institute for Health and Care Excellence (NICE) provides guidance on best practices in health and social care, including public health, to the NHS in England and Wales, which may be based on assessment of complex real-world evidence (RWE) for new health technologies and treatments. The use of real-world data (RWD), in addition to randomised controlled trials (RCTs), has also become increasingly popular among biomedical researchers, regulatory bodies and the pharmaceutical industry. Extracting reliable RWE from RWD is challenging due to lack of treatment randomisation, intercurrent events, and informative loss to follow-up. The NICE RWE framework aims “to use real-world data to resolve gaps in knowledge and drive forward access to innovations for patients.”
Targeted Learning (TL) is a sub-field of mathematical statistics and causal inference which offers an ideal step-by-step framework to address these challenges. TL has been widely and successfully applied in US industry and with the US Food and Drug Administration (FDA). Building on our growing implementation of TL methods at UoE, we bring together external partners and end-users from NICE (Dr Stephen Duffield, Dr Michael Merchant), the pharmaceutical industry (Di Zhang, Teva Pharmaceuticals, former FDA) and academia (Prof Mark van der Laan, UC Berkeley’s School of Public Health).
The methodological aim of this project is to develop and apply a series of novel semi-parametric estimators within the TL framework, establish their theoretical properties, and demonstrate their performance through realistic simulations and applications to real-world health data. The biomedical aims include applying the TL framework to (i) estimation of causal effects in scenarios with high-dimensional confounders, which occur in observational healthcare data and genomic medicine, (ii) estimation of causal effects in the presence of non-ignorable missingness, which is common in routine healthcare data and must be addressed to extract valid RWE, and (iii) estimators that combine multiple outcomes to increase statistical power, such as multiple end-points in clinical trials or multiple combined traits in population genetics.
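To make the estimation problem concrete, the sketch below computes a doubly robust (AIPW) estimate of an average treatment effect on synthetic data; this is the simplest member of the semi-parametric family that Targeted Learning builds on, not the project's actual estimators, and all variable names are illustrative.

```python
# Sketch: augmented inverse-probability-weighting (AIPW) estimate of an
# average treatment effect. Synthetic data; true effect is 1.0 by construction.
import numpy as np
from sklearn.linear_model import LogisticRegression, LinearRegression

rng = np.random.default_rng(2)
n = 5000
W = rng.normal(size=(n, 3))                      # confounders
p = 1 / (1 + np.exp(-W[:, 0]))                   # treatment depends on W
A = rng.random(n) < p
Y = 1.0 * A + W[:, 0] + rng.normal(size=n)       # outcome, true effect = 1.0

# Nuisance models: propensity g(W) and outcome regression Q(A, W)
g = LogisticRegression().fit(W, A).predict_proba(W)[:, 1]
Q = LinearRegression().fit(np.column_stack([A, W]), Y)
Q1 = Q.predict(np.column_stack([np.ones(n), W]))
Q0 = Q.predict(np.column_stack([np.zeros(n), W]))

# AIPW combines both nuisances; consistent if either model is correct
ate = np.mean(A * (Y - Q1) / g - (1 - A) * (Y - Q0) / (1 - g) + Q1 - Q0)
print(f"AIPW ATE estimate: {ate:.2f}")
```

TL itself goes further (targeted updates of the outcome model, influence-function-based inference), but the double-robustness property shown here is the common starting point.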

How do Different Ways of Making a Home Warmer Affect Risk of Preschool Respiratory Infections? Using Artificial Intelligence to Make Homes and Children Healthier

THE PROBLEM:
Acute respiratory infections (ARIs) are the main cause of hospitalisations in preschool children and increase the risk of asthma and premature adult death. ARIs are associated with underheated housing, but we do not know what proportion of preschool ARIs could be prevented by tackling underheated homes.
Half of Scotland’s housing has low energy efficiency (hard to heat), producing significant carbon emissions. The Scottish Government is retrofitting existing housing to increase home energy efficiency (HEE) for Net Zero targets.
HEE measures work by: A) reducing energy costs without increasing airtightness (e.g., boiler replacement) or B) reducing heat loss by increasing airtightness (e.g., insulation). Group B measures retain warm air but may increase ARI risk by trapping indoor air pollutants (e.g., smoke, particulate matter, and viruses) and increasing mould risk.

NEW DATA OPPORTUNITIES:
This PhD is embedded in the Homes, Heat and Healthy Kids study, a major new Scottish interdisciplinary study with five years of Wellcome Trust funding. New Scottish data developments have linked healthcare data to individual homes for the first time. This interdisciplinary PhD will use a new national retrospective birth cohort linking electronic health records with data on home energy efficiency and energy use, smart meters, high street banking, air pollution and climate to investigate the impact of different HEE measures on preschool ARI risk.
METHODS:
This project will use spatio-temporal machine learning models to group children into healthcare trajectory groups after different HEE measures and explore whether particular characteristics (e.g., deprivation, climate or air pollution) are associated with each trajectory. This will enable identification of groups of children for whom a specific HEE measure may be beneficial or detrimental, informing policy makers of ways to stratify HEE measures to different populations for maximum benefit.
ADDITIONAL TRAINING
Strong links with our parent partners (PPI group), engaged policy partners and data partners will provide exciting opportunities for community engagement, working with policy makers and exploring newly available datasets (including smart meter data and high street banking data).
IMPACT:
• Informing national energy, social and building policies to prevent preschool ARIs.
• Identifying ways to stratify HEE measures across different populations.
• Methodology applicable to other health conditions, demographics and populations.
SKILLS
• Spatio-temporal machine learning trajectory modelling
• Policy communication skills
• Bidirectional learning from PPI group
• Building broad interdisciplinary network

AI-driven Continuous Physiological Monitoring to Predict Deterioration following Surgery

Primary supervisor: Ewen Harrison

Second supervisor: Annemarie Docherty

Third supervisor: Catherine Shaw

External partner: Sibel Health

This PhD research project aims to address a critical issue in postoperative care: failure to detect and intervene in complications that lead to preventable morbidity and mortality. Complications such as bleeding, sepsis, and myocardial injury are often detected and treated late because of limitations in current monitoring technologies and systems. Continuous postoperative monitoring, including with wearable sensors, offers a potential solution by providing real-time, multimodal physiological data (e.g., heart rate, respiratory rate, oxygen saturation, and skin temperature) to alert healthcare teams promptly to deterioration.
Despite the promise of wearable technologies, no validated algorithm currently exists to reliably predict patient outcomes based on this data. Previous studies in this field have been limited by small sample sizes and narrow methodologies, and there are no large-scale efforts underway to assess the utility of such multimodal monitoring in the postoperative period. This PhD project will address these gaps by developing predictive AI models that incorporate continuous waveform data, patient history, and other clinical variables to predict adverse events after surgery.
Building on the foundations laid by the EMUs (Enhanced Monitoring Using Sensors after Surgery) and ICU Heart studies, this project will use a unique, real-time dataset collected from patients undergoing major surgery in various countries. The aim is to develop advanced machine learning models, such as random forests and neural networks, and explore attention mechanisms and multitask learning. These models will predict multiple complications simultaneously, a highly attractive feature from a clinical perspective.
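As a toy version of the multi-outcome prediction idea, the sketch below trains a multi-output random forest to predict two synthetic complications at once from summary vital-sign features. The features, outcomes and thresholds are invented, not the study's variables.

```python
# Sketch: predicting several postoperative complications simultaneously with
# a multi-output random forest on synthetic vital-sign summaries.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.multioutput import MultiOutputClassifier

rng = np.random.default_rng(4)
n = 2000
X = np.column_stack([
    rng.normal(80, 15, n),    # mean heart rate
    rng.normal(16, 4, n),     # mean respiratory rate
    rng.normal(96, 3, n),     # mean SpO2
])
# Two synthetic outcomes sharing a risk factor (tachycardia)
sepsis = (X[:, 0] > 95) & (X[:, 1] > 18)
mi = (X[:, 0] > 95) & (X[:, 2] < 94)
Y = np.column_stack([sepsis, mi])

model = MultiOutputClassifier(
    RandomForestClassifier(n_estimators=100, random_state=4)
).fit(X[:1500], Y[:1500])
probs = model.predict_proba(X[1500:])   # one probability array per outcome
print(len(probs), probs[0].shape)
```

Attention mechanisms and multitask neural networks extend this idea by sharing learned representations across outcomes rather than fitting largely independent models.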
The study will have access to a large, unique dataset, ensuring the robustness of the predictive models. The PhD will also involve collaborations with Sibel Health, an industry partner, and ethical considerations such as bias, privacy, and transparency will be closely monitored. The outcome of this research is expected to have a profound impact, potentially reducing preventable deaths and complications in postoperative care worldwide, with relevance in both high-resource and low-resource settings.

Operationalizable Clinical Risk Predictions using Machine Learning-driven Multi-state Models

Primary supervisor: Sohan Seth

Second supervisor: Bruce Guthrie

Third supervisor: Ewen Harrison

External partner: TBC

Better prediction of individual prognosis (e.g., mortality, hospital admission/readmission and institutionalization) allows targeted early intervention, potentially leading to a better quality of life and economic benefit. The availability of electronic healthcare records facilitates this by providing unprecedented detail of the longitudinal health trajectory of an individual. However, more sophisticated methods are needed to deal with the incompleteness of the data, the existence of multiple competing outcomes, and the dynamic relationship between covariates and outcomes. Furthermore, these models should be explainable and operationalizable to ensure their clinical translation. This project aims to develop machine learning-based multi-state models to predict multiple outcomes simultaneously and assess their accuracy, efficiency, stability, adaptability, transferability, and capacity to quantify uncertainty. The project will use longitudinal survey data from the English Longitudinal Study of Ageing, longitudinal routine healthcare data from the SAIL Databank and the critical care dataset from the ISARIC4C study.
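The simplest multi-state model is a discrete-time Markov chain over health states; the sketch below estimates its transition matrix from simulated state sequences. The states and transition rates are invented for illustration, and the project's models would be far richer (covariates, competing risks, machine-learned transition hazards).

```python
# Sketch: maximum-likelihood estimation of a discrete-time multi-state
# transition matrix (0=home, 1=hospital, 2=institutional care). Toy data.
import numpy as np

rng = np.random.default_rng(5)
true_P = np.array([[0.90, 0.08, 0.02],
                   [0.60, 0.35, 0.05],
                   [0.00, 0.05, 0.95]])

# Simulate 500 individuals over 24 time steps, counting transitions
counts = np.zeros((3, 3))
for _ in range(500):
    s = 0
    for _ in range(24):
        nxt = rng.choice(3, p=true_P[s])
        counts[s, nxt] += 1
        s = nxt

# MLE for a Markov chain: row-normalised transition counts
P_hat = counts / counts.sum(axis=1, keepdims=True)
print(np.round(P_hat, 2))
```

Machine-learning-based multi-state models replace the constant rows of `P_hat` with transition probabilities that depend on an individual's covariates and history.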

AI in early-stage screening and monitoring of neuro-degenerative diseases by Raman spectroscopy

Around 850,000 people in the UK live with neurodegenerative diseases. Costs of dementia care in the UK (comprising NHS, social care and unpaid care costs) were estimated at £34.7 billion in 2019, rising to £94.1 billion by 2040. Early, accurate diagnosis allows for earlier patient intervention and reduced cost, notably via the use of therapeutics which slow the rate of progression.
Raman spectroscopy uses a laser to excite vibrations within molecules, which occur at specific frequencies. By detecting scattered photons with a spectrometer we can deduce the energy loss, and type of bond excited within the molecule. Cells and biological tissues contain complex mixtures of biomolecules, and subtle differences in chemical composition caused by a disease are revealed with Raman spectroscopy. We have achieved diagnostic accuracies of 96% for Huntington’s disease (brain and skin tissue), and >97% for stage I breast cancer (blood plasma), and want to improve the accuracy levels other groups have measured for Alzheimer’s in blood plasma (84%) using our sophisticated pre-processing techniques, and improve our AI techniques for classifying samples correctly as ‘disease’ or ‘healthy’.
In collaboration with the project partners (NHS Research Scotland Neuroprogressive and Dementia Network) we will approach newly diagnosed patients and acquire urine and blood samples. The student will be involved in the ethics process and patient involvement, and will use Raman spectroscopy on these samples.
Spectral pre-processing techniques will be improved and optimised, then machine learning techniques applied to classify samples as healthy or diseased. Beyond a test for a single disease, we will combine all diseases into a single test to determine overall accuracy, and determine how accurately diseases can be distinguished.
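To illustrate the pre-process-then-classify workflow, the sketch below removes a polynomial baseline from synthetic Raman-like spectra and trains a simple healthy/disease classifier. Peak positions, the baseline shape and the class difference are all invented assumptions.

```python
# Sketch: polynomial baseline subtraction on synthetic spectra, then a
# healthy/disease classifier with cross-validation. Illustrative data only.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(6)
wavenumbers = np.linspace(400, 1800, 500)

def spectrum(disease):
    baseline = 1e-6 * (wavenumbers - 400) ** 2        # fluorescence background
    peak = np.exp(-((wavenumbers - 1004) / 8) ** 2)   # phenylalanine-like band
    shift = 0.4 * np.exp(-((wavenumbers - 1450) / 10) ** 2) if disease else 0
    return baseline + peak + shift + rng.normal(0, 0.05, wavenumbers.size)

X_raw = np.array([spectrum(d) for d in [False] * 60 + [True] * 60])
y = np.array([False] * 60 + [True] * 60)

# Pre-processing: fit and subtract a 2nd-order polynomial baseline per spectrum
V = np.vander(wavenumbers, 3)                         # (points, 3) design matrix
coefs, *_ = np.linalg.lstsq(V, X_raw.T, rcond=None)   # one fit per spectrum
X = X_raw - (V @ coefs).T

scores = cross_val_score(LogisticRegression(max_iter=5000), X, y, cv=5)
print(f"CV accuracy: {scores.mean():.2f}")
```

Real Raman pre-processing involves additional steps (cosmic-ray removal, normalisation, smoothing), and performance must be validated on held-out patients, not just spectra.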
We also plan to monitor patients with Alzheimer’s over time, from initial diagnosis through the course of the disease during the project. This will allow clinicians to monitor patients better and to determine whether response to treatment is slowing disease progression.

Developing LLM Agents for Resilient, Efficient, and Ethical Capacity Modelling in Health Care Provision

Primary supervisor: Fengxiang He

Second supervisor: Filippo Menolascina

External partner: Brigham and Women’s Hospital

Capacity modelling in health care provision is the systematic problem of assessing, planning, and optimising the allocation of resources, such as staff, equipment, and facilities, to meet patient demand and improve the resilience, efficiency, and ethics of health care services. It has proved crucial, especially during pandemics.
Many objectives need to be considered in capacity modelling, such as service delivery efficiency, patient outcomes, and resilience, alongside many constraints, including limited resources, fluctuating patient demand, regulatory requirements, and ethics. It is an open problem to achieve the Pareto optimal solution of this multi-objective, constrained problem.
To address this issue, this project proposes to design an AI agent based on large language models (LLMs) for the capacity modelling problem. Our objectives include (1) designing model architectures and optimisation algorithms of LLM agents for capacity modelling in health care provision, (2) establishing mathematical guarantees of this LLM agent’s generalisability, stability, algorithmic fairness, etc., and (3) developing a prototype system employing the developed algorithms for real-world applications.

Stratifying Cancer Treatment Responses in Mesothelioma with AI-driven Bioimaging

Primary supervisor: Carsten Gram Hansen

Second supervisor: Yunjie Yang

External partner: NHS

Cancers, as a result of distinct mutations, lead to the expansion of clonal cell populations that exhibit considerable heterogeneity in epigenetic, physical and transcriptome profiles. This pathogenic capacity enables cancer cell survival under nutrient-poor conditions, infiltration and metastasis, and together these traits pose significant treatment challenges, particularly due to the emergence of therapy-resistant cell subpopulations. Therefore, an ability to analyse and predict cellular responses to therapeutics early, and in rare cell populations, would be transformative.
Our project aims to identify precisely why and how cancer cells differ from healthy cells by integrating Artificial Intelligence (AI) with cutting-edge bioimaging modalities. We will focus on pleural mesothelioma, a deadly cancer caused by asbestos exposure, characterised by mutations in key tumour suppressors. Using state-of-the-art label-free imaging platforms (Nanolive and Livecyte), alongside confocal high content imaging (Opera Phenix Plus), we will perform high-resolution, multi-parametric analysis to understand the unique cellular dynamics and predict patient responses. This approach will leverage our recently developed isogenic cellular model that closely represents the disease, facilitating the development of personalised treatment strategies. Through continuous refinement and validation of our AI-driven model against a panel of promising drug candidates, this project aims to provide ground-breaking improvements in diagnosing and treating mesothelioma, potentially extending to other cancers with similar genetic disruptions. We anticipate that this ambitious project will enhance opportunities for stratification and early diagnosis, and contribute to the development of future therapeutic strategies.

Novel Imaging Features in OCTs and/or Statistical Data that Predict Visual Outcome after Macular Hole Surgery and that can be used to Inform Clinical Decision Making

Primary supervisor: Heather Yorston

Second supervisor: Stuart King

External partner: NHS

Idiopathic full-thickness macular holes (MHs) form secondary to age-related abnormalities of the vitreoretinal interface with a prevalence of up to 3 in 1000 people over the age of 55. They appear as a small dehiscence in the neurosensory retina at the centre of the fovea, a highly specialised part of the human retina responsible for fine acuity and colour vision. Spectral-domain (SD) optical coherence tomography (OCT) imaging allows ophthalmologists to diagnose, classify and measure MHs. OCT is a non-invasive, high-resolution imaging technique that uses infrared light to image the retina in 3D.
Macular holes can be effectively treated by closing the hole using vitrectomy surgery. They are one of the commonest indications for vitrectomy surgery, accounting for ~4000 surgeries in the UK and more than 200,000 globally per annum. Predicting the visual outcome after surgery is important to guide the decision to operate and manage patients’ expectations, as well as providing insight into their pathology. Several studies have shown that postoperative visual acuity (VA) is correlated with a variety of measures of macular hole size that can be measured on SD-OCT. Various studies have attempted to precisely predict postoperative VA using manual 2D measurements of MHs and preoperative VA, although their predictive ability has been limited. Three-dimensional automated image reconstruction has improved this ability, but there are no current standards for shape, size, and resolution of OCT imaging data captured by different OCT devices for this task. There are also many qualitative features and subtle alterations in retinal anatomy, for example, associated with chronicity, which may be predictive of acuity outcomes and that are difficult to measure. Additionally, image artefacts related to a patient’s eye movement and media opacity pose a further challenge in developing image informatics methods.
Most existing machine learning (ML) and deep learning (DL) approaches have focused on the automated classification of macular diseases, such as age-related macular degeneration (AMD), diabetic macular oedema (DME), and MHs from OCT image data. More recently, some DL approaches have attempted to improve the prediction of VA outcomes using OCT data although these have been very limited and mainly in diseases other than MH. This project will try novel ML/DL approaches guided by clinical knowledge to improve the prediction of VA outcome for patients.

Explainable and Transparent AI models for Glioma Diagnosis from Brain MRI

Primary supervisor: Ajitha Rajan

Second supervisor: Paul Brennan

External partner: NHS Lothian

Gliomas are the most common and fatal malignant brain tumours in adults. Diagnosis and treatment response evaluation in patients with gliomas are still highly dependent on neuroimaging. AI-based deep learning (DL) models have demonstrated transformative potential in medical diagnostics, including glioma diagnosis from brain MRI.
The black-box nature of modern deep learning (DL) models makes it challenging to trust and understand the rationale behind their decisions, especially in high-stakes domains such as glioma diagnosis. Explainable artificial intelligence (XAI) techniques aim to increase the trustworthiness and transparency of a DL model’s decision-making process by providing accessible interpretations. The project aims to develop a new transparent and explainable AI technique tailored to explaining glioma diagnosis from brain MRI.
Existing work has focused on techniques like Grad-CAM and saliency maps, and on perturbation-based techniques like LIME, SHAP and occlusion sensitivity. However, these techniques fail to capture clinical concepts in their explanations. The technique designed in this project will aim to bridge this gap.
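To make one of these baseline methods concrete, the sketch below implements occlusion sensitivity: slide a mask over the image and record how much the model's score drops. The "model" here is a hypothetical stand-in function, not a real glioma classifier.

```python
# Sketch of occlusion sensitivity on a toy image with a stand-in scorer.
import numpy as np

def model_score(img):
    # Hypothetical classifier: responds to intensity in the top-left region
    return img[:8, :8].mean()

def occlusion_map(img, score_fn, patch=4):
    base = score_fn(img)
    heat = np.zeros_like(img)
    for i in range(0, img.shape[0], patch):
        for j in range(0, img.shape[1], patch):
            occluded = img.copy()
            occluded[i:i + patch, j:j + patch] = 0
            # Importance = how much occluding this patch lowers the score
            heat[i:i + patch, j:j + patch] = base - score_fn(occluded)
    return heat

img = np.zeros((16, 16))
img[:8, :8] = 1.0                      # the "tumour-like" region
heat = occlusion_map(img, model_score)
print(heat[:8, :8].max() > heat[8:, 8:].max())  # prints: True
```

Such pixel-level maps are exactly what the project seeks to go beyond, towards explanations framed in clinically meaningful concepts.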
The student will work closely with a clinician to understand which concepts to use in explanations. The accuracy of the designed AI model and its explanations will be compared against state-of-the-art techniques and evaluated by clinicians. Additionally, the model will be assessed for robustness and for generalisability to different MRI modalities and device settings in hospitals.

Developing an Augmented Blood Flow Tool for the Diagnosis and Treatment of Congenital Heart Diseases using Echocardiograms and Machine Learning

Primary supervisor: Benjamin Owen

Second supervisor: Joseph O’Connor

External partner: Red Cross Children’s Hospital Cape Town

Congenital heart disease (CHD) is the most common birth defect, affecting over 1.3 million newborns each year. In high-income countries, medical advancements mean that 90% of CHD patients survive into adulthood. However, in lower and middle-income countries (LMICs), CHD causes over 200,000 deaths annually, with around 90% of patients lacking access to proper care. Diagnosis and monitoring of CHD often rely on echocardiograms, which use ultrasound to measure blood flow. While this works well for some parts of the heart, blood flow in the aortic arch, a key area for diagnosing issues like aortic coarctation (narrowing of the aorta), is hard to measure due to bones obstructing the view. Complete and accurate blood flow data would help doctors make better treatment decisions, improving patient outcomes.
Currently, methods like Computational Fluid Dynamics (CFD) and 4D MRI can provide detailed blood flow data, but they are either too expensive, time-consuming, or unsuitable for use in LMICs, especially for infants who cannot stay still for long MRI scans. This project aims to overcome these challenges by combining echocardiogram data with machine learning to estimate blood flow in the aorta, allowing clinicians to make more informed decisions at the point of care. The machine learning model will be trained using simulated blood flow data, allowing it to fill in gaps left by the echocardiogram’s limited measurements.
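The train-on-simulation, infer-from-sparse-measurements idea can be sketched in miniature as below. The "simulation" here is a toy 1-D parabolic velocity profile rather than real CFD, and the probe positions are arbitrary assumptions.

```python
# Sketch: a regressor trained on simulated flow profiles reconstructs the
# full velocity field from a few echocardiogram-like point measurements.
import numpy as np
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(7)
r = np.linspace(-1, 1, 21)             # radial positions across the vessel

def profile(v_max):
    return v_max * (1 - r ** 2)        # Poiseuille-like velocity profile

# Training set: full simulated profiles, observed only at 3 "probe" points
probes = [2, 10, 18]
v_maxes = rng.uniform(0.5, 2.0, 300)
X = np.array([profile(v)[probes] for v in v_maxes])
Y = np.array([profile(v) for v in v_maxes])

model = RandomForestRegressor(n_estimators=100, random_state=7).fit(X, Y)

# "New patient": sparse measurements in, full profile out
truth = profile(1.3)
pred = model.predict(truth[probes].reshape(1, -1))[0]
print(f"max abs error: {np.abs(pred - truth).max():.3f}")
```

In the actual project, the training data would be 3-D CFD simulations of the aortic arch, and the inputs would be real Doppler velocity measurements with noise and view limitations.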
The goal is to create a tool that gives clinicians a complete picture of blood flow in the aortic arch without the need for costly, time-intensive methods. The project will use patient data from an open-source repository and the Red Cross Children’s Hospital in Cape Town, South Africa, where collaboration with pediatric cardiologist Professor Liesl Zühlke will help validate the tool. Initially, the tool will be applied to patients with conditions like aortic coarctation, improving the accuracy of assessments and potentially reducing the need for additional tests like CT scans.
In the long term, the tool could help improve care for patients in LMICs and be adapted for other diseases assessed via ultrasound. This could lead to more accessible and affordable heart disease treatment globally, especially in regions with limited medical resources.

The student working on this project will have the opportunity to visit the Red Cross Children’s Hospital, Cape Town, to understand their clinical procedure and the constraints they face.

AI-based Assessment and Validation of Brain Mineral Deposition in its Different Forms Detected from Routine Clinical Brain Magnetic Resonance Images

Iron is essential for healthy body function, but an excess can lead to oxidative stress that damages biomolecules, as well as cellular dysfunction causing cell death. This process becomes apparent with increasing age, as iron accumulates in the brain, contributing to cognitive decline and increasing the risk of neurodegenerative diseases. Iron-containing macromolecules can aggregate, forming calcified clusters that are visible on magnetic resonance images (MRI).
While work has been done on developing MRI sequences that allow these mineral accumulations to be quantified, work is still needed to assess this mineralisation process from conventional MRI, both for prognostic purposes and for identifying pre-clinical stages of neurodegenerative diseases like Alzheimer’s and Parkinson’s. Moreover, given the ferromagnetic (iron) vs. paramagnetic (calcium) nature of these deposits, present in different proportions throughout the brain (varying according to the underlying disease), validating the accuracy of MRI-based assessment methods requires complementary methods, ranging from the development of physical MRI phantoms to the analysis of histological images (tissue samples) or body fluids (blood).
This project will use AI to: (i) develop a method for segmenting brain iron and calcium accumulation throughout the whole brain and in its different forms (tissue deposition, brain microbleeds, superficial siderosis, and haemorrhagic transformation of ischaemic lesions) in a large sample of MRI images acquired from different patient groups; (ii) assess the degree of mineral accumulation in the segmented areas, offering a proxy for insoluble iron/calcium concentration and degree of aggregation (i.e., clustering) in different subregions; and (iii) validate the AI-based imaging assessments using complementary biomedical analysis methods in a sample of individuals with both brain MRI and tissue samples.

AI for Discovering Affordable Therapies against Neglected Tropical Diseases

Primary supervisor: Diego Oyarzun

Second supervisor: Shay Cohen

External partner: Drugs for Neglected Diseases initiative (DNDi)

Neglected Tropical Diseases (NTDs) are a diverse group of infectious diseases that primarily affect populations in low-income regions, particularly in tropical and subtropical climates. These diseases disproportionately affect poor and marginalized communities with limited access to adequate healthcare, sanitation, and clean water. Despite their significant impact on global health, these diseases often receive little attention from the pharmaceutical industry due to their limited profitability and the lack of financial incentives for research and development of new therapies.
This project aims to address the urgent need for affordable therapies against NTDs, focusing specifically on Chagas disease, in collaboration with the Drugs for Neglected Diseases initiative (DNDi). Chagas is a life-threatening illness caused by the protozoan parasite Trypanosoma cruzi. It primarily affects people in poor areas of Latin America, and increased population migration has carried Chagas disease to new regions. The acute phase occurs shortly after infection and may exhibit mild symptoms or go unnoticed. However, if left untreated, the infection progresses to the chronic phase, which entails severe damage to the heart, digestive system, and other organs. The impact of Chagas disease on global health is significant, with an estimated 6-7 million people infected worldwide and approximately 10,000 deaths annually. Only two drugs are currently approved for use, and these have significant limitations, including high cost, lengthy treatment, and potential side effects.
In this project you will build a machine learning pipeline to identify chemical compounds with therapeutic potential against Chagas, and use this information to design compound libraries for experimental screening by DNDi. Ultimately, we aim to discover and validate chemical structures that can be progressed through the drug discovery pipeline. The approach involves training machine learning classifiers of drug action using ensemble models and graph neural networks, as well as chemical large language models (LLMs) to screen compound libraries. We will utilize in-house screening data from DNDi, incorporating nearly 900,000 chemical structures with various readouts of drug effect. The size and coverage of the dataset makes it particularly suited for trialling cutting-edge deep learning and LLM tools on real-world data, with the aim of solving important health challenges in low-income regions of the planet.
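Before training classifiers, virtual screening pipelines of this kind typically rank library compounds by chemical similarity to known actives. The sketch below illustrates that idea only; the fingerprints are toy bit sets and the compound names are hypothetical, not DNDi data (real fingerprints would come from a cheminformatics toolkit such as RDKit).

```python
# Illustrative sketch (not the project's actual pipeline): rank library
# compounds against known actives by Tanimoto similarity of fingerprints,
# represented here as sets of "on" bit indices.

def tanimoto(fp_a: set, fp_b: set) -> float:
    """Tanimoto (Jaccard) similarity between two bit-index sets."""
    if not fp_a and not fp_b:
        return 0.0
    inter = len(fp_a & fp_b)
    return inter / (len(fp_a) + len(fp_b) - inter)

def rank_library(actives, library):
    """Rank library compounds by their max similarity to any known active."""
    scored = [
        (name, max(tanimoto(fp, a) for a in actives))
        for name, fp in library.items()
    ]
    return sorted(scored, key=lambda x: x[1], reverse=True)

if __name__ == "__main__":
    actives = [{1, 4, 7, 9}, {2, 4, 8}]   # toy fingerprints of known hits
    library = {
        "cmpd_A": {1, 4, 7},              # close analogue of the first active
        "cmpd_B": {3, 5, 6},              # dissimilar compound
    }
    print(rank_library(actives, library))  # cmpd_A ranks first (0.75)
```

A trained classifier (ensemble model, GNN, or chemical LLM, as the project describes) would replace this similarity score with a learned prediction of drug action.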
The project is part of an exciting partnership with DNDi, a leading not-for-profit research organization developing new treatments for neglected patients. DNDi's mission is to discover, develop, and deliver new treatments for neglected patients around the world that are affordable and patient-friendly. Work by DNDi has already saved millions of lives through the development of affordable therapies for malaria, sleeping sickness and other tropical diseases of unmet need.
The successful candidate will join the Biomolecular Control Group, a diverse, international, and multidisciplinary team working at the cutting-edge of computational modelling applied to biological questions. As a team, we gather expertise in many computational and mathematical methods, including machine learning, dynamical systems, optimization, and network theory.
Our team ethos is based on mutual learning, strong peer-to-peer support, and a drive to support the career growth of our members. We offer multiple opportunities for networking and skills development, for example through guidance and co-creation of student research projects.

Learning the Rules of Multicellular Self-Organisation with Interpretable Machine Learning

Primary supervisor: Linus Schumacher

Second supervisor: Guillaume Blin

External partner: TBC, please check with the primary supervisor

Do you want to advance interpretable machine learning tools for biomedical science, develop a framework to guide biomedical engineering of synthetic tissues, and help the production of specific cell types for regenerative medicine? This project will develop an interpretable machine learning framework to infer cell fate patterning mechanisms from sequences of microscopy images, and to use this framework to identify evolutionary design principles underlying robust cell fate patterning during tissue development.
A hallmark of living systems is their ability to self-organise in a manner that is both robust and evolvable. Robustness provides resistance to genetic and environmental variability, while evolvability maximises phenotypic innovations and ensures adaptability. Robustness and evolvability appear to be opposing characteristics, and how biological systems combine these two properties is not understood. One possibility is that there exists a set of motifs within molecular and cellular interaction networks that explain how developmental systems achieve both robustness and evolvability at the same time. Identifying and gaining a quantitative understanding of these motifs would enable a breakthrough to rationalise our strategies to control, manipulate, and repair biological systems.
On this project you will use innovative machine learning frameworks, Neural Cellular Automata (NCA) [Richardson et al. 2024] and/or neural PDEs, to (1) develop their interpretability as mechanistic mathematical models, (2) analyse in-house experimental data from synthetic embryology [Robles Garcia et al. 2023], and (3) quantify the evolvability of molecular and cellular interaction networks through in-silico evolution of trained models and information-theoretic quantification of patterning and self-organisation. Your responsibilities will include developing code and designing workflows, understanding the underlying mathematics, and collaborating with experimental biologists to analyse existing data and to design new experiments based on model predictions. This project would suit a student with quantitative and computational skills who is interested in developing interpretable AI methods as well as solving biological problems.

Integration of Property Predictions with Molecule Generation using Reinforcement Learning for Fragment-based Drug Design

Primary supervisor: Antonia Mey

Second supervisor: Amos Storkey

External partner: XChem Beamline at Diamond Light Source

Fragment-Based Drug Design (FBDD) offers a promising pathway in drug discovery, particularly for challenging biological targets that are typically deemed intractable by high-throughput screening methods. This approach uses small chemical fragments as the foundational building blocks in the synthesis of more complex drug-like molecules. The primary advantage of FBDD lies in its efficiency: it requires screening far fewer compounds than traditional methods. Historically, however, the identification and optimization of these fragments into potential drugs has required intensive experimentation and computational support.
The XChem beamline at the Diamond Light Source provides high-throughput screening facilities for FBDD X-ray data. The primary objective of this project is to develop and integrate advanced computational models that can rapidly and efficiently process fragment data from XChem, transforming these fragments into viable drug candidates through guided molecular generation.
The goal is to combine modern machine learning methods with X-ray fragment data to generate reliable new drug-like molecules with desired properties, such as high binding affinity to a drug target, e.g., a protein. The project will explore strategies for molecule generation and protein-drug property prediction, and will develop new reinforcement learning strategies to elaborate fragments into drug candidates efficiently.

Addressing Patient Mortality in Hemodialysis via AI Applied to Metabolomics and Material Science

Primary supervisor: Grazia De Angelis

Second supervisor: Karl Burgess

External partner: Kidney Research UK

Patients undergoing hemodialysis (HD) exhibit significantly higher mortality rates compared to those who had kidney transplants. This disparity is largely attributed to the accumulation of uremic toxins that standard HD treatments fail to completely remove. Despite this acknowledged issue, systematic identification of specific uremic toxins impacting mortality in patients receiving maintenance HD has not been effectively addressed.
This project integrates AI, metabolomics, and biomedical materials science to accelerate the identification of key metabolites and biological pathways involved in the mortality of dialysis patients and to discover biocompatible filtering materials that could enhance HD efficacy in toxin removal.
By leveraging data from existing literature and collaborations, this synergistic approach seeks to elucidate the mechanisms behind elevated mortality in HD patients and develop solutions to mitigate these risks, with the ultimate goal of reducing patient mortality.

Genetic Regulation of Antibiotic Resistance in the Major Pathogen Klebsiella pneumoniae

Primary supervisor: Andrea Weisse

Second supervisor: Thamarai Dorai-Schneiders

External partner: TBC

Antibiotic resistance poses a severe threat to human, animal and planetary health. It means that common infections are becoming harder, and at times, impossible to treat. Already, resistance is associated with almost 5 million annual deaths globally, with numbers expected to rise rapidly as resistance spreads.
Resistance typically arises through genetic mutations or gene acquisition that enable bacteria to withstand antibiotics. In contrast to such acquired resistance, transcription factors form part of the intrinsic response to antibiotic challenge and, when upregulated, control multiple genes that can impact bacterial susceptibility to antibiotics.
The global regulatory transcription factor RamA controls a multitude of drug and immune responses in the pathogen Klebsiella pneumoniae, which causes severe infections, particularly, in vulnerable hospital patients. In previous work, we showed that RamA contributes to the systemic dissemination of Klebsiella pneumoniae in mouse models of infection – thus highlighting its role as a key factor in pathogenesis and antibiotic resistance.
Here, we set out to quantitatively dissect mechanisms that promote RamA-mediated resistance. We will integrate public, clinical and our own lab data to quantitatively study how regulation via RamA differentially adapts the gene expression machinery to antibiotic challenge. With RamA upregulated in strains resistant to last-line antibiotics, it presents as a promising target for the development of novel treatments, and so this work will establish a mechanistic base to investigate viable strategies.

Engineering an Imputation Panel for the Human Proteome

Primary supervisor: Riccardo Marioni

Second supervisor: Sarah Harris

External partner: Optima Partners

Protein measurements from blood samples can offer insights into future risk of disease. This applies to diseases of both the body and the brain, including dementia. However, there are currently multiple technologies and methods for measuring the proteome. These began by assessing <100 proteins per panel but can now capture in excess of 11,000 proteins. A complex correlation structure exists between proteins, many of which belong to the same functional pathways and processes. Here, we will use data from UK Biobank (3,000 proteins measured in 50,000 people) to determine whether imputing proteins from other proteins: 1) helps to de-noise the signal from an individual protein, leading to stronger, more significant associations with disease outcomes, and 2) can be used to boost content from cohorts where smaller subsets of the proteome have been assessed (e.g., can we derive a parsimonious panel of proteins that is inexpensive to measure yet loses little information compared to large, expensive panels?).
Using a variety of statistical ML and AI approaches, we will create a robust imputation process for the proteome, implemented as an R package or front-end server. This could enable the global research community to save millions of pounds on generating new array data and lead to new discoveries in association studies within biomedical research.

AI and the Dark Genome: Using Protein-DNA Structure Modelling and Genomic Language Models to Predict the Impacts of Non-Coding Genetic Variation

Primary supervisor: Joe Marsh

Second supervisor: Simon Biddie

External partner: NHS Lothian

Variation in gene expression gives rise to phenotypic diversity and influences disease risk. This expression is regulated by DNA regions called regulatory elements, found in the poorly understood non-coding regions known as the Dark Genome. Regulatory elements, like promoters and enhancers, host DNA-binding proteins such as transcription factors, which are crucial for controlling cell identity, development, and environmental responses. Disruption in these regulatory processes can lead to disease.
Nucleotide variants in regulatory elements can alter the binding of DNA-binding factors and affect gene expression. While genome-wide association studies (GWAS) often link non-coding variants to common diseases, they struggle to pinpoint functional variants and their mechanisms, limiting clinical translation. Identifying functional non-coding variants remains a significant challenge in genomics. Experimental methods to identify functional non-coding variants exist but are limited in scope, expensive, and labour-intensive. Therefore, there is a growing need for computational approaches to improve accuracy in identifying these variants. For coding variants, computational methods have advanced by using evolutionary and protein structural information, but these strategies are less effective for non-coding regions.
In recent years, there have been incredible advances in the computational prediction of biomolecular structures. Notably, the recent release of AlphaFold3 enables the modelling of high-quality structures of protein:DNA complexes, which could dramatically expand the scope of structure-based variant effect prediction for non-coding regulatory variants. Moreover, advances in genomic large language models, which take advantage of the huge amount of genome sequence data now available, have the potential to overcome the limitations of sequence alignment-based approaches in non-coding regions.
This project aims to advance computational prediction of functional non-coding variants by integrating state-of-the-art methods in biomolecular structure prediction, molecular simulations, and genomic language models. The student will use experimental datasets, both published and those we have generated, to compare the performance of approaches based on mechanistic, structure-based modelling of protein:DNA binding to traditional conservation-based metrics and genomic language modelling approaches. These approaches will then be applied to druggable and clinically relevant DNA-binding factors, cell types of clinical relevance, and diseases with high clinical need, at whole-genome and population scale.
The student undertaking this project will gain expertise in:
1. Computational prediction of biomolecular structures and molecular modelling impacts of variants on binding
2. Sequence-based variant effect prediction methodologies and genomic language models
3. Bioinformatic analyses in functional genomics including DNA-binding factors, chromatin and gene expression biology
4. Complex trait genetics applied to common diseases.

Identification of Cancer Cell States Associated with Patient Survival at High Resolution via Combinatorial Gene Expression Dependencies

Primary supervisor: Yi Feng

Second supervisor: Ava Khamseh

External partner: Genenet Technology (UK) Limited

To design more effective cancer prevention strategies and improve prognostic accuracy, we need a deeper understanding of the mechanisms driving tumour initiation and the evolution of both cancer cells and their surrounding microenvironment. Previously, we have demonstrated that cancer cell states associated with poor prognosis can be identified at high resolution using single cell RNA-sequencing (scRNAseq) data from established tumours, via structure learning and quantification of higher-order gene expression dependencies.
Recently, we conducted a time-resolved scRNAseq analysis of preneoplastic cells (PNCs) and associated innate immune cells within 24 hours of oncogene activation in a zebrafish model. This led to the identification of an EMT/CSC-like PNC cluster, with its marker genes predicting poor prognosis in several carcinomas in the TCGA database. Furthermore, this cluster appears to drive tumour-promoting neutrophil development, suggesting its critical role in malignant progression. In the current project, using our zebrafish dataset as the initial state of tumour initiation, we aim to map cellular states onto single-cell RNAseq and spatial transcriptomic datasets from a wider range of cancer patients and mammalian models.
We will employ our recently developed computational tool, Stator, and test publicly available data integration tools such as SATURN to investigate how the EMT/CSC-like state, identified at the onset of oncogene activation, evolves throughout cancer development and how host immune cells might co-evolve in this process. Our goal is to uncover the mechanisms that either promote or suppress the EMT/CSC state, thereby identifying novel targets for cancer prevention. We will seek to discover potential biomarkers to enhance early detection and improve prognosis. Additionally, we will develop software and a user-friendly visualisation tool for identifying cancer cell states associated with prognosis from scRNA-seq data.

Multimodal Prediction of Immunotherapy Treatment Outcomes for Renal Cell Carcinoma

Primary supervisor: Ajitha Rajan

Second supervisor: Mark Stares

External partner: Francis Crick Institute

Renal cell carcinoma is a type of kidney cancer. It is the 8th most common cancer in the UK, and new cases have increased by 2% over the last twenty years. Immunotherapy is a type of cancer treatment that 'wakes up' the patient's own immune system so it can fight the cancer. New drugs that act in this way have worked well in patients with skin cancer (melanoma), lung cancer, and kidney cancer that has spread outside the kidney. Nevertheless, most patients do not achieve meaningful benefit from immunotherapy, due to primary or acquired resistance, while immune-related adverse events can limit the use and effectiveness of immune checkpoint inhibitors and negatively impact quality of life and survivorship. Biomarkers that can predict response, resistance and immune-related adverse events are an unmet clinical need with profound implications for health resources and therapeutic outcomes.
The PhD project will produce novel AI innovations in multimodal data integration and modelling, enabling biomarker discovery and the prediction of treatment outcomes and toxicities for standard-of-care and emerging immunotherapies, using in-depth patient profiles that include molecular, immune, and spatial imaging profiling. The PhD research will aim to understand the role of each of these modalities in immunotherapy response and combine them to discover biomarkers and predict treatment outcomes. The research will also validate and explain the model's results using explainable AI techniques.
The PhD student will work closely with researchers and clinicians in the Manifest project, led by the Francis Crick Institute, which includes 6 NHS trusts, 14 academic institutes and universities, and several industry partners, including Roche-Sequencing, M:M Bio, and IMU Biosciences. This collaboration will help generate clinical and industry impact.

Artificial Intelligence EEG Biomarkers for Neurodevelopmental Disorders

Primary supervisor: Alfredo Gonzalez-Sulser

Second supervisor: Javier Escudero Rodriguez

External Partner: Neuronostics

Background: Neurodevelopmental disorders (NDDs) are characterised by cognitive, motor, and sensory deficits that appear in early childhood [1]. NDDs including severe autism, intellectual disability, and epilepsy often co-occur in patients, while current therapeutic interventions are very limited. Recent breakthroughs from large-scale transcriptomic studies have identified hundreds of mutations linked with NDDs [2]. New medical interventions are in development, but quantitative biomarkers able to measure the efficacy of novel treatments for these genetic disorders are desperately needed. A prime candidate technique for clinically relevant biomarkers is EEG, as it can be performed quickly and is relatively inexpensive [3]. However, robust EEG biomarkers based on conventional signal processing techniques have not been identified. For this PhD project we propose that artificial intelligence techniques, such as supervised machine learning and deep learning, may identify effective, generalizable EEG biomarkers for NDDs.

Aims: The primary supervisors, Alfredo Gonzalez-Sulser from SIDB and Javier Escudero from the School of Engineering at the University of Edinburgh, have found that supervised machine learning can effectively segregate SYNGAP1 genetic NDD EEG from control data in rodent models and humans (manuscript in preparation). We will test whether artificial intelligence EEG biomarkers can be extended across genetic NDDs through the following aims:
1. Application of supervised machine-learning across EEG data from multiple rodent models and patient NDD datasets including SYNGAP1, GRIN2B, NLGN3 and SCN2A.
2. Determination of whether unsupervised deep learning approaches are more effective for developing quantitative EEG biomarkers.

Rationale & hypothesis: At SIDB, we generated new transgenic rat models in which primary genes associated with neurodevelopmental disorders have been knocked out. Gonzalez-Sulser and Escudero identified abnormal spectral and connectivity EEG properties in multiple SIDB rat models [4,5]. Supervised machine learning can effectively segregate EEG of rodents and humans with SYNGAP1 NDD from control data (manuscript in preparation). We hypothesize that artificial intelligence techniques applied to EEG can be extended and adapted to serve as effective biomarkers across NDDs.

Training outcomes:
• Expertise in supervised learning and deep learning data science techniques.
• Knowledge in the biological mechanisms generating EEG signals in the brain.
• Advanced knowhow in signal processing analyses utilized to study EEG.
• Industry experience through internship with EEG diagnostic company Neuronostics.

References:
1. Paulsen, et al., Autism genes converge on asynchronous development of shared neuron classes. Nature 2022.
2. Satterstrom, et al., Large-Scale Exome Sequencing Study Implicates Both Developmental and Functional Changes in the Neurobiology of Autism. Cell 2020.
3. Bosl, et al., EEG Analytics for Early Detection of Autism Spectrum Disorder: A data-driven approach. Scientific Reports 2018.
4. Buller-Peralta et al., Abnormal brain state distribution and network connectivity in a SYNGAP1 rat model. Brain Communications 2022.
5. Hristova et al., Absence seizures and sleep abnormalities in a rat model of GRIN2B neurodevelopmental disorder. bioRxiv 2024.

Automated data collection for deep behavioural phenotyping of rat models of neurodevelopmental disorder in a complex housing environment

Primary co-supervisors: Raven Hickson, Peter Kind

Assistant supervisors: Michael Gutmann, Gedi Luksys

Background: The richness and flexibility of the rat behavioural repertoire make them well suited as models of the cognitive and social aspects of neurodevelopmental disorders (NDDs). Standard laboratory housing drastically reduces opportunities for rats to express natural behaviours, vastly diminishing the behavioural repertoire available to study [1]. The Habitat was designed to address this mismatch by providing an environment that more closely aligns with the ecology of the Norway rat, offering opportunities to observe the development of adaptive behaviours in their functional environment. Housing in the Habitat has observable effects on the transcriptome and, compared with standard housing, produces different behavioural effects in two different models. Habitat housing also appears to alter behaviour at a micromovement scale, as captured by RatSeq (unpublished data; method described in [2]). However, little is known about which aspects of the Habitat experience may contribute to these effects on behaviour. Using multiple modes of data collected in the Habitat (RFID tracking, video, audio, etc.) to characterise behaviour, the overarching goal is to generate testable hypotheses about circuit-level differences between models of NDD and wild-types.

Aims:
1. Develop a data analysis pipeline that allows for integration of multimodal (RFID tracking, video, audio, etc.) individual and group-level data collected from the Habitat.
2. Apply the latest AI and machine learning technologies to the problem of behaviour analysis at individual, dyadic and group levels, at both long and short time-scales.
3. Build behavioural models that allow individuals to be clustered based on behavioural patterns in the Habitat, and test whether these 'behavioural profiles' explain inter- and/or intra-individual variability in empirical behavioural tasks.

Rationale & hypothesis: The Habitat is a hypothesis generator. Through deep characterisation of individual behaviour patterns in a complex, dynamic environment, systematic differences between individuals and genotypes may emerge, generating testable hypotheses about the mechanisms that drive them. The next step in the Habitat project is to make the Habitat experience an observable variable and/or an experimental paradigm itself by collecting individual, longitudinal behavioural data under varying conditions and linking behaviour in the Habitat with behaviour in empirical tasks. The only feasible way to do this is by automating data collection in multiple modalities to piece together a picture of an individual’s experience and behaviour over time.

Training outcomes:
• Be able to critically examine and synthesize literature from multiple fields (e.g., ecology, neuroscience, psychology, information theory, machine learning) to develop novel approaches to experimental design, data collection, and analysis.
• Be able to apply knowledge of machine learning and/or AI technologies to behavioural data collection, integration, and analysis in multiple modalities.
• Be able to communicate complex data effectively to colleagues and third-party stakeholders across a range of disciplines to facilitate collaboration.

References:
1. Shemesh, Y., & Chen, A. (2023). A paradigm shift in translational psychiatry through rodent neuroethology. Molecular Psychiatry, 1–11. https://doi.org/10.1038/s41380-022-01913-z
2. Wiltschko, A. B., Tsukahara, T., Zeine, A., Anyoha, R., Gillis, W. F., Markowitz, J. E., Peterson, R. E., Katon, J., Johnson, M. J., & Datta, S. R. (2020). Revealing the structure of pharmacobehavioral space through motion sequencing. Nature Neuroscience, 23(11), 1433–1443. https://doi.org/10.1038/s41593-020-00706-3

Using large scale neural network models to bridge the translational gap from animal models to human insights in autism spectrum and neurodevelopmental disorders

Primary supervisor: Matthew Nolan

Assistant supervisors: Matthias Hennig, Andrew Stanfield, Angus Chadwick

Background: While animal models have been successful at establishing cellular roles for genes strongly linked to autism spectrum disorders (ASDs) and neurodevelopmental disorders, there remains a substantial explanatory gap from this cellular level of analysis to the cognitive and perceptual differences associated with the human disorders.
Cognitive and perceptual tests are widely used in the diagnosis and clinical investigation of ASDs and neurodevelopmental disorders. Large neural network models can now perform comparably to humans in similar test domains. Moreover, neural network models can be implemented with architectures that map onto organisational features of mammalian nervous systems, while fine-tuning of model hyperparameters is often critical for network performance. This raises the possibility that we can use manipulations of large neural network models to bridge the gap between molecular, cellular and circuit level deficits established in animal models on the one hand, and cognitive and perceptual deficits in patients on the other.

Aims: The overall goal of this project will be to test whether taking cellular and circuit level phenotypes established from animal models and imposing them on large-scale neural network models can account for and predict cognitive or perceptual deficits associated with ASDs and neurodevelopmental disorders.
A first aim will be to work with clinicians to establish a battery of tests that are used with patients, either for diagnosis or for experimental investigation, and then to implement these tests in a platform that can easily be used with state-of-the-art neural network models.
A second aim will be to identify model architectures and hyperparameters that balance performance on the tasks, mapping to relevant constraints of brain architecture, and computational cost, and then to test manipulations of the model architecture or hyperparameters that mimic phenotypes from animal models. Example manipulations include evaluation of local hyper- and distant hypo-connectivity, and differences in E-I balance (which could be implemented using models that incorporate Dale’s law in their connectivity rules).
A third aim will then be to use analytical methodologies from neuroscience and explainable-AI fields to develop explanations for how the low-level manipulations bring about high-level changes, and to work with clinicians to develop new strategies for reversing human phenotypes.

Rationale & hypothesis: The rationale is twofold. First, that the high-level phenotypes associated with ASDs and neurodevelopmental disorders result from differences in neural network computations that are in turn a result of low-level differences in cellular and circuit properties of neurons. Second, that large-scale neural network models solve many tasks using computations that are sufficiently similar to human brains that we can learn about both by investigating and manipulating the network models. This leads to the prediction that if we mimic in network models the structural and connectional deficits associated with disorders, then we will observe comparable high-level phenotypes.

Training outcomes:
• Expertise in clinical tests applied to ASDs and neurodevelopmental disorders.
• Expertise in training and optimisation of large-scale neural network models and their application to cognitive and perceptual tasks.
• Expertise in analysis methods that aim to explain how large-scale neural networks solve cognitive tasks.
• Expertise in development of novel clinical strategies for treatment of disorders.

References: This review article summarises previous work in this domain: Lanillos P, Oliva D, Philippsen A, Yamashita Y, Nagai Y, Cheng G. 2020. A review on neural network models of schizophrenia and autism spectrum disorder. Neural Networks 122:338–363. doi:10.1016/j.neunet.2019.10.014. The PhD project proposed here will differ in that the student will work closely with clinicians to adopt more relevant cognitive tests, the neural network models will differ in scale and architecture, the network constraints used to model disorders will be related more directly to biological observations, and direct tests of clinical predictions will be built into the project.

Using machine-learning approaches to predictively genotype ASD model rats based upon large-scale, high-density neuronal network data

Primary co-supervisors: Paul Rignanese, Peter Kind, Thomas Watson

Assistant supervisor: Arno Onken

Background: A major focus of the Kind lab is to understand the circuit-level mechanisms underlying neurodevelopmental disorders, including autism spectrum disorder (ASD). As a range of behavioural issues are associated with ASD, it is unlikely that they can be attributed to isolated dysfunction within single brain regions. Rather, evidence from functional neuroimaging studies in people with ASD suggests that aberrant interactions exist across a large, distributed network of brain regions. Therefore, we have performed multi-site recordings from a variety of brain regions, including cortex, midbrain, hippocampus and thalamus, at both the neuronal ensemble (multiple single units) and population level (local field potential, electroencephalography), in a variety of species (mouse, rat, pig) and during a variety of behavioural states (e.g. complex cognitive tasks or natural sleep). The result of this endeavour is a large, unique repository of electrophysiological data from animal models of neurodevelopmental disorders.
One of the Kind Lab’s major objectives is to identify electrophysiological and behavioural markers within this dataset that may reliably predict the genotype of the animal models. Achieving this requires computational methods beyond traditional experimental neuroscience techniques. Consequently, we aim to employ advanced machine learning (ML) and artificial intelligence (AI) approaches to analyse this dataset, which could reveal potential biomarkers associated with specific genetic mutations and provide insight into how these markers might generalise across different animal models. This application of ML and AI not only enhances the understanding of genotype-phenotype relationships but also opens up possibilities for targeted diagnostics and therapeutic strategies in neurodevelopmental disorders, as biomarkers identified in animal models may translate into meaningful insights for human ASD diagnostics and interventions.
Methods will include (but are not limited to): preparing the dataset for AI analysis by carefully labelling and selecting relevant physiological data (e.g. LFP, EEG, spikes), brain regions, experimental paradigms, and behavioural states, followed by preprocessing to incorporate multivariate features such as coherence across neural networks; training AI models (e.g. XGBoost) to classify genotypes from electrophysiological features and testing classification accuracy through cross-validation; and, finally, using feature importance analysis and SHAP (SHapley Additive exPlanations) to identify and interpret the influential data features that guide the classifier’s decisions.
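The classification pipeline described above can be sketched as follows. This is an illustrative example on synthetic data only: scikit-learn's GradientBoostingClassifier stands in for XGBoost and permutation importance stands in for SHAP attribution, since the real feature set, labels, and tooling would come from the lab's data repository.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingClassifier
from sklearn.inspection import permutation_importance
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(0)

# Synthetic stand-in for the real dataset: rows = recording sessions,
# columns = electrophysiological features (e.g. LFP band power, firing
# rates, inter-regional coherence). Labels = genotype (0 = WT, 1 = KO).
n_sessions, n_features = 200, 12
X = rng.normal(size=(n_sessions, n_features))
y = rng.integers(0, 2, size=n_sessions)
# Make two hypothetical features genotype-informative.
X[:, 0] += 1.5 * y
X[:, 3] -= 1.0 * y

clf = GradientBoostingClassifier(random_state=0)

# Cross-validated classification accuracy, as described in the methods.
acc = cross_val_score(clf, X, y, cv=5).mean()
print(f"5-fold CV accuracy: {acc:.2f}")

# Feature-importance analysis (a proxy for SHAP values here): which
# features drive the genotype prediction?
clf.fit(X, y)
imp = permutation_importance(clf, X, y, n_repeats=10, random_state=0)
top = np.argsort(imp.importances_mean)[::-1][:3]
print("most informative features:", top)
```

In the project itself, the feature matrix would be built from the preprocessed LFP/EEG/spike data, and SHAP's tree explainer would replace the permutation-importance step to give per-session attributions.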

Aims: To address our hypothesis we propose 2 key aims:
• To use machine learning/AI approaches to identify key features of electrophysiological data that distinguish genotypes in models of ASD
• To probe the generalisability of such key features across behavioural states, brain regions and genetic models.

Rationale & hypothesis: This proposal aims to identify neural circuit motifs underlying behavioural changes in models of ASD using advanced machine learning and AI methods. We will test the general, overarching hypothesis that electrophysiological measures from the brain can be used in a predictive manner to identify genotypes of models.

Training outcomes:
• Develop highly transferable skills in large dataset preparation and AI mining within the biomedical field.
• Interdisciplinary understanding of complex neurodevelopmental disorders and approaches to data analysis.
• We strongly support participation in training courses for public engagement, communication and writing skills, and statistics amongst others.

Neuronal encoding of contextual information in the visual cortex of mouse models of autistic spectrum disorders

Primary co-supervisors: Nathalie Rochefort, Arno Onken

Assistant supervisor: Andrew Stanfield

Background: Children on the autism spectrum (ASD) differ from typically developing children in many aspects of their processing of sensory stimuli. One proposed mechanism for these differences is an imbalance in higher-order feedback to primary sensory regions, leading to an increased focus on local object features rather than global context.
The aim of this project is to use mouse models to reveal the neuronal encoding of contextual information in the visual cortex. We will determine how visual feedback processing may be disrupted in mouse models of ASD.
This project will leverage artificial intelligence tools to uncover how natural stimuli statistics are encoded and how local neuronal populations form both reliable and context-dependent representations of natural scenes in the primary visual cortex. In order to investigate these mechanisms, we will use a combination of large-scale high throughput recordings of neuronal activity and artificial neuronal networks.
Using electrophysiological recordings with high-density silicon probes (4), we will record neuronal responses to natural scenes in all layers of the primary visual cortex (V1) in awake, head-fixed adult mice. We will use a recently developed modelling framework to generate optimized surround images and movies in order to systematically investigate the rules that determine contextual excitation versus inhibition in a naturalistic setting. This closed-loop paradigm was developed through a collaboration between the two co-supervisors of this project: the team of Dr Arno Onken, School of Informatics (5), and Dr Nathalie Rochefort, CDBS. The approach is based on a new type of data-driven deep learning model that can accurately predict V1 responses to new (unseen) stimuli. The experimental design integrates large-scale neuronal recordings, a model capable of accurately predicting responses to diverse natural stimuli, in silico optimization of non-parametric images, and in vivo verification.
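The closed-loop "predict, optimise in silico, verify in vivo" logic can be illustrated with a deliberately simplified toy model. Here a simulated neuron with a random linear receptive field and a rectifying nonlinearity stands in for real V1 data, ridge regression stands in for the deep predictive model, and gradient ascent on the stimulus stands in for the in silico optimisation step; none of these choices reflect the project's actual architecture.

```python
import numpy as np

rng = np.random.default_rng(1)

# Ground truth (hypothetical): a neuron with a fixed linear receptive
# field and a rectifying nonlinearity, standing in for a real V1 cell.
npix = 64
true_rf = rng.normal(size=npix)
true_rf /= np.linalg.norm(true_rf)

def neuron(stim):                     # simulated "in vivo" response
    return np.maximum(stim @ true_rf, 0.0)

# Step 1: "record" responses to random stimuli.
stims = rng.normal(size=(500, npix))
resp = neuron(stims) + 0.05 * rng.normal(size=500)

# Step 2: fit a predictive model (here, closed-form ridge regression).
lam = 1.0
w = np.linalg.solve(stims.T @ stims + lam * np.eye(npix), stims.T @ resp)

# Step 3: in silico optimisation, i.e. gradient ascent on the stimulus
# under a norm constraint; for a linear model the gradient is just w.
stim = rng.normal(size=npix)
for _ in range(100):
    stim += 0.1 * w
    stim /= np.linalg.norm(stim)

# Step 4: verification, i.e. the optimised stimulus should drive the
# simulated neuron far more strongly than typical random stimuli.
print("optimised response:", neuron(stim[None])[0])
print("mean random response:", neuron(stims).mean())
```

The actual project replaces the linear model with a deep network (cf. the V1T model of Li et al.), so the in silico optimisation uses automatic differentiation rather than an analytic gradient, but the loop structure is the same.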

Aims: The proposal is organised around two main aims:
1.         Determine the spatio-temporal features of contextual modulation of visual responses in primary visual cortex
2.         Causally test the impact of feedback inputs from higher visual area (LM) on the contextual modulation of visual responses in primary visual cortex
We will systematically compare the results obtained in Syngap heterozygous mice and wild-type littermate controls.
Depending on the results, we will be able to use our artificial neural network model to generate a database of visual stimuli, aimed at specifically testing contextual perception in individuals affected by neurodevelopmental disorders. This will be done in collaboration with the team of Dr Andrew Stanfield.

Rationale & hypothesis: We hypothesize that sensory disruption in Autistic Spectrum Disorders results from an imbalance in higher-order feedback to primary sensory cortices, leading to an increased focus on local object features rather than global context.
We will test this hypothesis on mouse models, revealing the encoding of contextual information in the primary visual cortex and the role of feedback inputs from higher visual areas.

Training outcomes:
• In vivo recordings in awake behaving mice: training in Neuropixels recordings, in vivo surgery, and viral injections in the mouse brain
• Computational methods: model-based analysis of the data, computational modelling of neural circuits; programming skills in Python
• Data management: managing and analysing large datasets
• Research ethics and animal research regulations
• Presentation of data, both written and oral

References:
1. Emily J. Knight, Edward G. Freedman, Evan J. Myers, Alaina S. Berruti, Leona A. Oakes, Cody Zhewei Cao, Sophie Molholm, John J. Foxe. Severely Attenuated Visual Feedback Processing in Children on the Autism Spectrum. Journal of Neuroscience. 2023;43(13):2424-2438.
2. Smith D, Ropar D, Allen HA (2015) Visual integration in autism. Front Hum Neurosci 9:387.
3. Walker EY, Sinz FH, Cobos E, et al. Inception loops discover what excites neurons most using deep predictive models. Nat Neurosci. 2019;22(12):2060-2065.
4. Bimbard C, Takács F, Catarino JA, et al. An adaptable, reusable, and light implant for chronic Neuropixels probes. eLife. 2024. https://doi.org/10.7554/eLife.98522.1
5. Li B, Cornacchia I, Rochefort N, Onken A. V1T: large-scale mouse V1 response prediction using a Vision Transformer. Transactions on Machine Learning Research. 2023. https://openreview.net/pdf?id=qHZs2p4ZD4
6. Pakan JM, Francioni V, Rochefort NL. Action and learning shape the activity of neuronal circuits in the visual cortex. Curr Opin Neurobiol. 2018;52(52):88-97.

Investigating cognitive flexibility and rule learning using an online videogame and computational modelling

Primary supervisor: Marino Pagan

Second supervisor: Angus Chadwick

Third supervisor: Andrew Stanfield

Background: Cognitive flexibility—the ability to adapt our thinking and actions according to changing contexts—is a core component of higher cognition. Deficits in cognitive flexibility are characteristic of fragile X syndrome (FXS) and other neurodevelopmental disorders, yet the specific alterations in cognitive processing that contribute to these deficits remain poorly understood. In recent years, sophisticated behavioural tasks have been developed to precisely characterize cognitive processing in rats (Pagan et al., 2022), but translating these findings to human populations remains a challenge.
This project seeks to bridge this gap by creating an online videogame specifically designed to measure cognitive flexibility and rule-learning in human participants with FXS. Inspired by decision-making tasks used in animal studies, the game will incorporate trial-and-error rule learning without explicit instructions, mimicking the task structure used with animal subjects. By requiring participants to adapt to progressively complex rules based on visual cues, the game allows for an engaging and remote method of data collection (Do et al., 2022; Chakravarty et al., 2024).
To analyse and model the collected data, we will leverage machine learning techniques including recurrent neural networks (RNNs), which can be trained to solve the same type of context-dependent decision-making task. RNNs have been shown to replicate the behavioural traits and neural dynamics of rats (Pagan et al., 2022), and can be studied using sophisticated analytical methods (Pellegrino et al., 2023), thus representing a powerful model for studying candidate learning mechanisms and their implementation within decision-making brain networks.

Aim 1: Design and Validate an Online Cognitive Flexibility Game for Human Subjects
We will develop a web-based cognitive game inspired by animal models, specifically tailored to assess rule learning and cognitive flexibility. The game presents participants with tasks requiring decision-making based on colour and spatial rules that evolve as the game progresses. This game design allows human participants to experience a rule-learning process similar to that of animals, but adapted for a digital, interactive environment accessible remotely.
Aim 2: Behavioural Data Analysis Using Advanced Computational Methods
We will analyse the collected behavioural data to shed light on the neural mechanisms underlying cognitive flexibility and decision-making.  Using a variety of machine learning algorithms, including classifiers, regression methods and hidden Markov models, we aim to uncover the specific cognitive strategies used by each individual subject, and to precisely characterize the learning impairments in FXS subjects in comparison to healthy control subjects.
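As one hypothetical illustration of how a hidden Markov model can recover trial-by-trial strategies from choice data, the sketch below simulates an "engaged versus lapsed" observer and decodes the latent state sequence with the Viterbi algorithm. All states, probabilities, and parameters here are invented for illustration; in the project they would be fitted to each participant's game data.

```python
import numpy as np

rng = np.random.default_rng(2)

# Two latent strategies: state 0 = engaged (90% correct), state 1 =
# lapsed/guessing (50% correct). States are sticky across trials.
emit = np.array([[0.1, 0.9],     # P(incorrect | s), P(correct | s)
                 [0.5, 0.5]])
trans = np.array([[0.95, 0.05],
                  [0.10, 0.90]])
start = np.array([0.5, 0.5])

# Simulate 200 trials of correct/incorrect outcomes from the model.
n = 200
states = np.zeros(n, dtype=int)
obs = np.zeros(n, dtype=int)
states[0] = rng.choice(2, p=start)
for t in range(n):
    if t > 0:
        states[t] = rng.choice(2, p=trans[states[t - 1]])
    obs[t] = rng.choice(2, p=emit[states[t]])

# Viterbi decoding in log space: most likely latent strategy sequence.
logA, logE, logp = np.log(trans), np.log(emit), np.log(start)
V = logp + logE[:, obs[0]]
back = np.zeros((n, 2), dtype=int)
for t in range(1, n):
    scores = V[:, None] + logA          # scores[i, j]: state i -> j
    back[t] = scores.argmax(axis=0)
    V = scores.max(axis=0) + logE[:, obs[t]]
path = np.zeros(n, dtype=int)
path[-1] = V.argmax()
for t in range(n - 1, 0, -1):
    path[t - 1] = back[t, path[t]]

print("decoding accuracy:", (path == states).mean())
```

Fitting (rather than assuming) the transition and emission matrices would use the Baum-Welch algorithm, and richer emission models would incorporate reaction times and rule-specific choices.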
Aim 3: Model Cognitive Flexibility with Recurrent Neural Networks (RNNs)
Using Recurrent Neural Networks (RNNs), we will simulate human learning and decision-making behaviours as observed in game performance. By training RNNs on similar tasks, we aim to replicate behavioural patterns, allowing us to investigate potential neural mechanisms underlying cognitive inflexibility in FXS. This computational modelling will enable nuanced cross-species comparisons, bridging the gap between findings in animal models and human cognitive processes.
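To make the RNN modelling idea concrete, the following sketch hand-wires (rather than trains) the kind of context-gated evidence integration such networks typically learn on this task: ReLU units gate whichever noisy evidence stream the context cue makes relevant, and a recurrent accumulator integrates it to a decision. The gating construction and all numbers are illustrative assumptions, not the project's trained model.

```python
import numpy as np

rng = np.random.default_rng(3)

def relu(x):
    return np.maximum(x, 0.0)

def gated(evidence, gate):
    # Passes `evidence` through unchanged when gate == 0 and outputs 0
    # when gate == 1 (valid whenever |evidence| < 2, enforced below).
    return relu(evidence - 2 * gate) - relu(-evidence - 2 * gate)

def run_trial(colour_mean, pos_mean, context, T=50):
    a = 0.0                                # accumulator (recurrent state)
    for _ in range(T):
        colour = np.clip(colour_mean + 0.5 * rng.normal(), -1.9, 1.9)
        pos = np.clip(pos_mean + 0.5 * rng.normal(), -1.9, 1.9)
        # context 0 -> integrate colour; context 1 -> integrate position
        a += gated(colour, context) + gated(pos, 1 - context)
    return 1 if a > 0 else -1

# Identical sensory input, opposite decisions depending on context.
print(run_trial(colour_mean=+0.4, pos_mean=-0.4, context=0))  # follows colour
print(run_trial(colour_mean=+0.4, pos_mean=-0.4, context=1))  # follows position
```

A trained RNN would discover a solution of this flavour by gradient descent on task performance; comparing its learning trajectory with human and FXS participants' learning curves is what enables the cross-species comparison described in Aim 3.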

Rationale & hypothesis: Cognitive deficits in FXS, particularly in adaptability and decision-making, may reflect similar underlying mechanisms across species. The use of FMR1-knockout rats has established a foundational understanding of these cognitive deficits, but direct human data is essential for translational insights. An interactive, web-based cognitive flexibility game provides an ideal tool to study these processes in human subjects and is accessible for remote deployment. We hypothesize that FXS-related cognitive inflexibility and decision-making impairments in humans will parallel patterns observed in animal models, particularly in task-switching and rule-learning. Moreover, computational modelling of human performance data will yield insights into the neural mechanisms that drive these behaviours and aid in cross-species comparisons.

Training outcomes: The PhD student will gain a broad skill set in computational modelling, behavioural data analysis, and cross-species cognitive research. Training will include: 1) Behavioural Game Development: design and validate online tasks that assess cognitive flexibility and decision-making in human subjects. 2) Advanced Data Analysis: apply machine learning and data science methods to parse complex datasets and identify behavioural patterns. 3) Computational Modelling: gain experience with RNNs and computational approaches to model and simulate human cognitive processes. 4) Cross-Disciplinary Research: build expertise in linking findings across species to validate translational models, expanding relevance for therapeutic development. 5) Scientific Communication: writing and reviewing scientific manuscripts and delivering research presentations.

References:
Pagan, Marino, et al. “A new theoretical framework jointly explains behavioral and neural variability across subjects performing flexible decision-making.” bioRxiv (2022): 2022-11.
Do, Quan, et al. “Assessing evidence accumulation and rule learning in humans with an online game.” Journal of Neurophysiology 129.1 (2023): 131-143.
Chakravarty, Sucheta, et al. “A cross-species framework for investigating perceptual evidence accumulation.” bioRxiv (2024): 2024-04.
Pellegrino, Arthur, N. Alex Cayco Gajic, and Angus Chadwick. “Low tensor rank learning of neural dynamics.” Advances in Neural Information Processing Systems 36 (2023): 11674-11702.

AI-driven investigation of the neural circuit dynamics supporting online motor adaptation

Primary supervisor: Ian Duguid

Second supervisor: Angus Chadwick

Third supervisor: Maria Eckstein, Google DeepMind

Background: As we live in an ever-changing world, it is vital to be able to update our actions to achieve specific goals. This requires online correction and updating of sensory-to-action transformations mediated by distributed regions of the brain. Although we take for granted our ability to seamlessly adjust the steering wheel of a car to avoid collision with an object in the road, how the brain solves complex motor adaptation problems remains unresolved. Within the brain, the interaction of the basal ganglia and cerebellum with the motor cortex is thought to allow for such motor adaptation. Specifically, it is thought that the cerebellum adjusts movements when they do not go where expected, whereas the basal ganglia adjust movements when their external targets change (1). However, how these strategies interact to support motor adaptation remains unclear. To address this, we aim to develop next-generation models of motor control and adaptation in the healthy brain that can also be used to explore how circuit dysfunction leads to specific phenotypes in a range of neurodevelopmental disorders.
Recent technical developments in both neuroscience and artificial intelligence (AI) have given us unprecedented access to the brain-wide neural dynamics underpinning behaviour. By recording thousands of neurons simultaneously across distributed brain regions, we are now beginning to unravel the complex dynamics underlying motor control and adaptation. Moreover, comparing neural dynamics in the healthy brain with those observed in rodent models of neurodevelopmental disorders is providing valuable insights into how aberrant functional connectivity leads to debilitating behavioural deficits. Although these data were previously impossible to interpret, using a combination of neural network and reinforcement learning based approaches we can now tease apart the generative processes underlying both neural activity (2) and behaviour (3). Our approach will harness the power of these recent developments in AI to understand how neural activity distributed across the brain, with a specific focus on the basal ganglia and cerebellum, interacts to support motor adaptation.

Aims: This project will use a combination of neural networks and reinforcement learning models to dissect the contributions of individual and distributed brain regions in motor adaptation. To do so, we will analyse the behavioural strategies, and the corresponding underlying neural activity, used by rodents to perform a variety of motor adaptation tasks. Our AI-driven approach will focus on overcoming the distinct limitations of neural network models, which lack interpretability, and reinforcement learning models, which lack the ability to capture complex neural dynamics. To do this, we will use neural networks with designs that limit the complexity of learnt representations (2,3), as well as hybrid models consisting of a combination of reinforcement learning and neural network components (3,4). Finally, we will use this approach in combination with perturbations to the system, either artificially applied using optogenetic manipulation or present in neurodevelopmental disorders, to probe how neural dynamics distributed across the brain support behavioural strategies. Together, we aim to use recent advances in AI in combination with circuit perturbations and neural recordings to understand how distributed neural activity supports motor adaptation.
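One minimal, assumed form such a hybrid model could take is a two-process adaptation model: a cerebellar-like state updated by sensory prediction errors, and a basal-ganglia-like re-aiming state updated by task outcome (reward). The sketch below simulates adaptation to an abruptly imposed perturbation; the learning rules, rates, and the 30-degree rotation are all illustrative choices, not the project's final model.

```python
import numpy as np

perturbation = 30.0          # degrees, imposed from trial 50 onwards
target = 0.0                 # desired movement endpoint direction

spe_state = 0.0              # cerebellar-like state (sensory prediction error)
aim = 0.0                    # basal-ganglia-like re-aiming (reward-driven)
eta_spe, eta_rl = 0.2, 0.05

errors = []
for trial in range(200):
    pert = perturbation if trial >= 50 else 0.0
    command = aim + spe_state
    endpoint = command + pert            # where the movement actually lands
    error = target - endpoint            # signed task error

    # Cerebellar-like update: correct the sensory prediction error.
    spe_state += eta_spe * error
    # Reward-driven update: a crude reward-following step whose size
    # scales with how bad the outcome was.
    reward = -abs(error)
    aim += eta_rl * np.sign(error) * abs(reward) * 0.1

    errors.append(abs(error))

print("error just after perturbation:", errors[50])
print("error at end of adaptation:", errors[-1])
```

The project's models would replace these hand-set update rules with RNN components and fitted reinforcement learning parameters (cf. disentangled RNNs, refs. 3-4), and the simulated perturbation with real optogenetic or task manipulations.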

Rationale & hypothesis: Previous work suggests that the cerebellum and basal ganglia independently drive updating of motor representations in response to different kinds of changes in the environment. In contrast, we will test the idea that coordinated input from both regions is necessary for online updating of motor representations and motor adaptation, and that task outcome and sensory prediction errors are both essential components of this process. We predict that gradual changes within neural representations in both the basal ganglia and cerebellum will cooperatively drive updates across the motor system to support motor adaptation.

Training outcomes: This project will allow you to work at the interface of behavioural neuroscience and AI. Within this, you will learn to use a mix of neural network and reinforcement learning models to interpret neural activity and behaviour. You will also learn to use these models to drive experimental design, particularly to determine which perturbations and additional behavioural tasks should be conducted to test predictions made by the models. You will also gain insight into how data is generated and the associated limitations with using electrophysiology and optogenetics in mouse models.

References:
1. Arber S, Costa RM. Networking brainstem and basal ganglia circuits for movement. Nat Rev Neurosci. 2022;23(6).
2. Pellegrino A, Cayco-Gajic A, Chadwick A. Low Tensor Rank Learning of Neural Dynamics. 37th Conference on Neural Information Processing Systems. 2023.
3. Miller KJ, Eckstein M, Botvinick MM, Kurth-Nelson Z. Cognitive Model Discovery via Disentangled RNNs. 37th Conference on Neural Information Processing Systems. 2023.
4. Eckstein MK, Summerfield C, Daw ND, Miller KJ. Hybrid Neural-Cognitive Models Reveal How Memory Shapes Human Reward Learning. PsyArXiv. 2024.

Discovering a biomarker for tactile sensitivities in Fragile X syndrome

Primary supervisor: Leena Williams

Second supervisor: Peggy Series

Third supervisor: Andrew Stanfield

External collaborator: Mark Tommerdahl, University of North Carolina (UNC) Chapel Hill, CEO of Corticalmetrics and inventor of the Brain Gauge tactile stimulator

Background: Tactile sensitivities are a prevalent but understudied feature of Fragile X Syndrome (FXS). FXS is a leading inherited form of intellectual disability, and tactile sensitivities are linked to aberrant eating, anxiety, and maladaptive behaviours. Current diagnostic strategies rely on highly variable carer questionnaires and are therefore not fit for purpose, and treatments are non-existent. What, then, are the key objective biomarkers for tactile sensitivities, and what are the neuronal circuit impairments underpinning these features in sensory cortices? We are currently commencing a human study testing the use of EEG and the commercially available Brain Gauge tactile stimulator to objectively quantify tactile impairments in FXS. In parallel, our lab uses a mouse model of FXS, Fmr1-KO mice, to study tactile impairments and neuronal circuit dysfunction in the somatosensory cortex (S1). In short, we aim to cross-correlate human and mouse model tactile data to advance our understanding, with the longer-term vision of uncovering future therapeutic targets. This AI4Bi PhD project aims to accelerate the transformation of our human and animal data into novel biomarkers of tactile sensitivities by combining (Aim 1) the framework of theory-based computational psychiatry [4] (in particular Bayesian models [5]) to model our psychometric observations with (Aims 2 & 3) machine learning, trained on clinical and mouse data, to identify novel objective biomarkers for tactile sensitivities. The biomarkers uncovered could be employed in the clinic and in future planned therapeutic research, and the findings may be applicable to similar and associated conditions, increasing impact.

Aims: Aim 1: Use the framework of theory-based computational psychiatry to build mathematical and computational models that identify key diagnostic markers of tactile sensitivities in FXS.
Aim 2: Pre-process the EEG data and use the resulting brain activity, along with tabular data (tasks and questionnaires) and the results of behavioural model fitting, to develop an explainable machine learning classifier.
Aim 3: Refine the machine learning classifier and identify the inputs needed to shorten data acquisition.
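A minimal sketch of the Aim 2 pipeline on synthetic data: band-power features are extracted from simulated EEG epochs and fed to a linear classifier. The sampling rate, band definitions, and the gamma-band group difference are all assumptions for illustration, not findings of the study; the real pipeline would use the recorded EEG montage, tabular task data, and model-fit parameters as additional inputs.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(4)
fs = 250                                  # sampling rate (Hz), assumed

def band_power(sig, lo, hi):
    """Mean spectral power of `sig` in the [lo, hi) Hz band."""
    freqs = np.fft.rfftfreq(sig.size, d=1 / fs)
    psd = np.abs(np.fft.rfft(sig)) ** 2
    return psd[(freqs >= lo) & (freqs < hi)].mean()

# Synthetic stand-in for epoched EEG: 100 epochs of 2 s, one channel.
# Group 1 epochs carry extra 40 Hz (gamma) power, a purely hypothetical
# group difference inserted so the classifier has something to find.
n, T = 100, 2 * fs
labels = rng.integers(0, 2, size=n)
t = np.arange(T) / fs
epochs = rng.normal(size=(n, T))
epochs += labels[:, None] * 0.8 * np.sin(2 * np.pi * 40 * t)

bands = [(4, 8), (8, 13), (13, 30), (30, 50)]   # theta/alpha/beta/gamma
X = np.array([[band_power(e, lo, hi) for lo, hi in bands] for e in epochs])
X = np.log(X)                                    # log power is closer to Gaussian

clf = LogisticRegression(max_iter=1000)
acc = cross_val_score(clf, X, labels, cv=5).mean()
print(f"cross-validated accuracy: {acc:.2f}")
```

Because the classifier is linear on a small, named feature set, its weights are directly inspectable, which is one route to the "explainable" requirement of Aim 2; Aim 3's shortening of acquisition corresponds to pruning features (or recording time) while monitoring this cross-validated accuracy.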

Rationale & hypothesis: Fragile X Syndrome (FXS) is a leading inherited form of intellectual disability defined by core features such as cognitive, motor, language, social and tactile impairments. Tactile sensitivities are a prevalent feature of other neurological disorders including other forms of intellectual disability and autism.
Somatosensory processing is an understudied therapeutic target [2,3]. The goal is to transform clinical and mouse model data into a novel biomarker for tactile sensitivities for use clinically and in future planned therapeutic research.
Perception is modulated by our expectations and prior beliefs about the world. It is increasingly evident that altered excitatory/inhibitory (E/I) homeostatic balance and decreased feedforward inhibition in the somatosensory cortex (S1) may underpin tactile sensitivities in FXS [6]. We have evidence in an FXS mouse model, Fmr1-KO mice, that altered feedforward inhibition may impair the synaptic plasticity, i.e. the changes in connections between neurons in S1, required to mediate the interplay between perception, prior beliefs, and receptive fields [1,3]. We are currently running, in parallel, a clinical study to address this question by objectively quantifying somatosensory function in FXS using electroencephalography (EEG) and the Brain Gauge tactile stimulator, with seed funding from SIDB and the FRAXA Research Foundation. This AI4Bi PhD project aims to accelerate the transformation of our human and animal data into novel biomarkers for tactile sensitivities by using (Aim 1) the frameworks of computational neuroscience and computational psychiatry and (Aims 2 & 3) machine learning trained on clinical and mouse data. This project is a collaboration between clinicians, basic neuroscientists, computational psychiatrists, and industry partners.
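In its simplest Gaussian form, the Bayesian-observer framework invoked here reduces to precision-weighted averaging of the prior and the sensory evidence; "overweighting of sensory information" (cf. Karvelis et al. [3]) then corresponds to a larger weight on the observation. The sketch below works through that calculation with purely illustrative numbers.

```python
import numpy as np

def posterior(prior_mu, prior_sd, obs, obs_sd):
    """Posterior mean/sd for a Gaussian prior and Gaussian likelihood."""
    w = prior_sd**2 / (prior_sd**2 + obs_sd**2)   # weight on the observation
    mu = (1 - w) * prior_mu + w * obs
    sd = np.sqrt(1 / (1 / prior_sd**2 + 1 / obs_sd**2))
    return mu, sd

# Typical observer: prior and sensory evidence equally reliable, so the
# percept lands halfway between prior mean (0) and observation (2).
mu_typ, _ = posterior(prior_mu=0.0, prior_sd=1.0, obs=2.0, obs_sd=1.0)

# Observer who overweights sensory evidence (sharper likelihood): the
# percept is pulled much closer to the observation.
mu_ow, _ = posterior(prior_mu=0.0, prior_sd=1.0, obs=2.0, obs_sd=0.5)

print(mu_typ)  # -> 1.0
print(mu_ow)   # -> 1.6
```

Fitting `prior_sd` and `obs_sd` per participant from psychophysical responses, as in Aim 1, is what turns this observer model into candidate diagnostic parameters.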

Training outcomes: Strong experience in both theory-driven and data-based computational psychiatry for building mathematical and computer models to address the outlined aims. Experience designing and analysing data from psychophysical, imaging, and electrophysiological experiments in humans and mice. Expertise in neuroscience, sensory processing, neuronal circuits in sensory cortices, and perceptual learning.

References:
1. L.E. Williams, L. Küffer, T. Bawa, E. Husi, S. Pagès, A. Holtmaat. Repetitive sensory stimulation potentiates and recruits sensory-evoked cortical population activity. Journal of Neuroscience, in press. doi: https://doi.org/10.1101/2024.08.06.605968
2. N. A. J. Puts, E. L. Wodka, M. Tommerdahl, S. H. Mostofsky, R. A. E. Edden. Impaired tactile processing in children with autism spectrum disorder. Journal of Neurophysiology, vol. 111, no. 9, pp. 1803–1811, May 2014. doi: 10.1152/jn.00890.2013
3. P. Karvelis, A. Seitz, S. Lawrie, P. Seriès (2018). Autistic traits, but not schizotypy, predict overweighting of sensory information in Bayesian visual integration. eLife, 7:e34115.
4. Computational Psychiatry: A Primer, P. Seriès (editor), MIT Press (2020).
5. Angeletos Chrysaitis N, Seriès P. 10 years of Bayesian theories of autism: A comprehensive review. Neurosci Biobehav Rev. 2023 Feb;145:105022. doi: 10.1016/j.neubiorev.2022.105022. PMID: 36581168.
6. Domanski, Aleksander P F et al. “Cellular and synaptic phenotypes lead to disrupted information processing in Fmr1-KO mouse layer 4 barrel cortex.”