Safely linking big datasets to help find medical answers

Professor Serena Nik-Zainal is leading a Cambridge trial to find ways to analyse data across more than one secure research hub to help speed up advances in science and knowledge.

The aim of the study is to link large research data hubs without any of the original data being moved between sites. This form of data sharing, where trusted research environments are able to securely ‘talk’ to each other, is known as a federation or federated learning.

Professor Nik-Zainal is an Honorary Consultant in Clinical Genetics at Cambridge University Hospitals (CUH).

The trial has received funding of £200K from UK Research and Innovation as part of the DARE UK (Data and Analytics Research Environments UK) programme.

Professor Serena Nik-Zainal explains why this study is a first for the UK

Link: https://youtu.be/CCDJRCckBCs

Video transcript

00:00:03:10 - 00:00:06:23

So today, the ability to generate data, whether that's hospital

00:00:06:23 - 00:00:11:09

data or scientific data or genetic data, is all there.

00:00:11:10 - 00:00:13:20

There's large amounts of data available

00:00:13:20 - 00:00:15:21

for doctors and scientists to do their work.

00:00:16:20 - 00:00:18:24

But there are concerns about

00:00:19:01 - 00:00:23:24

how we share data across multiple different sites, how we can accelerate

00:00:24:20 - 00:00:26:00

how we do research,

00:00:26:00 - 00:00:29:13

because when you have large data sets, it's quite cumbersome to do research.

00:00:29:24 - 00:00:33:00

So in this project, our aim is to build a link,

00:00:33:01 - 00:00:36:17

a computer link between Cambridge and the national genomics

00:00:36:17 - 00:00:40:18

project called Genomics England, which will do a couple of things.

00:00:40:18 - 00:00:45:06

First, it'll help to reduce the risk to data privacy,

00:00:45:06 - 00:00:48:23

data doesn't have to move, data stays exactly where it was created

00:00:49:05 - 00:00:53:23

and the scientists can perform their research situated wherever they are.

00:00:54:13 - 00:00:55:23

And the second thing is,

00:00:55:23 - 00:00:59:09

by being able to combine these data sets through these links,

00:00:59:17 - 00:01:04:24

we will enable, we will facilitate and accelerate the speed of discovery.

00:01:05:07 - 00:01:09:15

The more data you have, the easier it is to make new discoveries,

00:01:09:22 - 00:01:14:21

and this is what this project aims to do.

This will hopefully unlock unprecedented possibilities for collaborations across the UK and lead to new discoveries with long-term public benefit

Professor Serena Nik-Zainal

Trusted research environments are secure spaces for researchers to access and analyse sensitive data.

They also help prevent unauthorised access and re-identification of individuals from anonymised data.

However, researchers cannot currently analyse data across two such environments, which can delay new discoveries.

This project will bridge the gap between Genomics England and the NIHR Cambridge Biomedical Research Centre data.

Both contain rich, secure, governed sources of fully consented clinical genomic data from patients.

After looking within the data from both research environments to find individuals with certain characteristics, a joint analysis will be run within both environments, and the results combined in a separate secure cloud environment.

This means that no original data will move, only the results.
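As a rough illustration of that idea, the following Python sketch shows one way a joint analysis could keep records in place: each site selects matching individuals and releases only a count and a sum, and a separate step pools those summaries into an overall mean. It is not the project's actual method. The names `SiteSummary`, `local_summary` and `combine`, the record fields and the sample values are all hypothetical and chosen only for this example.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SiteSummary:
    """Aggregate results a site releases (hypothetical); no row-level data."""
    n: int
    total: float


def local_summary(records, has_trait, value_of):
    """Runs inside one research environment: select matching individuals, then summarise."""
    values = [value_of(r) for r in records if has_trait(r)]
    return SiteSummary(n=len(values), total=sum(values))


def combine(summaries):
    """Runs in the separate environment: pool per-site summaries into one mean."""
    n = sum(s.n for s in summaries)
    if n == 0:
        raise ValueError("no matching individuals at any site")
    return sum(s.total for s in summaries) / n


# Made-up records standing in for each site's private data.
site_a = [
    {"trait": True, "value": 2.0},
    {"trait": False, "value": 9.0},
    {"trait": True, "value": 4.0},
]
site_b = [{"trait": True, "value": 6.0}]

# Only the summaries leave each site; the records themselves never move.
summaries = [
    local_summary(site, lambda r: r["trait"], lambda r: r["value"])
    for site in (site_a, site_b)
]
pooled_mean = combine(summaries)
```

In this toy example the combining step sees only two small summaries, yet the pooled mean is the same as if all matching records had been gathered in one place.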

We want to reduce the risk to data privacy so data doesn't have to move and stays exactly where it was created

Professor Serena Nik-Zainal

Professor Serena Nik-Zainal said:

"There's large amounts of data available for doctors and scientists to do their work but there are concerns about how we share data across multiple different sites.

"This project aims to help reduce the risk to data privacy so data doesn't have to move and stays exactly where it was created and scientists can perform their research situated wherever they are.

"It also aims to highlight the advantages of being able to work with data from more than one research environment and set new standards in the field of data sharing.

"This will hopefully unlock unprecedented possibilities for collaborations across the UK and lead to new discoveries with long-term public benefit."

The work has been approved as a Sprint Exemplar Project as part of Phase 1 of the DARE UK programme, which is delivered in partnership with Health Data Research UK (HDR UK) and Administrative Data Research UK (ADR UK).
