Abstract
The UK’s health datasets are among the most comprehensive and inclusive globally, enabling groundbreaking research during the COVID-19 pandemic. However, restrictions on data sharing between secure data environments (SDEs) imposed limitations on the ability to carry out joint analyses across multiple separate datasets. There are currently significant efforts underway to enable such analyses using methods such as federated analytics (FA) and virtual SDEs. FA involves distributed data analysis without sharing raw data but does require sharing summary statistics. Virtual SDEs in principle allow researchers to access data across multiple SDEs, but in practice, data transfers may be restricted by information governance concerns.
Secure multiparty computation (SMPC) is a cryptographic approach that allows multiple parties to perform joint analyses over private datasets with zero information sharing. SMPC may eliminate the need for data-sharing agreements and statistical disclosure control, offering a compelling alternative to FA and virtual SDEs. SMPC comes with a higher computational burden than traditional pooled analysis. However, efficient implementations of SMPC can enable a wide range of practical, secure analyses to be carried out.
This perspective reviews the strengths and limitations of FA, virtual SDEs and SMPC as approaches to joint analyses across SDEs. We argue that while efforts to implement FA and virtual SDEs are ongoing in the UK, SMPC remains underexplored. Given its unique advantages, we propose that SMPC deserves greater attention as a transformative solution for enabling secure, cross-SDE analyses of private health data.
Introduction
The UK has some of the richest, most inclusive and largest-scale health datasets in the world. Spurred on by the need to answer urgent questions about the COVID-19 pandemic, researchers used these data assets to undertake UK-wide analyses on an unprecedented scale.1–3 This culminated in the first-ever cohort study using routinely collected electronic health records of the entire UK population aged >5 years, which investigated the association between COVID-19 under-vaccination and severe COVID-19 outcomes (hospitalisation or death).4 Although these analyses have yielded important answers that have shaped scientific responses to the pandemic, restrictions on data sharing between secure data environments (SDEs) have limited the cross-SDE analyses that researchers have been able to undertake. For example, studies have relied on pooled analyses that share minimal amounts of non-disclosive, aggregated information1 2 or on parallel analyses across SDEs followed by meta-analysis.3 4
In this perspective, we review the advantages and disadvantages of methods for enabling joint analyses across multiple private datasets and suggest secure multiparty computation (SMPC) as a promising new approach that can achieve this goal with no information sharing.
SMPC offers a potential way forward
SMPC allows several parties to carry out a joint computation over private datasets with zero information sharing. Many techniques can be used for SMPC. For example, secret sharing allows a private value to be distributed across multiple parties such that no party on its own has any information about the secret, but some threshold number of parties can together recover it. Secret sharing has the advantage of information-theoretic, or perfect, security, meaning the protocol is secure even against adversaries with unbounded computational power. A separate class of techniques, called garbled circuits, uses encryption to carry out calculations securely. For introductions to SMPC, see Escudero (2024) and Evans et al (2018).9 10
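To make the threshold property concrete, the following is a minimal sketch of one well-known threshold scheme, Shamir secret sharing. It is an illustration only and is not taken from any particular SMPC library: the function names `make_shares` and `recover_secret`, the choice of prime field and the example values are all assumptions made here.

```python
import secrets

# Assumption: a Mersenne prime chosen for illustration; all arithmetic is modulo this prime.
PRIME = 2**61 - 1


def make_shares(secret, threshold, n_parties, prime=PRIME):
    """Split `secret` into `n_parties` shares; any `threshold` of them recover it."""
    if not 1 <= threshold <= n_parties < prime:
        raise ValueError("need 1 <= threshold <= n_parties < prime")
    if not 0 <= secret < prime:
        raise ValueError("secret must lie in [0, prime)")
    # Random polynomial of degree threshold - 1 whose constant term is the secret.
    coeffs = [secret] + [secrets.randbelow(prime) for _ in range(threshold - 1)]
    shares = []
    for x in range(1, n_parties + 1):
        y = 0
        for c in reversed(coeffs):  # Horner evaluation of the polynomial at x
            y = (y * x + c) % prime
        shares.append((x, y))
    return shares


def recover_secret(shares, prime=PRIME):
    """Recover the secret by Lagrange interpolation of the polynomial at x = 0."""
    secret = 0
    for i, (xi, yi) in enumerate(shares):
        num, den = 1, 1
        for j, (xj, _) in enumerate(shares):
            if i != j:
                num = (num * -xj) % prime
                den = (den * (xi - xj)) % prime
        # pow(den, -1, prime) is the modular inverse (Python 3.8+).
        secret = (secret + yi * num * pow(den, -1, prime)) % prime
    return secret


# Hypothetical example: a private value split across 5 parties with threshold 3.
shares = make_shares(1234, threshold=3, n_parties=5)
recovered = recover_secret(shares[:3])
```

In this sketch any three of the five shares reconstruct the value, whereas fewer than three shares are consistent with every possible secret, which is the sense in which the security is information-theoretic.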
SMPC has the major advantage that no information is shared beyond the final result of the computation. In particular, there are mathematically rigorous proofs that SMPC protocols are secure and do not leak any further information. This eliminates the need to determine whether information could be disclosed and obviates the need for data-sharing agreements or statistical disclosure control (SDC) between SDEs. On the other hand, SMPC can come with significant additional computational cost. In particular, a large volume of non-disclosive communication can be required between parties. In secret sharing, this involves the parties sending each other random numbers that can be combined to recover the secret. In garbled circuits, the parties send encrypted messages and keys to each other. However, the additional computational demand of SMPC may not present a significant barrier in a wide range of epidemiological studies. Like FA, SMPC does not allow parties to access each other’s data, and so it shares similar problems with data harmonisation and cleaning.
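The exchange of random numbers described above can be illustrated with additive secret sharing used to compute a joint total. The following is a minimal single-process simulation, not a networked or hardened implementation. It assumes that parties follow the protocol honestly and that the true total is non-negative and smaller than the modulus. The function names, the modulus and the three event counts are hypothetical.

```python
import secrets

# Assumption: shares are integers modulo 2**64; the true total must be below this.
MODULUS = 2**64


def additive_shares(value, n_parties, modulus=MODULUS):
    """Split `value` into n random-looking shares that sum to `value` mod `modulus`."""
    shares = [secrets.randbelow(modulus) for _ in range(n_parties - 1)]
    shares.append((value - sum(shares)) % modulus)
    return shares


def secure_sum(private_values, modulus=MODULUS):
    """Simulate a secure sum: each party only ever sees random shares and partial sums."""
    n = len(private_values)
    # distributed[i][j] is the share that party i sends to party j.
    distributed = [additive_shares(v, n, modulus) for v in private_values]
    # Each party j adds up the shares it received, giving a partial sum that
    # looks random on its own.
    partial_sums = [
        sum(distributed[i][j] for i in range(n)) % modulus for j in range(n)
    ]
    # Only the published partial sums are combined, revealing just the total.
    return sum(partial_sums) % modulus


# Hypothetical event counts held privately by three SDEs.
total = secure_sum([120, 45, 310])
```

Each SDE sends and receives only uniformly random numbers, so the communication is non-disclosive, yet the combined partial sums equal the pooled total. The cost is the extra messaging: every party sends a share to every other party.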
Conclusion
FA, virtual SDEs and SMPC all offer ways of securely carrying out joint analyses across SDEs, each with different advantages and disadvantages. Virtual SDEs can in principle allow all data to be shared and pooled. However, data must ultimately travel between SDEs, which poses security risks that may lead to restrictions on what data can be shared. FA only shares summary statistics between SDEs. SMPC shares no information, providing provable security. However, these latter two approaches do not allow parties to see each other’s data in a way that is conducive to data harmonisation and cleaning.
While there are significant efforts underway to implement FA and virtual SDEs in the UK, SMPC has received relatively little attention. One reason for this is that SMPC may not be as well known and uses cryptographic techniques that are not widely understood. However, SMPC offers similar capabilities to FA but with the additional security guarantee that the parties do not learn anything about each other’s data beyond the final result of the calculation. SMPC is also now sufficiently developed that it is practical to implement in health data analyses. For these reasons, we believe SMPC is promising and warrants greater attention as a solution for enabling pooled analysis across private health datasets.



