EUDAT Pilot Project
Clinical trials involve human volunteers and are the most effective way to advance medical knowledge. According to a study protocol, participants receive specific interventions, which may be medical products such as drugs or devices. Everything from the conduct of the trial to the details of data management and data analysis is described in the study protocol, which guarantees high data quality. ECRIN (The European Clinical Research Infrastructures Network) is a distributed research infrastructure established to support high-quality clinical research, in particular international European clinical trials. ECRIN is a federation of national networks of clinical trial units in Europe and a legal entity (ECRIN-ERIC), with a core team operating from Paris, which manages the federation and is funded by contributions from European governments. ECRIN supports multinational clinical research projects through information and consultancy and by providing a variety of support services. The long-term preservation of clinical trial datasets and accompanying documents (e.g. data management plan, statistical analysis plan) is a requirement of Good Clinical Practice. In this context, ECRIN wants to improve the transparency of clinical trials by storing individual-level clinical trial data and giving access to it after the end of a trial. Clinical trial data is sensitive and subject to special legal protection. Data therefore needs to be anonymised, and access must be controlled and restricted. To evaluate the ability of EUDAT services to provide long-term data preservation and re-use capabilities for clinical trial datasets, a EUDAT pilot for ECRIN was initiated, using B2SAFE as the central archiving facility with infrastructure support provided by EUDAT.
The aim of our pilot is to evaluate EUDAT services for the archiving and sharing of data from clinical trials. In contrast to other pilots, which use B2SAFE as a simple backup and replication solution for an existing community repository (e.g. DSpace, Fedora), we use B2SAFE for long-term storage by importing data into it directly. Because access to the data is granted only to the community data manager, the requirement for restricted access is met. Clinical study data is stored as pseudonymised, coded datasets together with additional documents that contain the metadata and explanations necessary for analysing the data. The data has to be stored and processed in compliance with data protection rules (restricted access) to prevent any re-identification of study subjects.
EUDAT B2SAFE pilot for the archiving of ECRIN clinical trials study data
B2SAFE is a service that allows repositories to implement data management policies on their data across multiple administrative domains. It focuses on the long-term storage of data and consists of an abstraction layer over large-scale, heterogeneous data storage systems, together with mechanisms that protect against data loss during archiving. In general, B2SAFE is used by repositories that lack the capacity or funding to provide reliable long-term storage and access services themselves, or that lack adequate computing capacity, especially for data-intensive services. Other users are data producers who rely on trusted centres to take care of their data.
Besides storing data safely, B2SAFE supports the execution of data policy rules, the use of persistent identifiers (PIDs), and respect for data owners’ rights to define access rights and to decide how and when data can be made public. B2SAFE is managed centrally by a data manager who applies locally implemented data management policies. The B2SAFE module consists of a set of iRODS rules, which can be combined into complete workflows for data replication and PID management. The EUDAT B2SAFE service replicates datasets across different data centres in a safe and efficient way while maintaining all the information required to find and query the replica locations.
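The idea of replicating a dataset while keeping track of where its replicas live can be illustrated with a small sketch. This is not the B2SAFE implementation, which is built from iRODS rules; it is a minimal in-memory model under stated assumptions. The class `ReplicaRegistry`, its methods, the PID string and the site names are all hypothetical, and a SHA-256 checksum stands in for whatever integrity check the real service applies.

```python
import hashlib
from dataclasses import dataclass, field


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 checksum of a byte string as a hex digest."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class ReplicaRegistry:
    """Hypothetical in-memory stand-in for a PID catalogue (not a B2SAFE API)."""
    # Maps a PID to {"checksum": str, "locations": list of site names}.
    records: dict = field(default_factory=dict)

    def register(self, pid: str, data: bytes, location: str) -> None:
        """Record the first copy of a dataset together with its checksum."""
        self.records[pid] = {"checksum": sha256_hex(data), "locations": [location]}

    def add_replica(self, pid: str, replica_data: bytes, location: str) -> bool:
        """Accept a replica only if its checksum matches the registered one."""
        record = self.records[pid]
        if sha256_hex(replica_data) != record["checksum"]:
            return False
        record["locations"].append(location)
        return True

    def locations(self, pid: str) -> list:
        """Return all sites known to hold a verified copy."""
        return list(self.records[pid]["locations"])


# Illustrative values only: the PID and site names are made up.
registry = ReplicaRegistry()
payload = b"study archive bytes"
registry.register("pid:example/0001", payload, "site-a")
good = registry.add_replica("pid:example/0001", payload, "site-b")
bad = registry.add_replica("pid:example/0001", payload + b"x", "site-c")
```

In this sketch a corrupted copy is rejected and never appears among the replica locations, which is the property that makes the registry usable for finding intact copies later.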
EUDAT B2SHARE model for the sharing of ECRIN clinical trials study data
Employing B2SAFE reveals a problem: the ingest of clinical trial data, as well as the synchronisation between B2SAFE servers, is unprotected, that is, unencrypted. Clinical trial data is health data, which is sensitive and subject to special protection. All persons involved in clinical trials, and especially the activities of physicians, are highly regulated, which increases the complexity of the associated data processes from a data protection point of view. Therefore, an encryption step would be necessary to protect clinical trial data in flight and at rest. At the same time, metadata should be used to make the search for data more effective. This leads us to the second part of the pilot, the sharing of stored clinical trial data.
We have developed a model of the pilot that considers both the security of data transfer and the provision of access to data. In this model, the study datasets are encrypted locally and moved to the HHU Community Server, from where the data is transferred to B2SAFE. In this way, B2SAFE stores encrypted data, and all in-flight processes, such as synchronisation and replication, operate on encrypted data. To make the study known to the research community, metadata about the archived study should be published. Researchers intending to reuse the stored data may contact the data owner to obtain approval for reuse and access to the stored study data. Because in our model the metadata of the stored files is published in B2SHARE, the files become findable. The metadata is collected in B2SHARE based on an ECRIN community metadata schema and harvested by B2FIND. In addition, the PID of the file stored in B2SAFE is published in B2SHARE. In this way, the existence of a file is indicated and its properties, including the access rights, are published, although access to the file is not open but depends on the restrictions set by the data owner. Anonymised datasets can also be provided to researchers through publication in B2SHARE.
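The separation between published metadata and restricted file access can be sketched as a small data structure. This is a minimal illustration, not the ECRIN community metadata schema or a B2SHARE API: the class `StudyRecord`, its field names, the PID string and the user name are hypothetical, and the data owner's approval is modelled simply as a set of approved users.

```python
from dataclasses import dataclass, field


@dataclass
class StudyRecord:
    """Hypothetical metadata record for an archived, encrypted study file."""
    title: str
    pid: str      # PID of the file held in the archive
    access: str   # "open" or "restricted"
    approved_users: set = field(default_factory=set)

    def public_view(self) -> dict:
        """Metadata that is published so that the file becomes findable."""
        return {"title": self.title, "pid": self.pid, "access": self.access}

    def approve(self, user: str) -> None:
        """Record the data owner's approval of a reuse request."""
        self.approved_users.add(user)

    def may_access(self, user: str) -> bool:
        """Open files are available to all; restricted ones need approval."""
        return self.access == "open" or user in self.approved_users


# Illustrative values only.
record = StudyRecord("Example trial", "pid:example/0001", "restricted")
before = record.may_access("researcher-1")
record.approve("researcher-1")
after = record.may_access("researcher-1")
```

The published view exposes the title, PID and access status but not the list of approved users, so a researcher can discover that the file exists and learn that a request to the data owner is required.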
B2SHARE allows the sharing of open access data, but it provides no functionality for "restricted access data". Such additional functionality would give the pilot the functions it needs to handle sensitive data. The EUDAT Group for sensitive data addresses the topic of adapting EUDAT services so that they can deal with sensitive data.
The whole picture
B2SHARE is a repository for shareable digital objects. In our final model, we use B2SHARE to distribute selected anonymised datasets and publications that are openly accessible or linked to these documents. The pseudonymised datasets and the study documents are archived under restricted access conditions in B2SAFE or in restricted access folders in B2SHARE, whereas a subset of the data is anonymised and moved into B2SHARE for publication and sharing. In addition, B2FIND is integrated, a discovery service based on metadata harvested from research data collections at EUDAT data centres and repositories. This service therefore makes it possible to discover data stored through the B2SAFE and B2SHARE services. B2FIND allows users to find collections of scientific data, get quick overviews of available data, and browse through collections using standardised facets. For the B2SHARE service no installation at our location was necessary; we are using the general EUDAT service. To use data archived in B2SAFE for analysis, the encrypted data may be transferred from B2SAFE or B2SHARE to the community server, where it is decrypted and analysed in a protected environment. The results of the analysis may be published through B2SHARE. This workflow still has to be implemented at the community location and tested by us.