Integrating, Storing and Analyzing Data in Healthcare

Executive Summary

It is crucial to develop informatics tools to integrate, analyze and extract value from databases, with specific attention to the interoperability of the respective databases. This should include development work to ensure the quality, completeness and validity of data. Data harmonization efforts will be needed so that datasets with information important for precision medicine approaches can be utilized and aligned. As a first step, this will require the development and definition of minimal datasets for clinical research databases.
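As an illustration of what a minimal dataset definition might look like in practice, the sketch below declares a small set of required fields and validates records on ingestion. The field names, codes and plausibility checks are hypothetical assumptions, not a proposed standard:

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional

@dataclass
class MinimalClinicalRecord:
    """Hypothetical minimal dataset for a clinical research database."""
    patient_id: str        # pseudonymized identifier, never a direct identifier
    sex: str               # "F", "M", or "U" (unknown)
    birth_year: int
    diagnosis_code: str    # e.g. an ICD-10 code
    enrollment_date: date
    biomarker_status: Optional[str] = None  # "positive", "negative", or None

def validate(record: MinimalClinicalRecord) -> list[str]:
    """Return a list of validation problems (an empty list means valid)."""
    problems = []
    if not record.patient_id:
        problems.append("missing patient_id")
    if record.sex not in {"F", "M", "U"}:
        problems.append(f"unexpected sex code: {record.sex!r}")
    if not 1900 <= record.birth_year <= date.today().year:
        problems.append(f"implausible birth_year: {record.birth_year}")
    if record.biomarker_status not in {None, "positive", "negative"}:
        problems.append(f"unexpected biomarker_status: {record.biomarker_status!r}")
    return problems

record = MinimalClinicalRecord("P0001", "F", 1972, "C50.9", date(2023, 5, 1), "positive")
assert validate(record) == []
```

Agreeing on such a schema, and rejecting or flagging records that fail validation, is one concrete way to address the quality and completeness concerns raised above.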

The Global Partnership for Sustainable Development Data provides a useful definition of interoperability:

“Interoperability is the ability to access and process data from multiple sources without losing meaning and then integrate that data for mapping, visualization, and other forms of representation and analysis. Interoperability enables people to find, explore, and understand the structure and content of data sets. In essence, it is the ability to ‘join-up’ data from different sources to help create more holistic and contextual information for simpler, and sometimes automated analysis, better decision-making, and accountability purposes.”

It will be critical to provide for the conversion of free-text documents, such as physician notes and radiology reports, into structured data. Accessing sources of genomic data and establishing standard techniques for integrating clinical and genomic data will also be important. Finally, enabling the collection, linkage and integration of wellness data will be valuable.
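As a minimal sketch of such a conversion, the example below pulls labeled sections and a measurement out of a hypothetical radiology report using simple pattern matching. Real reports vary widely, and production systems typically rely on clinical NLP pipelines rather than regular expressions:

```python
import re

# Hypothetical radiology report; the format is an illustrative assumption.
report = """
EXAM: CT CHEST WITHOUT CONTRAST
FINDINGS: 8 mm nodule in the right upper lobe.
IMPRESSION: Pulmonary nodule, recommend follow-up in 6 months.
"""

def extract_fields(text: str) -> dict:
    """Extract section headers and a nodule size from free text.

    A rule-based sketch only, to show the free-text-to-structured-data step.
    """
    # Match lines of the form "HEADER: content"
    sections = dict(re.findall(r"^([A-Z]+):\s*(.+)$", text, flags=re.M))
    size = re.search(r"(\d+(?:\.\d+)?)\s*mm", text)
    return {
        "exam": sections.get("EXAM"),
        "impression": sections.get("IMPRESSION"),
        "nodule_size_mm": float(size.group(1)) if size else None,
    }

structured = extract_fields(report)
```

Once in this structured form, the extracted fields can be validated, coded and joined against the other clinical and genomic data sources discussed here.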

Storing Data

Efforts will be required to convince all involved parties to share existing datasets. This will involve establishing controls to avoid the risks and disadvantages associated with setting up such open collaborations. For data sharing, common rules need to be developed and established, taking into consideration: the maintenance of data; ensuring sample sizes that are statistically meaningful; and balancing open access with the need to protect data privacy, innovation and intellectual property. Using a federated data model, which keeps the research databases within institutional firewalls while restricting access to aggregated, de-identified results, will be critical to maintaining support for open collaboration.
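The federated pattern can be sketched as follows: each site computes an aggregate locally behind its own firewall, suppresses counts small enough to risk re-identification, and shares only the surviving aggregates. The site data and the suppression threshold are illustrative assumptions:

```python
MIN_CELL_SIZE = 5  # small-cell suppression threshold (an assumed policy value)

def local_count(records: list[dict], condition: str):
    """Runs inside one institution; returns an aggregate, or None if suppressed."""
    n = sum(1 for r in records if r["diagnosis"] == condition)
    return n if n >= MIN_CELL_SIZE else None

def federated_count(sites: list[list[dict]], condition: str) -> int:
    """Combine de-identified aggregates; suppressed cells contribute nothing."""
    return sum(c for site in sites if (c := local_count(site, condition)) is not None)

# Synthetic example: two institutions, row-level data never leaves either one.
site_a = [{"diagnosis": "C50"}] * 6 + [{"diagnosis": "E11"}] * 3
site_b = [{"diagnosis": "C50"}] * 2  # too few matching cases; withheld locally

total = federated_count([site_a, site_b], "C50")
```

Only `local_count` ever touches record-level data; the coordinating query sees aggregates at most, which is the property that keeps institutional data behind the firewall.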

Analyzing Data

Once data is established in well-organized federated databases, researchers can apply analytical tools and techniques. These will include longitudinal cohort studies as well as machine learning applied to image data. Cohort studies with long-term follow-up will allow subgroup analyses of outcomes under different therapies for biomarker-positive and biomarker-negative patients, leading to a stratified risk-benefit analysis of therapy in each group. Finally, such studies will build shared knowledge on the clinical utility (efficacy, safety) of precision medicine and will support regulatory decision-making processes.
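The subgroup analysis described above can be sketched in a few lines: compute a response rate per (biomarker status, therapy) stratum and compare the strata. The cohort below is synthetic and chosen only to make the stratification visible:

```python
# Synthetic cohort: each record carries biomarker status, therapy arm, outcome.
cohort = (
    [{"biomarker": "positive", "therapy": "targeted", "responded": True}] * 3
    + [{"biomarker": "positive", "therapy": "targeted", "responded": False}]
    + [{"biomarker": "negative", "therapy": "targeted", "responded": True}]
    + [{"biomarker": "negative", "therapy": "targeted", "responded": False}] * 3
)

def response_rate(records, biomarker, therapy):
    """Response rate within one stratum, or None if the stratum is empty."""
    subgroup = [
        r for r in records
        if r["biomarker"] == biomarker and r["therapy"] == therapy
    ]
    if not subgroup:
        return None
    return sum(r["responded"] for r in subgroup) / len(subgroup)

pos_rate = response_rate(cohort, "positive", "targeted")
neg_rate = response_rate(cohort, "negative", "targeted")
```

In a real study the same stratification would feed confidence intervals and safety endpoints, giving the stratified risk-benefit picture per biomarker group rather than a single pooled estimate.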

Using libraries of image and text models, preprocessing techniques, database infrastructure, and large-scale computing resources, researchers will be well positioned to iterate quickly on past results as new data becomes available. This will enable predictive analytics for health outcomes as well as correlation of genomic, clinical and imaging data.
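At its simplest, correlating two data modalities means quantifying how strongly a measurement from one tracks a measurement from the other. The sketch below computes a Pearson correlation from first principles; the paired values stand in for a hypothetical imaging feature and clinical score, and in practice both would come from the federated databases described earlier:

```python
import math

# Illustrative paired measurements (assumed values, one pair per patient).
feature = [1.0, 2.0, 3.0, 4.0, 5.0]   # e.g. an imaging-derived feature
score   = [2.1, 3.9, 6.2, 8.0, 9.8]   # e.g. a clinical outcome score

def pearson(xs, ys):
    """Pearson correlation coefficient between two equal-length sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

r = pearson(feature, score)  # close to 1.0 for this strongly linear example
```

Screening many such feature-outcome pairs is a starting point for the predictive models mentioned above, with stronger candidates passed on to multivariate and machine-learning analyses.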

Additional Information on Data Privacy

There are numerous laws in Canada that relate to privacy. There are also various organizations and agencies responsible for overseeing compliance with these laws both federally and provincially.