As you say, to progress towards the goal of true precision medicine, the cancer research community will need to access, integrate, and analyze many different types of data, and to be successful those data must be portable and easily shared among providers, researchers, patients, and research participants. Facilitating this kind of data integration and access requires significant planning and investment in the underlying technology and informatics. Traditionally, these different data types have been stored in separate databases, with little consideration for how they might be shared.
This creates several challenges, which NCI and the broad cancer community are currently working to overcome.
One of the big challenges is the quality and consistency of data, which requires harmonization and the application of standard metadata. Without this, it becomes much more difficult to share data, and even within repositories containing only one major data type, the value and usability of the data are diminished significantly. Active curation of the data as they are submitted to a repository or data commons is a necessary step toward making the data usable and reliable.
A related issue is access to data across different domains – for example, genomic data and associated patient clinical data – which need to be queried and analyzed together to be truly useful. Efforts to create standard patient identifiers that allow for search and analysis while protecting patient privacy are critical, as are standardized metadata and APIs that facilitate searching across repositories. Additionally, patient consent needs to be much broader: most patients want to share their information to advance research and help other patients, yet consents still tend to be quite limited, restricted to the study in which the patient is participating. Broader consent and support for data curation will also make it easier for researchers to contribute their data to open repositories, removing barriers to data sharing that currently exist.
Finally, the size of the data and the compute power required for analysis present additional challenges. Storing genomic and pathology imaging data, for example, requires extremely large databases, and the data are difficult and time-consuming to download. Many smaller institutions simply don’t have the servers to store such data or run computations on them. Investment in innovative infrastructures that support researcher and clinician access to big data is absolutely critical to progress towards the vision of precision medicine.