The Cancer Research Data Commons (CRDC) is a cloud-based data science infrastructure for sharing, integrating, and analyzing cancer research data. It enables NCI-funded programs to publicly share genomic, proteomic, imaging, and other types of data. The Cancer Research Data Commons website has more information about the CRDC and its initiatives.
A key service within the CRDC is the Data Standards Services (DSS), which facilitates the aggregation and analysis of data from diverse repositories. Samvit has led the DSS effort, working closely with the NCI Semantic Infrastructure teams responsible for the Cancer Data Standards Registry and Repository (caDSR) and Enterprise Vocabulary Services (EVS). To harmonize data elements across repositories, we analyzed each CRDC node’s data dictionaries, models, metadata elements, and supporting terminologies in detail.
Our team developed a comprehensive process to ensure data harmonization is done accurately and methodically:
Samvit drew on its deep knowledge of biomedical research data standards, the NCI caDSR, and NCI EVS, along with its analysis and research skills, to accomplish this harmonization effort. The work involved close collaboration with representatives across the CRDC and NCI to ensure that common data from disparate sources could be effectively combined and analyzed. It also gave Samvit critical insight into the CRDC node dictionaries and models. The Samvit team led and supported the development of more than 70 common data elements (CDEs) shared across the CRDC nodes.