HARMONIZED DATASETS ON THE CGC
Datasets on the CGC are now categorized as either "harmonized" or "legacy", following the terminology used by the Genomic Data Commons (GDC).
In 2016, the GDC started hosting and distributing previously generated data from The Cancer Genome Atlas (TCGA). Additionally, for all submitted sequence data (FASTQ files and BAM alignment files), the GDC used standard workflows to generate new alignments (BAM files) against the latest human reference genome, GRCh38. From these alignments, the GDC generated derived data, including variant and mutation calls for normal and tumor samples, gene and miRNA expression, and splice junction quantification data. The GDC refers to this process of generating data through standard workflows as data harmonization.