Access to multi-’omics data on the CGC

The CGC is part of the NCI Cancer Research Data Commons (CRDC) data ecosystem. The CRDC is a cloud-based data science infrastructure that provides secure access to a large, comprehensive, and expanding collection of cancer research data. Users can explore and use analytical and visualization tools for data analysis in the cloud. The CGC is one of three cloud resources that enable this capability.

 

Image from https://datacommons.cancer.gov/

 

CRDC Datasets Available

The CGC provides access to more than 3 petabytes of publicly available data as part of the CRDC ecosystem, including both open access and controlled access data. To provide this access, the CGC connects to the Genomic Data Commons (GDC), Proteomic Data Commons (PDC), Integrated Canine Data Commons (ICDC), and Cancer Data Service (CDS). Together, these sources span thousands of files and dozens of disease types across the two access tiers described below:

Open Data 

The Open Access tier includes information that is not unique to an individual, such as:

  • De-identified clinical and demographic data

  • Gene expression data

  • Copy number alterations in regions of the genome

  • Epigenetic data

  • Summaries of data across individuals

Controlled Data 

The Controlled Access tier includes information that is unique to an individual. This covers most raw data files and some processed data, such as:

  • Primary sequencing data (BAM and FASTQ files) from DNA, RNA, miRNA or bisulfite sequencing studies

  • Raw and processed SNP6 array data

  • Raw and processed Exon array data

  • Somatic and germline mutation calls for an individual (VCF and MAF files)
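The two tiers above can be captured as a small lookup table, for example to flag which inputs of an analysis need dbGaP approval before you start. The following is a minimal sketch only: the category labels, `AccessTier`, `DATA_TIERS`, and `requires_dbgap_approval` are hypothetical names chosen for illustration and are not part of any CGC or CRDC API.

```python
from enum import Enum


class AccessTier(Enum):
    OPEN = "open"
    CONTROLLED = "controlled"


# Hypothetical labels that mirror the bullet lists above; the CGC itself
# does not necessarily use these strings.
DATA_TIERS = {
    "de-identified clinical and demographic data": AccessTier.OPEN,
    "gene expression data": AccessTier.OPEN,
    "copy number alterations": AccessTier.OPEN,
    "epigenetic data": AccessTier.OPEN,
    "summaries across individuals": AccessTier.OPEN,
    "primary sequencing data": AccessTier.CONTROLLED,
    "snp6 array data": AccessTier.CONTROLLED,
    "exon array data": AccessTier.CONTROLLED,
    "individual mutation calls": AccessTier.CONTROLLED,
}


def requires_dbgap_approval(category: str) -> bool:
    """Return True if the category belongs to the Controlled Access tier.

    Raises KeyError for a category that is not in the table, so an
    unknown data type is never silently treated as open.
    """
    tier = DATA_TIERS[category.strip().lower()]
    return tier is AccessTier.CONTROLLED
```

Raising on unknown categories is a deliberate choice in this sketch: when the tier of a data type is unclear, it is safer to stop and check than to assume open access.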

Checking your access status

To access Controlled Data in the CGC, you must be listed in dbGaP as either an approved PI or an authorized 'downloader' for controlled access data. Being listed as a collaborator on an Access Request does not automatically add you to the authorized list. You can check your current status by following the steps below.

 

 

1. Log in to the dbGaP Authorized Access system.

  • Your browser will redirect to the NIH Secure identity provider ‘iTrust’. 

  • If you are an extramural (non-NIH) researcher, use your eRA Commons ID and Password.  

    • Note: if your eRA Commons password has expired, you will receive the error message ‘Authentication failed’. If this occurs, go to eRA Commons and set a new password. Make sure you can successfully log in to eRA Commons with your username and password before proceeding.

  • If you are an intramural (NIH) researcher, use your NIH CIT login credentials (the same login credential as your NIH email account).


 

2. Verify that the study you need, for example ‘TCGA - The Cancer Genome Atlas’ (phs000178.v9.p8), is listed with a status of ‘GRANTED’.

  • You can find all of your current requests under the 'my requests' tab.

  • If you are a PI, you will see the title and project number of your access request, along with the expiration date of your data access. You can also revise your project from this page.

  • If you are an approved downloader, the PI and project your approval is associated with will be displayed.
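The accession shown in step 2, phs000178.v9.p8, follows the dbGaP pattern of a study identifier (phs plus six digits), a study version (v), and a participant set version (p). If you keep a list of the studies you are approved for, a short script can split an accession into these parts and compare entries by study identifier rather than by exact version. This is a minimal sketch: `DbGapAccession`, `parse_accession`, and `same_study` are hypothetical helper names, not part of dbGaP or the CGC, and the second accession in the usage lines is a made-up version used only for comparison.

```python
import re
from typing import NamedTuple


class DbGapAccession(NamedTuple):
    study_id: str          # e.g. "phs000178"
    version: int           # the number after ".v"
    participant_set: int   # the number after ".p"


# Matches fully versioned study accessions such as "phs000178.v9.p8".
_ACCESSION_RE = re.compile(r"^(phs\d{6})\.v(\d+)\.p(\d+)$")


def parse_accession(accession: str) -> DbGapAccession:
    """Split a versioned dbGaP study accession into its three parts."""
    match = _ACCESSION_RE.match(accession.strip())
    if match is None:
        raise ValueError(
            f"not a versioned dbGaP study accession: {accession!r}"
        )
    study_id, version, participant_set = match.groups()
    return DbGapAccession(study_id, int(version), int(participant_set))


def same_study(a: str, b: str) -> bool:
    """Compare two accessions by study identifier, ignoring versions."""
    return parse_accession(a).study_id == parse_accession(b).study_id


tcga = parse_accession("phs000178.v9.p8")
print(tcga.study_id, tcga.version, tcga.participant_set)
# Hypothetical second version, used only to illustrate the comparison.
print(same_study("phs000178.v9.p8", "phs000178.v10.p8"))
```

Comparing by study identifier is an assumption about how you want to match entries; whether an approval carries over between versions of a study is determined by dbGaP, not by this script.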