Accessing TCGA data 

While all data in TCGA is stripped of direct identifiers, genomic information is inherently unique to an individual. Two types of data access ‘tiers’ have been put in place to balance the desire to make the data as widely available as possible while ensuring that the rights of study participants are well protected.

Any researcher can access and use data within the Open Access tier (Open Data) as long as they agree to the data use restrictions and requirements outlined in the TCGA publication guidelines. Researchers who require data from the Controlled Access tier (Controlled Data) for their studies must obtain an approved Data Access Request through the Database of Genotypes and Phenotypes (dbGaP) and agree to all TCGA Data Use Certifications as well as the TCGA publication policy.

Examples of the data types available under each access tier are summarized below. More information about access tiers and the types of data available at each tier can be found on the TCGA website.

Open Data 

The Open Access tier includes information that is not unique to an individual, such as:

  • De-identified clinical and demographic data

  • Gene expression data

  • Copy number alterations in regions of the genome

  • Epigenetic data

  • Summaries of data across individuals

Controlled Data 

The Controlled Access tier includes information that is unique to an individual. This covers most raw data files and some processed data, such as:

  • Primary sequencing data (BAM and FASTQ files) from DNA, RNA, miRNA or bisulfite sequencing studies

  • Raw and processed SNP6 array data

  • Raw and processed Exon array data

  • Somatic and germline mutation calls for an individual (VCF and MAF files)
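
The two lists above can be read as a simple lookup from data type to access tier. The following is a minimal sketch in Python, not part of TCGA or the CGC: the labels in `ACCESS_TIER` and the helper names `required_tier` and `can_access` are hypothetical and chosen only for illustration. It also assumes that anyone requesting Open Data has agreed to the TCGA publication guidelines.

```python
# Hypothetical labels for the data types listed above; these are not
# official TCGA, dbGaP, or CGC identifiers.
OPEN = "Open"
CONTROLLED = "Controlled"

ACCESS_TIER = {
    # Open Data: not unique to an individual
    "clinical_and_demographic": OPEN,
    "gene_expression": OPEN,
    "copy_number_alterations": OPEN,
    "epigenetic": OPEN,
    "cross_individual_summaries": OPEN,
    # Controlled Data: unique to an individual
    "primary_sequencing": CONTROLLED,       # BAM and FASTQ files
    "snp6_array": CONTROLLED,               # raw and processed
    "exon_array": CONTROLLED,               # raw and processed
    "individual_mutation_calls": CONTROLLED,  # VCF and MAF files
}


def required_tier(data_type: str) -> str:
    """Return the access tier for one of the illustrative labels above."""
    try:
        return ACCESS_TIER[data_type]
    except KeyError:
        # The source lists only examples, so unknown types are not guessed.
        raise ValueError(f"No tier recorded for data type: {data_type!r}")


def can_access(data_type: str, has_dbgap_approval: bool) -> bool:
    """Open Data is available to any researcher (assuming agreement to the
    publication guidelines); Controlled Data needs dbGaP approval."""
    if required_tier(data_type) == OPEN:
        return True
    return has_dbgap_approval
```

For example, `can_access("gene_expression", False)` evaluates to `True`, while `can_access("primary_sequencing", False)` evaluates to `False`. Unknown labels raise an error instead of defaulting to a tier, because the lists above are examples and not an exhaustive classification.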

 

Checking your access status

In order to access Controlled Data in the CGC, you must be listed in dbGaP as either an approved PI or authorized 'downloader' for TCGA. Being listed as a collaborator on an Access Request does not automatically result in your inclusion in the authorized list. You can check your current status by following the steps below. 

 

 

1. Log in to the dbGaP Authorized Access system.

  • Your browser will redirect to the NIH Secure identity provider ‘iTrust’. 

  • If you are an extramural (non-NIH) researcher, use your eRA Commons ID and password.

    • Note: if your eRA Commons password has expired, you will receive the error message ‘Authentication failed’. If this occurs, go to eRA Commons and set a new password. Make sure you can successfully log in to eRA Commons with your username and password before proceeding.
  • If you are an intramural (NIH) researcher, use your NIH CIT login credentials (the same login credential as your NIH email account).

 

2. Verify that ‘TCGA - The Cancer Genome Atlas (phs000178.v9.p8)’ is listed with a status of ‘GRANTED’.

  • You can find all of your current requests under the 'My Requests' tab.
  • If you are a PI, you will see the title and project number of your access request, along with the expiration date of your data access. You also have the option to revise your project from this page.

  • If you are an approved downloader, the PI and project associated with your approval are displayed.