Click the banner to go to the official HTAN Jamboree page for registration links and more


To prepare you for this data jamboree, we’ve assembled the following resources:

Data Access

The HTAN Portal leverages several repositories to provide access to data:

Instructions for accessing data from these repositories are available directly on the Explore Page: select the files you need and click the download button. More information about accessing data can be found here.

  • Instructions for accessing controlled data via dbGaP

    Access to controlled-access (i.e., protected) data is granted on a per-project basis via the database of Genotypes and Phenotypes (dbGaP). This primarily includes raw sequencing data such as BAM or FASTQ files, as well as VCF files and protected MAF files. To gain access to these files, a user must apply via dbGaP for each individual project. Each project has a Data Access Committee (DAC) that approves or denies data access requests. Before applying through dbGaP, users also need to obtain an eRA Commons ID for authentication purposes.

  • Cloud Cost Estimation

    Learning to estimate and manage your cloud costs will prepare you to effectively budget for your research projects. These estimates can be included in grant proposals, or be used to request cloud credits offered by the National Institutes of Health.
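
As a rough illustration of the kind of estimate described above, the following Python sketch adds up storage, compute, and data-egress charges. The class name, function name, and all unit prices are hypothetical placeholders chosen for this example; they are not CGC or cloud-provider rates, so substitute the current prices for your own platform before using the numbers in a budget or credit request.

```python
from dataclasses import dataclass


@dataclass
class CostAssumptions:
    """Hypothetical unit prices in USD; replace with your provider's rates."""
    storage_usd_per_gb_month: float
    compute_usd_per_hour: float
    egress_usd_per_gb: float = 0.0


def estimate_cloud_cost(rates, storage_gb, months, compute_hours, egress_gb=0.0):
    """Return a per-category cost breakdown plus the total, in USD."""
    # Storage is billed per GB for every month the data is kept.
    storage = storage_gb * months * rates.storage_usd_per_gb_month
    # Compute is billed per instance-hour actually used.
    compute = compute_hours * rates.compute_usd_per_hour
    # Egress covers data downloaded out of the cloud.
    egress = egress_gb * rates.egress_usd_per_gb
    return {
        "storage": round(storage, 2),
        "compute": round(compute, 2),
        "egress": round(egress, 2),
        "total": round(storage + compute + egress, 2),
    }


# Example with made-up rates: 500 GB kept for 6 months,
# 200 instance-hours of analysis, and 50 GB downloaded.
example_rates = CostAssumptions(
    storage_usd_per_gb_month=0.02,
    compute_usd_per_hour=0.50,
    egress_usd_per_gb=0.10,
)
breakdown = estimate_cloud_cost(
    example_rates, storage_gb=500, months=6, compute_hours=200, egress_gb=50
)
for category, usd in breakdown.items():
    print(f"{category}: ${usd:.2f}")
```

Keeping the rates in a separate object makes it easy to rerun the same project plan against different price assumptions and compare the totals.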


On the CGC

  • Public Project: HTAN

    HTAN is a National Cancer Institute (NCI)-funded Cancer Moonshot℠ initiative to construct 3-dimensional atlases of the dynamic cellular, morphological, and molecular features of human cancers as they evolve from precancerous lesions to advanced disease. (Cell April 2020)


  • Public Project: MCMICRO

  • Tutorial: MCMICRO on the CGC

    MCMICRO is an end-to-end processing pipeline that transforms multi-channel whole-slide images into single-cell data. It is open-source, community-supported software that uses Docker and workflow software to create pipelines for analyzing microscopy-based images of tissues.


  • CGC Onboarding videos

    This series of videos teaches the basics of using the Seven Bridges Cancer Genomics Cloud (CGC), powered by Velsera. The CGC is part of NCI’s Cancer Research Data Commons, a cloud-based data science infrastructure that connects data sets with analytics tools so that researchers can share, integrate, analyze, and visualize cancer research data to drive scientific discovery.