GETTING STARTED ON THE CGC


QUICK START GUIDES AND TUTORIALS

Begin with our Tutorials, and consult the Knowledge Center for detailed information about how the Cancer Genomics Cloud (CGC) can help you analyze large datasets faster.

Is there a workflow missing that you'd like to see? Contact us to request a new one or to ask for assistance!

ANALYZE PETABYTES OF PUBLIC CANCER DATA IMMEDIATELY AND SECURELY

ALL YOU NEED IS AN INTERNET CONNECTION!

As sequencing technologies have become easier and cheaper to use, the volume of ’omics data produced has exploded, yielding an ever-growing number of genomes, exomes, transcriptomes, and more. Combined with proteomics and imaging data, these datasets require enormous storage capacity to house them and high-performance computing capacity to process and analyze them. Before the CGC launched, researchers who wanted to compute over a large dataset, or analyze their own data alongside it, had to download the dataset to their own hardware or high-performance computing cluster.

The CGC allows researchers to immediately and securely access public data in the cloud, including raw and processed data from whole genome, whole exome, RNA, microRNA, bisulfite sequencing, proteomics and imaging studies. Both Open Access and Controlled Access data are available.

Data Browser example using TCGA data.

The Data Browser on the CGC lets researchers quickly search across more than 100 different properties to find exactly the data they are interested in. Researchers can search for cases and data by their associated clinical metadata, and use a visual case explorer to browse the mutation status and expression levels of a gene across all patients with a particular disease. They can then retrieve all files associated with these patients, filter them further by metadata, and run analyses on them.

The CGC democratizes cancer research. Scientists anywhere with an internet connection can manipulate and compute on large cancer datasets to further their research. There is no need to provision, set up and maintain servers for storage and computation, and no time or bandwidth is spent waiting for data to download.

$300 in CREDITS TO SUPPORT YOUR RESEARCH

An important goal of the CGC is to understand how researchers can use cloud computing resources to analyze their data. Unlike traditional models, where the cost of compute and data storage is paid up front, on the CGC you incur costs only for the resources you actually use, as you use them.

However, we've found that it can be daunting to learn a new system while worrying about analysis costs. For this reason, the NCI has generously provided substantial funds to support your compute and storage on the CGC while you get familiar with the platform. When you create an account on the CGC, you'll automatically receive $300 in credits.

Whether you are a new CGC user or an experienced one, the only costs you will incur on the CGC are for computation and for storage of any files you upload or output files you generate. Compute costs on the CGC are based directly on Amazon Web Services (AWS) on-demand or spot instance pricing, and storage costs are based on AWS S3 data storage pricing. If you would like help estimating the cost of an upcoming project, please feel free to email us at support@sbgenomics.com.
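To get a rough feel for how far the $300 in starter credits might go, the cost model described above can be sketched as compute (instance-hours times an hourly instance rate) plus storage (gigabytes stored times months times a per-GB-month rate). The sketch below assumes that simplified model; the function name `estimate_cost` and all rates are hypothetical placeholders, not actual AWS or CGC prices, and it does not reproduce the CGC's real billing. Spot instances are generally cheaper than on-demand ones, so you can try a lower hourly rate to compare. For real figures, check current AWS pricing or email support@sbgenomics.com.

```python
"""Rough, hypothetical estimate of how long CGC starter credits might last.

All rates used here are ILLUSTRATIVE PLACEHOLDERS, not actual AWS or CGC prices.
"""

STARTER_CREDITS_USD = 300.00  # credit amount granted to new CGC accounts


def estimate_cost(instance_hours: float, hourly_rate: float,
                  storage_gb: float, storage_months: float,
                  gb_month_rate: float) -> float:
    """Return compute cost (hours x rate) plus storage cost (GB x months x rate)."""
    compute = instance_hours * hourly_rate
    storage = storage_gb * storage_months * gb_month_rate
    return compute + storage


if __name__ == "__main__":
    # Hypothetical project: 200 instance-hours at an assumed $0.50/hour,
    # plus 500 GB of outputs kept for 3 months at an assumed $0.02/GB-month.
    cost = estimate_cost(200, 0.50, 500, 3, 0.02)
    remaining = STARTER_CREDITS_USD - cost
    print(f"Estimated cost: ${cost:.2f}")
    print(f"Credits remaining: ${remaining:.2f}")
```

With these placeholder numbers the script prints an estimated cost of $130.00 and $170.00 in credits remaining.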

Have a big project in mind? Submit a collaborative project request to access up to $10,000 in credits.

Once you've used your credits, you can contact us to set up a billing group paid by credit card or purchase order.