Practical exercises

The Cancer Genomics Cloud online course

Comprehensive bioinformatic analysis of cancer genomes


BASICS ON NEXT GENERATION SEQUENCING (NGS) APPLICATIONS AND THE CANCER GENOMICS CLOUD (CGC)

Estimated Time: 1 week
Learning Goals: To familiarize the user with the CGC platform and TCGA data

Specific goals

- Register as a CGC user
- Create a project in the CGC platform
- Have an overview of TCGA data using the Data Overview
- Explore TCGA dataset using the Case Explorer
- Browse available data using the Data Browser
- Differentiate between open and controlled data
- Import data to the project

Exercise activities

- Create an account on the CGC Platform
- Create a project
- Data Overview:

  • Select PRAD (Prostate Adenocarcinoma)
    • Number of cases
    • Distribution of cases by gender, race, age and sample type
    • File distribution: how many RNA-Seq, WXS and WGS files?

- Exploration of TCGA dataset using the Case Explorer:

  • Identify cases with TP53 mutations

- Browse and select using the Data Browser:

  • Create a new query:
    • Case: Prostate adenocarcinoma and White
    • File: Open and RNA-Seq
    • Sample: Primary Tumor
  • Save query
  • Import files to CGC course project & tag files
  • Edit query:
    • Case: Prostate adenocarcinoma and White
    • File: Open and RNA-Seq
    • Sample: Solid Tissue Normal
  • Import files to CGC course project & tag files
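
The two queries above differ only in the sample type. As a rough illustration of that filtering logic (not the CGC Data Browser itself), the sketch below applies the same criteria to a small list of file records. The `FileRecord` class, its field names and the example file names are hypothetical and do not reflect the actual CGC metadata schema.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class FileRecord:
    """Hypothetical, simplified metadata for one file (not the CGC schema)."""
    name: str
    disease_type: str
    race: str
    access_level: str
    experimental_strategy: str
    sample_type: str


def matches(record, criteria):
    """Return True when every criterion equals the record's field (case-insensitive)."""
    return all(
        getattr(record, field).lower() == value.lower()
        for field, value in criteria.items()
    )


def run_query(records, criteria):
    """Keep only the records that satisfy all criteria."""
    return [record for record in records if matches(record, criteria)]


# First query: open RNA-Seq files from primary tumors of white PRAD cases.
TUMOR_QUERY = {
    "disease_type": "Prostate Adenocarcinoma",
    "race": "White",
    "access_level": "Open",
    "experimental_strategy": "RNA-Seq",
    "sample_type": "Primary Tumor",
}

# Edited query: same criteria, only the sample type changes.
NORMAL_QUERY = {**TUMOR_QUERY, "sample_type": "Solid Tissue Normal"}

# Made-up records used only to exercise the filter.
records = [
    FileRecord("a.tar.gz", "Prostate Adenocarcinoma", "White", "Open", "RNA-Seq", "Primary Tumor"),
    FileRecord("b.tar.gz", "Prostate Adenocarcinoma", "White", "Open", "RNA-Seq", "Solid Tissue Normal"),
    FileRecord("c.bam", "Prostate Adenocarcinoma", "White", "Controlled", "RNA-Seq", "Primary Tumor"),
    FileRecord("d.tar.gz", "Breast Invasive Carcinoma", "White", "Open", "RNA-Seq", "Primary Tumor"),
]

tumor_files = run_query(records, TUMOR_QUERY)
normal_files = run_query(records, NORMAL_QUERY)
print("Tumor files: ", [record.name for record in tumor_files])
print("Normal files:", [record.name for record in normal_files])
```

Keeping the tumor and normal selections as two separate result sets mirrors the tagging step in the exercise, since the two groups are compared later in the course.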

BASICS ON NEXT GENERATION SEQUENCING (NGS) APPLICATIONS AND THE CANCER GENOMICS CLOUD (CGC)

Estimated Time: 2 weeks
Learning Goals: To teach the user how to use available tools and workflows to analyze RNA-Seq data

Specific goals

- Query and import RNA-Seq data from CCLE and TCGA datasets
- Copy tools and workflows into existing projects
- Identify differentially expressed genes using public workflows

Exercise activities

- Query CCLE data using the Data Browser
- Import files into the project
- Copy tools and workflows to the project:

  • RNA-Seq differential expression workflow (differential gene expression analysis between two groups or conditions)

- Retrieve the reference files from the public files and import them to the project
- Run the three tools
- Export the results
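
To make the idea of comparing expression between two groups concrete, here is a minimal sketch that computes a log2 fold change per gene and keeps the genes above a threshold. It is not the method used by the public workflow: dedicated workflows normalize the counts and apply statistical testing, which this sketch omits. The gene names, expression values, pseudocount and threshold are all hypothetical.

```python
import math
from statistics import mean


def log2_fold_change(group_a, group_b, pseudocount=1.0):
    """log2 ratio of mean expression in group_b over group_a.

    The pseudocount (an assumption of this sketch) avoids division by zero.
    """
    return math.log2((mean(group_b) + pseudocount) / (mean(group_a) + pseudocount))


def rank_genes(expression, threshold=1.0):
    """Return (gene, log2FC) pairs with |log2FC| >= threshold, largest change first."""
    results = []
    for gene, (normal, tumor) in expression.items():
        lfc = log2_fold_change(normal, tumor)
        if abs(lfc) >= threshold:
            results.append((gene, round(lfc, 2)))
    return sorted(results, key=lambda item: abs(item[1]), reverse=True)


# Hypothetical normalized expression values: gene -> (normal samples, tumor samples)
expression = {
    "GENE_A": ([10.0, 12.0, 11.0], [45.0, 50.0, 43.0]),
    "GENE_B": ([30.0, 28.0, 32.0], [29.0, 31.0, 30.0]),
    "GENE_C": ([63.0, 60.0, 66.0], [15.0, 14.0, 16.0]),
}

ranked = rank_genes(expression)
for gene, lfc in ranked:
    direction = "up" if lfc > 0 else "down"
    print(f"{gene}: log2FC={lfc} ({direction} in tumor)")
```

A fold change alone says nothing about significance, so in practice the exported workflow results (which include test statistics) are what should be interpreted.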


Genomic analysis using whole-genome and whole-exome datasets

Estimated Time: 3 weeks
Learning Goals: To teach the user how to use available tools and workflows to analyze WES and WGS DNA sequencing data


Specific goals

- Query and import WGS and WES data from CCLE and TCGA datasets
- Copy tools and workflows into existing projects
- Conduct variant calling (SNVs and CNVs) following GATK best practices using public workflows in the CGC Platform

Exercise activities

- Query CCLE data using the Data Browser:

  • Analyte type (DNA), Experiment strategy (WGS, WXS), Disease type (e.g. Breast Invasive Carcinoma, n=15 cell lines)
  • BAMs are available

- Import files into project
- Copy tools and workflows to the project:

  • SNV calling
    • VarScan2
  • CNV calling
    • CNVnator

- Retrieve the reference files from the public files and import them to the project
- Run the workflows and tools
- Export the results:

  • Output files:
    • VarScan2: VCF containing INDELs and SNPs
    • CNVnator outputs
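
Once a VCF has been exported, a quick sanity check is to count how many SNPs and INDELs it contains. The sketch below does this from the REF and ALT columns of the standard VCF layout. The example records are made up, and the classification rule is a simplification: anything that is not a single-base substitution is counted as an INDEL.

```python
def classify_variant(ref, alt):
    """Label one REF/ALT pair as 'SNP' or 'INDEL' by allele length (simplified rule)."""
    if len(ref) == 1 and len(alt) == 1:
        return "SNP"
    return "INDEL"


def count_variants(vcf_lines):
    """Count SNP and INDEL alleles in an iterable of VCF text lines."""
    counts = {"SNP": 0, "INDEL": 0}
    for line in vcf_lines:
        line = line.rstrip("\n")
        if not line or line.startswith("#"):
            continue  # skip blank lines, meta-information and the header row
        fields = line.split("\t")
        ref, alts = fields[3], fields[4]  # VCF columns 4 (REF) and 5 (ALT)
        for alt in alts.split(","):  # multi-allelic sites list ALT alleles comma-separated
            counts[classify_variant(ref, alt)] += 1
    return counts


# Made-up records for illustration; a real file would be read with open(path).
example_vcf = [
    "##fileformat=VCFv4.1",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t1000\t.\tA\tG\t.\tPASS\t.",
    "chr1\t2000\t.\tAT\tA\t.\tPASS\t.",
    "chr2\t3000\t.\tC\tT,CA\t.\tPASS\t.",
]

counts = count_variants(example_vcf)
print(counts)
```

Because the function accepts any iterable of lines, the same call works on an open file handle for an uncompressed VCF.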

Advanced analytical topics/multiomics

Estimated Time: 3 weeks
Learning Goals: To teach the user how to conduct advanced analytics using the CGC Platform

Specific goals

- Copy a nonpublic tool into their own projects
- Copy a set of nonpublic samples into their own projects
- Conduct GWAS analysis using the CGC Platform

Exercise activities

Users will be invited as members, with permission only to copy the tool and the files to their own project.

- Copy the tool Plink into your own project
- Copy the samples to your own project
- Run the Plink tool
- Export the results:

  • Output files
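
For orientation when reading association results, the sketch below shows the kind of per-variant case/control test a GWAS is built on: a 1-degree-of-freedom chi-square test on allele counts, with an odds ratio. It is a conceptual illustration under stated assumptions, not Plink's implementation or output format, and the allele counts are hypothetical.

```python
import math


def allelic_chi_square(case_a1, case_a2, control_a1, control_a2):
    """Chi-square statistic (1 df) for a 2x2 table of allele counts."""
    table = [[case_a1, case_a2], [control_a1, control_a2]]
    row_totals = [sum(row) for row in table]
    col_totals = [case_a1 + control_a1, case_a2 + control_a2]
    total = sum(row_totals)
    chi2 = 0.0
    for i in range(2):
        for j in range(2):
            expected = row_totals[i] * col_totals[j] / total
            chi2 += (table[i][j] - expected) ** 2 / expected
    return chi2


def chi_square_p_value_1df(chi2):
    """Upper-tail p-value of a chi-square distribution with 1 degree of freedom."""
    return math.erfc(math.sqrt(chi2 / 2.0))


def odds_ratio(case_a1, case_a2, control_a1, control_a2):
    """Odds of carrying allele A1 in cases relative to controls."""
    return (case_a1 * control_a2) / (case_a2 * control_a1)


# Hypothetical allele counts for one variant: (A1, A2) in cases and in controls.
case_counts = (60, 40)
control_counts = (40, 60)

chi2 = allelic_chi_square(*case_counts, *control_counts)
p_value = chi_square_p_value_1df(chi2)
ratio = odds_ratio(*case_counts, *control_counts)
print(f"chi2={chi2:.2f}  p={p_value:.4f}  OR={ratio:.2f}")
```

In a genome-wide setting this test is repeated for every variant, so p-values must be judged against a threshold corrected for multiple testing rather than the usual 0.05.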

Advanced features of the CGC Platform

Estimated Time: 2 weeks
Learning Goals: 

Specific goals

By completing the exercise, the user will be able to:

Exercise activities