Release notes

Added support for Amazon EC2 P3 GPU Instances

We have added support for the Amazon EC2 P3 GPU instance family to the CGC. Amazon EC2 P3 instances deliver high-performance computing in the cloud with up to 8 NVIDIA® V100 Tensor Core GPUs and up to 100 Gbps of networking throughput. Each instance delivers up to one petaflop of mixed-precision performance, which significantly accelerates machine learning and high-performance computing applications.

Release notes

CGC meets Dockstore

You can now import CWL workflows from Dockstore.org with a single click. Dockstore is an open platform for sharing Docker-based apps described with the Common Workflow Language (CWL), Workflow Description Language (WDL), or Nextflow. It enables bioinformaticians to share analytical tools that can be executed in any compliant execution environment, such as the CGC. This integration streamlines interoperability between the two platforms and removes the need to port apps manually by exporting and importing CWL code. Learn more.

Define Compute Resources per Task Run

When creating a task via the visual interface, you can now set the top-level instance type and the maximum number of parallel instances for your execution without having to create a new version of the app. Learn more about setting execution hints at the task level from our documentation.
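For API users, the same task-level hints could be expressed in the task-creation request. The sketch below only builds the request body; the field names (`execution_settings`, `instance_type`, `max_parallel_instances`) and the project and app IDs are assumptions for illustration, so check them against the API reference before use.

```python
import json
from typing import Optional


def build_task_body(name: str, project: str, app: str, inputs: dict,
                    instance_type: Optional[str] = None,
                    max_parallel_instances: Optional[int] = None) -> dict:
    """Build a task-creation body with optional task-level execution hints.

    Field names are assumptions, not a confirmed API contract.
    """
    body = {"name": name, "project": project, "app": app, "inputs": inputs}
    settings = {}
    if instance_type is not None:
        settings["instance_type"] = instance_type
    if max_parallel_instances is not None:
        if max_parallel_instances < 1:
            raise ValueError("max_parallel_instances must be at least 1")
        settings["max_parallel_instances"] = max_parallel_instances
    # Only send execution settings when the user overrides the app defaults.
    if settings:
        body["execution_settings"] = settings
    return body


if __name__ == "__main__":
    body = build_task_body(
        name="alignment-run-01",
        project="my-user/my-project",      # hypothetical project ID
        app="my-user/my-project/my-app",   # hypothetical app ID
        inputs={},
        instance_type="c5.4xlarge",        # example AWS instance type
        max_parallel_instances=4,
    )
    print(json.dumps(body, indent=2))
```

Because the override lives in the task body rather than in the app description, no new app revision is needed.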

Release notes

Human Cell Atlas Preview Datasets Public Project

Human Cell Atlas Preview Datasets are now available as a public project on the CGC. The project contains the files from the first three single-cell sequencing datasets released to the research community as the “Human Cell Atlas Preview Datasets”. The available datasets are:

  • Census of Immune Cells

  • Ischaemic Sensitivity of Human Tissue

  • Melanoma Infiltration of Stromal and Immune Cells

Release notes

Access task secondary files via the API

You can now use our sevenbridges-python client to access secondary files for task inputs and outputs.

New and improved functionality:

  1. API users can now see exactly which files were used as secondary files for inputs.

  2. The Python client can now retrieve those files with a single call.

  3. This is also supported for CWL 1.x tools and workflows, in which secondary files can be defined as JavaScript expressions.
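The sevenbridges-python client handles this for you; the sketch below only illustrates the idea on raw task data. It assumes that File-type ports are returned as CWL-style objects with `class`, `path` (the file ID), and an optional `secondaryFiles` list; that shape and the helper name `secondary_files_by_port` are assumptions for illustration.

```python
def secondary_files_by_port(ports: dict) -> dict:
    """Map each File-type port to the IDs of its secondary files.

    Assumes CWL-style file objects; non-file values are ignored.
    """
    result = {}
    for port, value in ports.items():
        # A port may hold a single file or a list of files.
        files = value if isinstance(value, list) else [value]
        ids = []
        for f in files:
            if isinstance(f, dict) and f.get("class") == "File":
                ids.extend(sf["path"] for sf in f.get("secondaryFiles", []))
        if ids:
            result[port] = ids
    return result


if __name__ == "__main__":
    task_inputs = {  # hypothetical task inputs
        "input_bam": {
            "class": "File", "path": "bam-id", "name": "sample.bam",
            "secondaryFiles": [{"class": "File", "path": "bai-id", "name": "sample.bam.bai"}],
        },
        "threads": 8,
    }
    print(secondary_files_by_port(task_inputs))
```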

Whole Genome Sequencing - Quality Control - CWL1.0 Workflow

Whole Genome Sequencing - Quality Control - CWL1.0 Workflow is intended as a general-purpose QC flow for users processing WGS data, regardless of the number of samples. It produces plots that end users can easily inspect visually, as well as structured data output suitable for aggregation and parsing in an automated setup.

Release notes

Export files to a volume within the same region

It is now possible to mount volumes from all supported cloud providers and regions in read-write (RW) mode on the CGC. Files can be exported only to volumes in the same location (cloud provider and region) as the file being exported, which prevents the export from incurring additional data transfer costs.
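The same-location rule can be checked before an export request is sent. This is a minimal sketch: the `Location` type, the helper name, and the request body shape (`source.file`, `destination.volume`, `destination.location`) are assumptions for illustration, not a confirmed API contract.

```python
from typing import NamedTuple


class Location(NamedTuple):
    provider: str  # e.g. "aws" or "gcp"
    region: str    # e.g. "us-east-1"


def build_export_body(file_id: str, file_loc: Location,
                      volume_id: str, volume_loc: Location, dest_path: str) -> dict:
    """Build an export request body, refusing cross-location exports."""
    if file_loc != volume_loc:
        raise ValueError(
            f"cannot export: file is in {file_loc.provider}:{file_loc.region}, "
            f"volume is in {volume_loc.provider}:{volume_loc.region}"
        )
    return {
        "source": {"file": file_id},
        "destination": {"volume": volume_id, "location": dest_path},
    }


if __name__ == "__main__":
    aws_east = Location("aws", "us-east-1")
    print(build_export_body("file-1", aws_east, "my-user/my-volume", aws_east, "exports/sample.bam"))
```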

Release notes

Import Files from the PDC

We have implemented an additional file import system that allows you to import proteomic data into your projects using manifest files generated on the Proteomic Data Commons (PDC) Data Portal. The process consists of two stages:

  • generating a manifest file for the selected data on the PDC Data Portal; and

  • importing the selected data into a project on the CGC, with the help of the generated manifest file.

The process currently works with CPTAC3 data only, as indicated in our documentation.

We have also made PDC CPTAC3 metadata available on the CGC, as a single JSON file available in the Public Files gallery. For more details on how to find the file and use the metadata, please read our short tutorial.

Release notes

Recently published apps

We have published two additional GDC workflows on the CGC: the mRNA Analysis pipeline and the Tumor-only Variant Calling pipeline.

The mRNA pipeline performs quantification analysis on raw RNA-Seq data (FASTQs or unmapped BAM files), using STAR for alignment and HTSeq for counting.

The Tumor-only Variant Calling workflow uses GATK4's MuTect2 to call variants on tumor samples. The workflow is used for harmonization of genomic data for datasets such as The Cancer Genome Atlas (TCGA).

Release notes

GDC Datasets version update

As of July 10, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 17.

CPTAC-3 data release

With this release, controlled-access data from the CPTAC-3 project is available on the CGC for search and filtering in the Data Browser and through the API.

Release notes

Supported browsers update

Internet Explorer is no longer a supported browser on the Cancer Genomics Cloud. If you try to access the CGC using Internet Explorer, you will see a message explaining that your browser is not supported and suggesting that you switch to a supported one.

We have also updated the minimum required versions for the supported browsers:

Release notes

Writing rate limit-efficient API scripts

We have published new documentation that helps you make your API scripts rate limit-efficient. Code snippets demonstrate recommended use of the Seven Bridges Python client to minimize API calls for common tasks, including finding projects, iterating over result sets of queries, importing files from volumes, exporting files to volumes, updating file metadata, copying files between projects, deleting files, and submitting tasks for execution.
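One of the simplest savings is iterating over a result set with the largest page size the endpoint allows, so fewer requests are spent per item. The sketch below is generic: `fetch_page(offset, limit)` is a hypothetical callback standing in for one paginated API request, and the maximum page size of 100 is an assumption.

```python
from typing import Callable, Iterator, List


def iterate_all(fetch_page: Callable[[int, int], List], page_size: int = 100) -> Iterator:
    """Yield every item from a paginated endpoint with as few requests as possible.

    fetch_page(offset, limit) is a hypothetical stand-in for one API call.
    A short page signals that the result set is exhausted.
    """
    offset = 0
    while True:
        page = fetch_page(offset, page_size)
        yield from page
        if len(page) < page_size:
            return
        offset += len(page)


if __name__ == "__main__":
    records = [f"file-{i}" for i in range(250)]
    calls = []

    def fake_fetch(offset, limit):
        calls.append((offset, limit))
        return records[offset:offset + limit]

    items = list(iterate_all(fake_fetch))
    print(len(items), "items in", len(calls), "requests")
```

With a page size of 10, the same 250 records would cost 26 requests instead of 3.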

Release notes

Support for Google Cloud Preemptible Instances (beta)

In line with the availability of Spot Instances for AWS-based projects, we are now introducing support for Google Cloud Preemptible Instances in projects based in a Google Cloud location. Like AWS Spot Instances, Preemptible Instances can significantly reduce the cost of your task executions, because they use the cloud provider’s spare capacity, which is offered at lower prices than regular on-demand instances.

Learn more from our documentation.

Release notes

Multi-cloud

If you store your files in the AWS US East (N. Virginia) and/or GCP US West (Oregon) regions, the CGC now allows you to manage all your work from a single space and spin up the chosen computation resources in the location where your data lives.

New CWL web editor is now live

We have released an updated version of our CWL web editor. This release integrates the functionality of our desktop editor, Rabix Composer, with the CGC.

Release notes

Recently published apps

BROAD Best Practices RNA-Seq

This workflow represents the GATK Best Practices for SNP and INDEL calling on RNA-Seq data. Starting from an unmapped BAM file, it performs alignment to the reference genome, followed by marking duplicates, reassigning mapping qualities, base recalibration, variant calling and variant filtering. We used Broad’s best practice script in WDL format as a reference to create the BROAD Best Practices RNA-Seq Variant Calling 4.1.0.0 workflow in CWL version 1.0.

BROAD Best Practices Somatic CNV Panel Workflow

BROAD Best Practices Somatic CNV Panel Workflow is used for creating a panel of normals (PON) from a group of normal samples. Using read coverage collected over specified intervals, this workflow creates a PON HDF5 file, which BROAD Best Practices Somatic CNV Pair Workflow uses for standardizing and denoising read counts. This workflow is a CWL implementation of Broad’s best practice CNV panel WDL workflow.

BROAD Best Practices Somatic CNV Pair Workflow

BROAD Best Practices Somatic CNV Pair Workflow is used for detecting copy number variants (CNVs) as well as allelic segments. Given a tumor sample, an optional matched normal sample, and a panel of normals (PON) file, this workflow models and calls CNV segments. This workflow is a CWL implementation of Broad’s best practice CNV pair WDL workflow.

Release notes

Memoization (beta)

When defining task execution settings, you can now enable memoization. Memoization lets the CGC reuse existing results of your previous runs, which can significantly reduce the time and cost of your project workload. Memoization can be enabled at the project or task level, and the task-level setting overrides the project-level one.
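The precedence rule above can be stated in a couple of lines: a task-level value, when present, wins; otherwise the project-level value applies. This is a minimal sketch; the helper names and the `use_memoization` field name are assumptions for illustration.

```python
from typing import Optional


def effective_memoization(project_setting: bool, task_setting: Optional[bool]) -> bool:
    """A task-level value, when set, overrides the project-level one."""
    return project_setting if task_setting is None else task_setting


def memoization_settings(project_setting: bool, task_setting: Optional[bool]) -> dict:
    # "use_memoization" is an assumed field name for a task's execution settings.
    return {"use_memoization": effective_memoization(project_setting, task_setting)}


if __name__ == "__main__":
    print(memoization_settings(True, None))   # inherits the project setting
    print(memoization_settings(True, False))  # task-level opt-out wins
```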

Multiple datasets selection for querying in Data Browser

You can now select more than one dataset when starting a query in the Data Browser and query all selected datasets simultaneously.

Improved organization of Public Reference Files

The Public Reference Files gallery has been renamed to Public Files and split into two categories: Public Reference Files, which holds all common reference files, and Public Test Files, which contains common test samples.

Recently published apps

GDC DNASeq Harmonization Workflow

The GDC DNASeq Harmonization Workflow was developed by the National Cancer Institute's Genomic Data Commons. It is used for harmonization of genomic data for datasets such as The Cancer Genome Atlas (TCGA) and is publicly available on the CGC.

Release notes

Added support for GPU Instances

The first family of GPU instances we’re introducing is Amazon EC2 P2. P2 Instances are powerful, scalable instances that provide GPU-based parallel compute capabilities. Designed for general-purpose GPU compute applications using CUDA and OpenCL, these instances are ideally suited for machine learning, molecular modeling, genomics, rendering, and other workloads requiring massively parallel floating-point processing power.

Release notes

Spot Instances enabled by default on project creation

To promote execution cost optimization, Spot Instances are now enabled by default when you create a new project through the visual interface or the API, unless you explicitly set otherwise. You can later change this setting on the project settings page or override it per task on the draft task page. Learn more about Spot Instances on the CGC.
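The new default and the per-task override can be sketched as request bodies. The field names (`billing_group`, `settings.use_interruptible_instances`, and a task-level `use_interruptible_instances`) are assumptions for illustration, as are the helper names.

```python
from typing import Optional


def build_project_body(name: str, billing_group: str,
                       use_interruptible_instances: Optional[bool] = None) -> dict:
    """Project-creation body in which Spot Instances default to enabled."""
    if use_interruptible_instances is None:
        # Mirrors the new default: enabled unless explicitly set otherwise.
        use_interruptible_instances = True
    return {
        "name": name,
        "billing_group": billing_group,
        "settings": {"use_interruptible_instances": use_interruptible_instances},
    }


def apply_task_override(task_body: dict, use_interruptible_instances: Optional[bool]) -> dict:
    """Return a copy of a task body with an optional per-task override."""
    body = dict(task_body)
    if use_interruptible_instances is not None:
        body["use_interruptible_instances"] = use_interruptible_instances
    return body


if __name__ == "__main__":
    print(build_project_body("my-project", "billing-group-id"))  # hypothetical IDs
    print(apply_task_override({"name": "urgent-run"}, False))    # on-demand for this task
```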

Support for asynchronous bulk actions through the API

As part of adding full support for folders and improving scalability, we have introduced asynchronous file system actions in the API. The currently supported actions are copy and delete, and both work for files and folders. Five new API endpoints for asynchronous bulk actions let you issue copy and delete commands and retrieve job statuses.
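A typical client for asynchronous bulk actions splits the items into batches, submits each batch as a job, and polls the job status until it reaches a terminal state. The sketch below shows only the batching and polling logic. `get_status` is a hypothetical callback standing in for the job-status endpoint, and the batch size of 100 and the status names are assumptions.

```python
import time
from typing import Callable, Iterable, Iterator, List

# Assumed terminal job states; check the API reference for the real values.
TERMINAL_STATES = {"COMPLETED", "FAILED", "ABORTED"}


def chunked(items: List, size: int = 100) -> Iterator[List]:
    """Split items into batches no larger than the (assumed) bulk limit."""
    for i in range(0, len(items), size):
        yield items[i:i + size]


def wait_for_job(get_status: Callable[[str], str], job_id: str,
                 poll_interval: float = 5.0, timeout: float = 600.0,
                 sleep: Callable[[float], None] = time.sleep,
                 terminal_states: Iterable[str] = TERMINAL_STATES) -> str:
    """Poll a job until it reaches a terminal state or the timeout elapses."""
    terminal = set(terminal_states)
    waited = 0.0
    while True:
        status = get_status(job_id)
        if status in terminal:
            return status
        if waited >= timeout:
            raise TimeoutError(f"job {job_id} still {status} after {waited:.0f}s")
        sleep(poll_interval)
        waited += poll_interval


if __name__ == "__main__":
    statuses = iter(["RUNNING", "RUNNING", "COMPLETED"])
    print(wait_for_job(lambda job_id: next(statuses), "job-1", sleep=lambda s: None))
```

Injecting `sleep` keeps the polling loop testable without real waiting.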

Improved layout of the draft task page

To streamline task preparation, file inputs and app settings will now appear as two columns under a single tab named Task Inputs on the draft task page. Spot Instance configuration will move to the second tab on the draft task page, named Execution Settings. This tab will also be the single location for all task execution settings added in the future.

Release notes

Recently published apps

Metagenomics WGS Functional Profiling - HUMAnN2

HUMAnN2 (the HMP Unified Metabolic Analysis Network) is a tool used for efficiently and accurately determining the presence/absence and abundance of metabolic pathways in a microbial community from metagenomic sequencing data. It introduces a novel tiered search algorithm that provides highly accurate profiles for characterized members of microbial communities, with fallback to translated search for uncharacterized members.

The Metagenomics WGS Functional Profiling - HUMAnN2 workflow provides a complete functional profiling analysis of input samples and is designed to analyze several metagenomic samples in parallel.
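The tiered search described above can be pictured as a cascade: each read is tried against the first, most accurate tier, and only the reads left unassigned fall through to the next tier. The toy sketch below illustrates that fallback pattern with exact-match lookups. It is not HUMAnN2's implementation, which works at the nucleotide level first and then applies translated search. The function name and data are hypothetical.

```python
from typing import Callable, Dict, List, Optional, Sequence, Tuple

Search = Callable[[str], Optional[str]]


def tiered_search(reads: Sequence[str],
                  tiers: Sequence[Tuple[str, Search]]) -> Tuple[Dict[str, Tuple[str, str]], List[str]]:
    """Assign reads tier by tier; unassigned reads fall back to the next tier.

    Returns (assignments, still_unassigned), where assignments maps a read
    to (tier_name, hit).
    """
    assignments: Dict[str, Tuple[str, str]] = {}
    remaining = list(reads)
    for tier_name, search in tiers:
        unassigned = []
        for read in remaining:
            hit = search(read)
            if hit is None:
                unassigned.append(read)
            else:
                assignments[read] = (tier_name, hit)
        remaining = unassigned
    return assignments, remaining


if __name__ == "__main__":
    characterized = {"ACGT": "geneA"}.get    # toy stand-in for the first tier
    translated = {"TTTT": "geneB"}.get       # toy stand-in for the fallback tier
    print(tiered_search(["ACGT", "TTTT", "GGGG"],
                        [("nucleotide", characterized), ("translated", translated)]))
```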