Release notes
Access task secondary files via the API
You can now use our sevenbridges-python client to access secondary files for task inputs and outputs.
New and improved functionality:
API users can now see exactly which files were used as secondary files for inputs.
The Python client can now retrieve those files with a single call.
All of this is also supported for CWL 1.x tools and workflows, where the secondary files can be defined as JS expressions.
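As a rough illustration, the sketch below gathers secondary files from a task's inputs. It assumes each file-like input value (or list of values) exposes `name` and `secondary_files` attributes. The helper `collect_secondary_files` and the stand-in objects are hypothetical and are not part of the client. With the real client, the inputs would presumably come from a task fetched through the API.

```python
from types import SimpleNamespace


def collect_secondary_files(inputs):
    """Map each input ID to the names of its secondary files.

    Assumption: file-like values expose `name` and `secondary_files`;
    values without `secondary_files` (strings, numbers, None) are skipped.
    """
    result = {}
    for input_id, value in inputs.items():
        values = value if isinstance(value, list) else [value]
        names = []
        for item in values:
            for secondary in getattr(item, "secondary_files", None) or []:
                names.append(secondary.name)
        if names:
            result[input_id] = names
    return result


# Stand-in objects that mimic a task's inputs; these are not real API responses.
bai = SimpleNamespace(name="sample.bam.bai", secondary_files=None)
bam = SimpleNamespace(name="sample.bam", secondary_files=[bai])
fai = SimpleNamespace(name="ref.fa.fai", secondary_files=None)
fasta = SimpleNamespace(name="ref.fa", secondary_files=[fai])
example_inputs = {"reads": [bam], "reference": fasta, "threads": 8}

secondary_by_input = collect_secondary_files(example_inputs)
print(secondary_by_input)
```

Non-file inputs such as `threads` are ignored, so the result lists only the ports that actually carry secondary files.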
Whole Genome Sequencing - Quality Control - CWL1.0 Workflow
Whole Genome Sequencing - Quality Control - CWL1.0 Workflow is intended as a general-purpose QC flow for users processing WGS data, regardless of the number of samples. It should offer plots which can be easily visually inspected by the end users, as well as structured data output suitable for aggregation and parsing in an automated setup.
Release notes
Export files to a volume within the same region
It is now possible to mount volumes from all supported cloud providers and regions in read-write (RW) mode on the CGC. Files can be exported to volumes that are in the same location (cloud provider and region) as the file being exported, which prevents the export from incurring additional data transfer costs.
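The same-location rule can be pictured as a simple comparison of provider and region. The sketch below is illustrative only: the `Location` class, the `can_export` function and the region identifiers are assumptions made for the example, not part of the CGC API.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Location:
    """Cloud provider plus region (hypothetical representation)."""
    provider: str
    region: str


def can_export(file_location: Location, volume_location: Location) -> bool:
    """Export is allowed only when both provider and region match."""
    return file_location == volume_location


# Example identifiers chosen for illustration.
aws_virginia = Location("aws", "us-east-1")
aws_oregon = Location("aws", "us-west-2")
gcp_oregon = Location("gcp", "us-west1")

print(can_export(aws_virginia, aws_virginia))  # same provider and region
print(can_export(aws_virginia, aws_oregon))    # same provider, other region
print(can_export(aws_virginia, gcp_oregon))    # other provider
```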
Release notes
GDC Datasets version update
As of August 7, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 18.
Release notes
Import Files from the PDC
We have implemented an additional file import system that allows you to import proteomic data into your projects using manifest files generated on the Proteomic Data Commons (PDC) Data Portal. The process consists of two stages:
generating a manifest file for the selected data on the PDC Data Portal; and
importing the selected data into a project on the CGC, with the help of the generated manifest file.
The process currently works with CPTAC3 data only, as indicated in our documentation.
We have also made PDC CPTAC3 metadata available on the CGC, as a single JSON file available in the Public Files gallery. For more details on how to find the file and use the metadata, please read our short tutorial.
Release notes
Recently published apps
We published two additional GDC workflows on the CGC: the mRNA Analysis pipeline and the Tumor-only Variant Calling pipeline.
The mRNA pipeline performs quantification analysis on raw RNA-Seq data (FASTQs or unmapped BAM files) with STAR for alignment and HTSeq for counting.
The Tumor-only Variant Calling workflow utilizes GATK4's MuTect2 to call variants on tumor samples. The workflow is used for harmonization of genomic data for datasets such as The Cancer Genome Atlas (TCGA).
Release notes
GDC Datasets version update
As of July 10, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 17.
CPTAC-3 data release
With this release, controlled-access data from the CPTAC-3 project is available on the CGC for search and filtering in the Data Browser and through the API.
Release notes
Supported browsers update
Internet Explorer is no longer a supported browser on the Cancer Genomics Cloud. When you try to access the CGC using Internet Explorer, you will see a message explaining that you are using an unsupported browser and suggesting that you switch to a supported one.
We have also updated the minimum required versions for the supported browsers:
Release notes
Recently published apps
The following apps have been ported to CWL 1.0 and are now available as CWL 1.0 apps in the Public Apps gallery:
Optitype 1.2
VEP annotation workflow 90.5
Ensembl-VEP 90.5
Release notes
Writing rate limit-efficient API scripts
We have published new documentation that helps you make your API scripts rate limit-efficient. Code snippets demonstrate the recommended use of the Seven Bridges Python client to minimize API calls for common tasks, including finding projects, iterating over query result sets, importing files from volumes, exporting files to volumes, updating file metadata, copying files between projects, deleting files, and submitting tasks for execution.
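One general way to reduce the number of API calls is to group items into batches and issue one bulk request per batch instead of one request per item. The sketch below shows only the batching step. The helper name `chunked` and the batch size of 100 are assumptions for illustration, and the bulk request itself is left out.

```python
from typing import Iterator, List, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[List[str]]:
    """Yield consecutive batches of at most `size` items."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield list(items[start:start + size])


# Hypothetical file IDs; a real script would get these from a query.
file_ids = [f"file-{i}" for i in range(250)]
batches = list(chunked(file_ids, 100))

# Three bulk requests would cover what 250 single requests did before.
print([len(batch) for batch in batches])
```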
Release notes
Support for Google Cloud Preemptible Instances (beta)
In line with the availability of Spot Instances for AWS-based projects, we are now introducing support for Google Cloud Preemptible Instances in projects that are based in a Google Cloud location. Like AWS Spot Instances, Preemptible Instances can significantly reduce the cost of your task executions, because they draw on the cloud provider’s spare capacity, which is offered at lower prices than regular on-demand instances.
Learn more from our documentation.
Release notes
Supported instances update
You can now use next generation AWS Memory Optimized instances (R5) in task executions and Data Cruncher analyses. R5 instances support the high memory requirements of certain applications to increase performance and reduce latency.
Learn more about supported instance types.
Release notes
Multi-cloud
If you store your files in AWS US East (N. Virginia) and/or GCP US West (Oregon) regions, the CGC now allows you to manage all your work from a single space and spin up chosen computation resources at the location where your data lives.
New CWL web editor is now live
We have released an updated version of our CWL web editor. This release integrates the functionality of our desktop editor, Rabix Composer, with the CGC.
Release notes
Recently published apps
BROAD Best Practices RNA-Seq
This workflow represents the GATK Best Practices for SNP and INDEL calling on RNA-Seq data. Starting from an unmapped BAM file, it performs alignment to the reference genome, followed by marking duplicates, reassigning mapping qualities, base recalibration, variant calling and variant filtering. We used Broad’s best practice script in WDL format as a reference to create the BROAD Best Practices RNA-Seq Variant Calling 4.1.0.0 workflow in CWL version 1.0.
BROAD Best Practices Somatic CNV Panel Workflow
BROAD Best Practices Somatic CNV Panel Workflow is used for creating a panel of normals (PON) given a group of normal samples. Using read coverage collected over specified intervals, this workflow creates a panel of normals HDF5 file which is used in BROAD Best Practices Somatic CNV Pair Workflow for standardizing and denoising read counts. This workflow represents a CWL implementation of Broad’s best practice CNV panel WDL workflow.
BROAD Best Practices Somatic CNV Pair Workflow
BROAD Best Practices Somatic CNV Pair Workflow is used for detecting copy number variants (CNVs) as well as allelic segments. Given a tumor sample and an optional matched normal sample, as well as a panel of normals (PON) file, this workflow models and calls CNV segments. This workflow represents a CWL implementation of Broad’s best practice CNV pair WDL workflow.
Release notes
Memoization (beta)
When defining task execution settings, you can now enable memoization. Achieve significant time and cost optimization of your project workload by letting the CGC reuse existing results of your previous runs. Memoization can be enabled at project or task level, where the task-level setting overrides the project-level one.
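The precedence rule can be summarized in a few lines. This is a sketch under the assumption that an unset task-level value is represented as `None`. The function name `effective_memoization` is hypothetical and does not correspond to a CGC API call.

```python
from typing import Optional


def effective_memoization(project_setting: bool,
                          task_setting: Optional[bool] = None) -> bool:
    """Return the setting that applies to a task.

    A task-level value, when present, overrides the project-level one;
    otherwise the project-level value is used.
    """
    return project_setting if task_setting is None else task_setting


print(effective_memoization(project_setting=True))                      # project default applies
print(effective_memoization(project_setting=True, task_setting=False))  # task override wins
```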
Multiple datasets selection for querying in Data Browser
You can now start a query in the Data Browser by selecting more than one dataset, which lets you query multiple datasets simultaneously.
Improved organization of Public Reference Files
The Public Reference Files gallery has been renamed to Public Files and split into two categories, Public Reference Files and Public Test Files. The former holds all common reference files, while the latter contains common test samples.
Recently published apps
GDC DNASeq Harmonization Workflow
The GDC DNASeq Harmonization Workflow is developed by the National Cancer Institute's Genomic Data Commons. It is used for harmonization of genomic data for datasets such as The Cancer Genome Atlas (TCGA) and is publicly available on the CGC.
Release notes
Added support for GPU Instances
The first family of GPU instances we’re introducing is Amazon EC2 P2. P2 Instances are powerful, scalable instances that provide GPU-based parallel compute capabilities. Designed for general-purpose GPU compute applications using CUDA and OpenCL, these instances are ideally suited for machine learning, molecular modeling, genomics, rendering, and other workloads requiring massive parallel floating point processing power.
Release notes
Spot Instances enabled by default on project creation
In order to promote execution cost optimization, Spot Instances are now enabled by default when creating a new project through the visual interface or the API, unless you have specifically set otherwise. This setting can later be changed from the project settings page, or overridden per task on the draft task page. Learn more about Spot Instances on the CGC.
Support for asynchronous bulk actions through the API
As a part of adding full support for folders and improving scalability, we have introduced asynchronous file system actions through the API. Currently supported actions are copy and delete, and these are enabled for both files and folders. There are five new API endpoints for async bulk actions which can be used for issuing copy and delete commands and for getting job statuses.
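Because these actions are asynchronous, a script typically submits a job and then checks its status until it completes. The sketch below shows one possible polling loop. The state names ("SUBMITTED", "RUNNING", "FINISHED"), the `wait_for_job` helper and the `get_state` callable are assumptions for illustration. In a real script, `get_state` would wrap the job status endpoint.

```python
import time
from typing import Callable


def wait_for_job(get_state: Callable[[], str],
                 interval: float = 5.0,
                 max_polls: int = 60,
                 sleep: Callable[[float], None] = time.sleep) -> str:
    """Poll until the job reports FINISHED or the poll budget is spent."""
    if max_polls < 1:
        raise ValueError("max_polls must be at least 1")
    state = ""
    for attempt in range(max_polls):
        state = get_state()
        if state == "FINISHED":
            return state
        if attempt < max_polls - 1:
            sleep(interval)  # wait before asking again
    raise TimeoutError(f"job still {state!r} after {max_polls} polls")


# Fake status source standing in for the job status endpoint.
states = iter(["SUBMITTED", "RUNNING", "RUNNING", "FINISHED"])
polls = []


def fake_get_state() -> str:
    state = next(states)
    polls.append(state)
    return state


final_state = wait_for_job(fake_get_state, sleep=lambda seconds: None)
print(final_state, len(polls))
```

Passing `sleep` as a parameter keeps the loop easy to test and makes it simple to switch to a back-off strategy later.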
Improved layout of the draft task page
To streamline the preparation process for task execution, file inputs and app settings will now be available as two columns under a single tab named Task Inputs on the draft task page. Spot Instance configuration will be moved to the second tab on the draft task page, named Execution Settings. This tab will also serve as the single, central location for all settings related to task execution that are added in the future.
Release notes
Recently published apps
Metagenomics WGS Functional Profiling - HUMAnN2
HUMAnN2 (the HMP Unified Metabolic Analysis Network) is a tool used for efficiently and accurately determining the presence/absence and abundance of metabolic pathways in a microbial community from metagenomic sequencing data. It introduces a novel tiered search algorithm that provides highly accurate profiles for characterized members of microbial communities, with fallback to translated search for uncharacterized members.
The Metagenomics WGS Functional Profiling - HUMAnN2 workflow provides a complete functional profiling analysis of input samples and is designed to analyze several metagenomics samples in parallel.
Release notes
Data Cruncher - RStudio (beta)
In addition to JupyterLab, Data Cruncher now supports one more development environment, RStudio. You can choose between the two environments when setting up your Data Cruncher analysis.
Also, file saving rules have been deprecated, so all analysis files will be automatically saved in your analysis workspace on the CGC, regardless of their size or extension.
Learn more about Data Cruncher and the available environments from our documentation.
Release notes
Updates to the TCGA, TARGET and CCLE datasets
As part of Seven Bridges' ongoing partnership with the National Cancer Institute (NCI), authorized researchers can access valuable public datasets generated by the TCGA, TARGET, and CCLE initiatives through the CGC. Seven Bridges collaborates with the NCI Genomic Data Commons (GDC) on an ongoing basis to ensure alignment between the datasets available through the GDC and the CGC. In keeping with this, updated versions of the TCGA, TARGET, and CCLE datasets have been released on the CGC. As of February 11, the legacy TCGA and CCLE datasets available through the CGC are fully aligned with those in the GDC Legacy Archive, and the TCGA GRCh38 and TARGET GRCh38 datasets are fully aligned with GDC Data Release 14.0.
Release notes
Folders as task inputs and outputs
When selecting inputs for a task, you will now be able to select an entire folder for input ports that are set up to take folders as input values. This means that such input ports will take all files from the root of the selected folder and its subfolders. Folders can now also be displayed as app outputs, provided that the app itself is configured to produce output data in folder(s). This feature is available for CWL 1.0 apps only.
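To picture how a folder input expands, the sketch below walks a folder and its subfolders and lists every file it contains. The nested-dictionary layout and the `files_in_folder` helper are hypothetical and serve only to illustrate the "root plus subfolders" behavior. They do not reflect how the CGC represents folders internally.

```python
from typing import Dict, List


def files_in_folder(folder: Dict, prefix: str = "") -> List[str]:
    """Return paths of all files in the folder root and its subfolders."""
    paths = [prefix + name for name in folder.get("files", [])]
    for sub_name, sub_folder in folder.get("folders", {}).items():
        paths.extend(files_in_folder(sub_folder, f"{prefix}{sub_name}/"))
    return paths


# Hypothetical folder selected as a task input.
example_folder = {
    "files": ["sample_sheet.csv"],
    "folders": {
        "run1": {"files": ["a_R1.fastq", "a_R2.fastq"], "folders": {}},
        "run2": {"files": [], "folders": {"lane1": {"files": ["b_R1.fastq"]}}},
    },
}

resolved = files_in_folder(example_folder)
print(resolved)
```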