Divya Sain

Release notes

New Command-line Uploader released

The new Command-line (CLI) Uploader, just released as part of the existing Seven Bridges CLI tool, is now the primary recommended tool for performing large-scale uploads to the CGC. The Uploader is easy to install and use, and it is a resilient, performant command-line application that provides a secure and reliable way of uploading data to the CGC.

The legacy command-line uploader will remain functional until August 2021, when it will be officially deprecated. The Desktop Uploader is also planned for deprecation in August 2021, as the Web Uploader has been available through the CGC’s visual interface since September 2020. Find out more about the new CLI Uploader in our documentation.

Recently published apps

GENESIS Update Null Model for Fast Score Test updates the null model file obtained with the GENESIS Null Model workflow so that it can be used in fast score mode of the GENESIS Single Variant Association Testing workflow.

Divya Sain

Release notes

CWL v1.2 available on the CGC

The CGC now supports Common Workflow Language (CWL) version 1.2. The new version brings major new functionality, conditional execution of workflow steps, along with several minor features and improvements. For the detailed changelog, see the CWL CommandLineTool specification and the CWL Workflow specification.
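As a rough illustration of conditional execution, the sketch below builds a minimal CWL v1.2 workflow as a Python dict and serializes it to JSON, which CWL accepts alongside YAML. The step-level `when` field, new in v1.2, decides whether the step runs. The tool path `trim.cwl` and the input names are hypothetical placeholders, not a published CGC app.

```python
import json

# Hypothetical CWL v1.2 workflow; "trim.cwl" is a placeholder tool path.
workflow = {
    "cwlVersion": "v1.2",
    "class": "Workflow",
    "inputs": {
        "reads": "File",
        "run_trimming": "boolean",
    },
    "outputs": {
        # A skipped step yields null, so the workflow output is optional (File?).
        "trimmed_reads": {"type": "File?", "outputSource": "trim/trimmed"},
    },
    "steps": {
        "trim": {
            "run": "trim.cwl",
            "in": {
                "reads": "reads",
                # Extra step input consumed only by the `when` expression.
                "run_trimming": "run_trimming",
            },
            # CWL v1.2: the step executes only if this evaluates to true.
            "when": "$(inputs.run_trimming)",
            "out": ["trimmed"],
        }
    },
}

print(json.dumps(workflow, indent=2))
```

The `when` value here is a plain parameter reference, so no JavaScript requirement is needed; comparisons such as `$(inputs.n > 1)` would require `InlineJavascriptRequirement`.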

CWL v1.2 is a backwards-compatible upgrade of v1.1, meaning that all v1.0 and v1.1 features are still supported in v1.2. To upgrade a v1.0 or v1.1 app to v1.2, simply edit the app; the next version you save can be automatically upgraded to v1.2. Note that upgrading a workflow to CWL v1.2 this way does not upgrade the CWL version of the tools in the workflow.

Apps using CWL v1.0 and v1.1 versions are still supported and can be used in workflows in combination with CWL v1.2 apps.

Divya Sain

Release notes

Network access control per Project available on the CGC

The CGC has added another layer of security protecting your data. Researchers can now choose from two options for controlling network access for each Project. This feature defines the network access permissions for both Tasks (tools and workflow executions) and Data Cruncher analyses (interactive analysis environments).

When setting up a project, users can choose to deny network access for all executions, ensuring even higher security and compliance standards in the execution environment provided by Seven Bridges. This restricted option is the default selection for all new Projects and enhances the safety of data during analysis in the cloud for all apps and notebooks. The restriction does not affect pulling externally hosted Docker images or accessing project files that point to externally hosted datasets, so access to common public datasets such as TCGA will not change. The CGC API also remains accessible from the execution environment.
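To see how a project's network setting behaves from inside a task or Data Cruncher analysis, a simple TCP reachability probe can help. This is a minimal, generic Python sketch, not a CGC feature or API; the function name `can_reach` is hypothetical, and a `False` result can also come from DNS, firewall, or service problems unrelated to the project setting.

```python
import socket


def can_reach(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and tries each address in turn.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts (socket.timeout is an OSError
        # subclass), and name-resolution failures (socket.gaierror).
        return False
```

For example, `can_reach("example.org", 443)` would be expected to return `False` in a project with network access denied. Note that the timeout bounds the connection attempt, not DNS resolution.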

Divya Sain

Release notes

Recently published apps

The following apps were published in CWL1.x:

  • SRA Toolkit 2.10.8 - NCBI’s collection of tools and libraries for accessing data in the Sequence Read Archive (SRA) format.

  • SRA Download and Set Metadata - a workflow for downloading full SRA datasets and populating the metadata that goes with each dataset.

  • AnnotSV 3.0.7 - structural variant annotation and ranking tool.

  • IsoformSwitchAnalyzeR 1.12.0 - a tool for differential splicing analysis that performs statistical identification of isoform switching between two sample groups.

  • DRIMSeq 1.16.1 - performs differential transcript usage (DTU) analyses using Dirichlet-multinomial generalized linear models.

  • DEXSeq 1.36.0 - toolkit for testing differential exon usage in comparative RNA-Seq experiments.

  • Differential Exon Usage with DEXSeq 1.36.0 - a workflow constructed out of DEXSeq tools, meant for a comprehensive differential splicing analysis.

Divya Sain

Release notes

Recently published apps

The following apps were published in CWL1.x:

  • Single Cell Multi Sample Pairwise Differential Expression Workflow - pipeline that performs differential expression analysis on single cell data between pairs of user-defined conditions.

  • Minimap2 v2.17 - a versatile sequence alignment program that aligns DNA or mRNA sequences against a large reference database, tailored for use with long-read sequencing technologies.

  • fastqValidator 0.1.1 - checks format correctness of paired-end and single-end FASTQ files.

  • FastP 0.20.1 - ultra-fast FASTQ preprocessor with useful quality control and data-filtering features, including adapter trimming, quality filtering, per-read quality pruning and many other operations with a single scan of FASTQ data.

  • SBG convert SRA/BAM to FASTQ - an all-in-one tool that converts SRA/SAM/BAM/CRAM files into FASTQ format.

  • SBG Create Expression Matrix - creates aggregated matrices from various types of inputs, most typically from abundance estimates produced by tools like RSEM, Salmon, or Kallisto.

  • SHAPEIT 4.2.1 - phasing tool for sequencing and SNP array data.

  • Regenie 2.0.1 - tool for whole-genome regression analysis.

  • UMI-tools 1.1.1 - tools for dealing with Unique Molecular Identifiers (UMIs)/Random Molecular Tags (RMTs) and single cell RNA-Seq cell barcodes.

Divya Sain

Release notes

Recently published apps

MaxQuant is a tool for quantitative proteomics, designed for analysing large mass-spectrometric datasets. It takes files with high-resolution, quantitative MS data and produces information about the quantification of proteins and PTMs. It can be used for analysing data derived from any of the major relative quantification techniques: label-free quantification (LFQ), MS1-level labelling and isobaric MS2-level labelling. Furthermore, it provides quantification algorithms for all common forms of tandem mass tag (TMT) and isobaric tags for relative and absolute quantitation (iTRAQ) labelling (including higher-plex TMT and multinotch MS3 quantification).

GENESIS Association Results Plotting creates Manhattan and QQ plots from GENESIS association test results, with additional filtering and stratification options. With its default options, this app is part of the GENESIS Association Testing workflows; however, once association testing is completed, users can fine-tune the Manhattan and QQ plots by running this app separately.

Divya Sain

Release notes

Recently published apps

The following apps were upgraded to CWL1 and had their versions updated as well:

  • GATK

  • Picard

  • VEP toolkit and workflow

Divya Sain

Release notes

Foundation Medicine data available on the CGC

The Foundation Medicine dataset is now available through the Data Browser on the CGC. It contains genomic profiling data from approximately 18,000 adult patients with a diverse array of cancers.

Divya Sain

Release notes

GDC Datasets version update

As of March 17, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 28.0.

Divya Sain

Release notes

Recently published apps

  • GATK Somatic SNVs and INDELs (Mutect2) 4.1.9.0 can be used to detect SNVs and INDELs in one or more tumor samples from a single individual, with or without a matched normal sample. Its assembly-based approach treats whole haplotypes and read pairs, rather than single bases, as the atomic units of biological variation and sequencing evidence, which improves variant calling.

  • GATK Somatic Create Mutect2 Panel of Normals 4.1.9.0 workflow creates a panel of normals (germline and artifactual sites) for use in other GATK workflows. It takes multiple normal-sample callsets produced by GATK Somatic SNVs and INDELs 4.1.9.0 (Mutect2 workflow) in tumor-only mode (despite the mode's name, normal samples are given as the input) and collates sites present in two or more samples into a sites-only VCF.

Both workflows are based on the official GATK WDL workflows.
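The panel-of-normals collation rule (keep sites seen in two or more normal callsets) can be illustrated with a toy example. This Python sketch is a simplification for illustration only, not GATK's implementation: sites are modeled as hypothetical (CHROM, POS, REF, ALT) tuples instead of parsed VCF records, and output is sorted by plain tuple comparison rather than reference contig order.

```python
from collections import Counter

Site = tuple[str, int, str, str]  # (CHROM, POS, REF, ALT), a simplified site model


def collate_pon_sites(callsets: list[set[Site]], min_samples: int = 2) -> list[str]:
    """Return sites-only VCF lines for sites present in >= min_samples callsets."""
    # Each callset is a set, so a site is counted at most once per sample.
    counts = Counter(site for calls in callsets for site in calls)
    kept = sorted(site for site, n in counts.items() if n >= min_samples)
    header = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    # Sites-only VCF: the eight fixed columns, no FORMAT or sample columns.
    body = [f"{c}\t{p}\t.\t{r}\t{a}\t.\t.\t." for c, p, r, a in kept]
    return header + body
```

With three normals where only `chr1:100 A>T` appears in two of them, the result is the two header lines plus that single site.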

Divya Sain

Release notes

Improved project organization with project tags

To make projects easier to organize and find, project tags have been introduced to the CGC.

Project Admins can now assign tags to projects via the API or through the visual interface. Tags can be used to filter projects when browsing, to categorize projects, and for general custom organization.

The maximum number of tags for a single project is 15, while the maximum number of characters in a single tag is 36.
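These limits can be checked before tagging a project. This is a minimal client-side sketch; the function and constant names are hypothetical, and the CGC itself remains the authority on which tags it accepts (for example, whether it applies further rules to tag contents).

```python
MAX_TAGS_PER_PROJECT = 15  # limit stated in the release note
MAX_TAG_LENGTH = 36        # maximum characters in a single tag, per the release note


def validate_project_tags(tags: list[str]) -> list[str]:
    """Return a list of problems; an empty list means the tags fit the limits."""
    problems = []
    if len(tags) > MAX_TAGS_PER_PROJECT:
        problems.append(f"too many tags: {len(tags)} > {MAX_TAGS_PER_PROJECT}")
    for tag in tags:
        if len(tag) > MAX_TAG_LENGTH:
            problems.append(f"tag too long ({len(tag)} chars): {tag!r}")
    return problems
```

For instance, `validate_project_tags(["rna-seq", "pilot"])` returns an empty list.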

PDC data update on the CGC

PDC data on the CGC has been updated with the following PDC Data Releases:

  • V1.0.24 (February 5, 2021)

  • V1.0.22 (January 5, 2021)

  • V1.0.21 (December 15, 2020)

See more information about the history and contents of each PDC data update on the CGC.

GDC Datasets version update

As of February 22, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 27.0.

Divya Sain

Release notes

Recently published apps

The following tools were updated to their latest versions and upgraded to CWL1.x:

  • HISAT2-StringTie workflow

  • StringTie

  • HISAT2

  • Trimmomatic

  • Tabix

  • SBG FASTQ Merge


The following new apps were published in CWL1.x:

  • Exomiser 12.1.0 - tool for prioritizing variants from WES and WGS data.

  • VEP Slivar Trios Rare Diseases Analysis workflow - analyzes family (trio) variants from WES and WGS data.

  • Clustering and Gene Marker Identification with Seurat 3.2.2 - clustering and gene marker identification analysis starting from gene-cell UMI or read counts.

  • xCell 1.3 - tool for cell type enrichment analysis, which takes gene expression data and performs analysis for 64 immune and stromal cell types.

  • MBASED 1.18.0 - tool for performing allele-specific expression analysis.

  • MBASED workflow - based on the MBASED tool, with added phasing and VEP annotation, this workflow makes allele-specific expression analysis easier to run.

  • elPrep 4.1.6 - high-performance tool for preparing SAM/BAM files for variant calling in sequencing pipelines, which can be used as a replacement for SAMtools and Picard for preparation steps such as filtering, sorting, marking duplicates, calculating and applying base quality score recalibration, etc.

  • Kraken2 2.0.9 - taxonomic sequence classifier that assigns taxonomic labels to DNA sequences.

  • Bracken 2.5 - uses the taxonomic assignments made by Kraken/Kraken2, along with information about the genomes themselves, to estimate abundance at the species/genus level, or above.

Divya Sain

Release notes

RAS-CRDC Integration Phase 1 completed

The Researcher Auth Service (RAS), sponsored by the NIH Office of Data Science Strategy, is a service provided by NIH's Center for Information Technology (CIT) to facilitate access to NIH’s open and controlled data assets and repositories in a consistent and user-friendly manner.

The RAS initiative is advancing data infrastructure and ecosystem goals defined in the NIH Strategic Plan for Data Science. RAS has adopted the Global Alliance for Genomics and Health (GA4GH) standards for integrating researcher-focused applications and data repositories using OpenID Connect (OIDC).

The goal of this effort is to coordinate all cloud stacks so that RAS is used identically across systems. The NCI CRDC (Cancer Research Data Commons) stack was chosen for the pilot, the first step of a phased approach toward the larger goal of federated data access using GA4GH Passports, with a focus on how this fits in with NIH data in general.

Phase 1 is now complete and introduces a change to the login flow when using eRA Commons:

  • When choosing login with eRA Commons on the CGC, you will now be redirected to the NIH RAS login screen instead of iTrust.

  • Other than the login flow change, user experience on the CGC remains the same.

Recently published apps

  • GATK Broad Best Practice Variant Calling From uBAM - This workflow combines two Broad Best Practices workflows into one: BAM processing and variant calling.

  • Functional Equivalence WGS - This workflow processes WGS data according to the functional equivalence standard.

Divya Sain

Release notes

New password validation rules

To maintain a high level of security and prevent unauthorized access to the CGC, we are introducing a check of passwords against a database of commonly used passwords and passwords compromised in data breaches across the Internet, all of which are considered unsafe. If the password you enter exists in this database, you will need to choose a different one.

The entire password validation process takes place within the Seven Bridges infrastructure, with no third-party services involved, which adds another level of security. This validation applies when you perform any of the following actions:

  • Sign up for a new account.

  • Set up a new password to replace an expired one.

  • Change the account password.

Please note that this does not apply to accounts using external login providers.
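Conceptually, the check amounts to a lookup against a locally held blocklist, so no password ever leaves the infrastructure. The Python sketch below illustrates that idea only; it is not Seven Bridges' implementation, and the three sample passwords and the use of SHA-256 digests are assumptions chosen for the example.

```python
import hashlib


def _digest(password: str) -> str:
    # Compare digests rather than keeping raw passwords in the blocklist.
    return hashlib.sha256(password.encode("utf-8")).hexdigest()


# Hypothetical blocklist; a real one would hold a large database of common
# and breached passwords, loaded from local storage.
BLOCKED_DIGESTS = {_digest(p) for p in ("123456", "password", "qwerty")}


def is_password_allowed(candidate: str) -> bool:
    """Reject the candidate if it appears in the local blocklist."""
    return _digest(candidate) not in BLOCKED_DIGESTS
```

Because the lookup is a local set membership test, it requires no call to an external breach-checking service.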

Divya Sain

Release notes

Task queueing improvements

We have made the following changes to the task queueing logic, which should contribute to faster completion of initialized tasks and analyses:

  • If you reach your parallel instance limit with running tasks and both tasks and Data Cruncher analyses are waiting in the queue, the Data Cruncher analyses will be executed first once instances become available.

  • If the parallel instance limit is reached by running tasks, a newly available instance is first allocated to running multi-instance or scattered tasks that need additional instances. If there are no such tasks, the instance is allocated to the next task(s) in the queue. This enables faster completion of multi-instance and scattered tasks and helps avoid breaks in their execution.
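The allocation order above can be sketched as a small scheduler. This is a hypothetical model, not the CGC's scheduler: the class names are made up, each analysis or task is assumed to need one instance, and because the release note does not say how queued Data Cruncher analyses rank against running scattered tasks, the sketch assumes analyses go first.

```python
from collections import deque
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class RunningTask:
    name: str
    # > 0 for a multi-instance or scattered task still waiting for capacity.
    extra_instances_needed: int


@dataclass
class InstanceAllocator:
    running: list[RunningTask] = field(default_factory=list)
    queued_analyses: deque[str] = field(default_factory=deque)  # Data Cruncher
    queued_tasks: deque[str] = field(default_factory=deque)

    def allocate_freed_instance(self) -> Optional[str]:
        """Return the name of whoever receives one newly freed instance."""
        # Assumption: queued Data Cruncher analyses are served first.
        if self.queued_analyses:
            return self.queued_analyses.popleft()
        # Next, running multi-instance/scattered tasks that need more instances.
        for task in self.running:
            if task.extra_instances_needed > 0:
                task.extra_instances_needed -= 1
                return task.name
        # Otherwise, the next task waiting in the queue.
        if self.queued_tasks:
            return self.queued_tasks.popleft()
        return None
```

Serving running scattered tasks before new queued tasks is what keeps a partially started scatter from stalling while fresh tasks take its capacity.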

Recently published apps

GENESIS LocusZoom visualizes association testing results using the standalone LocusZoom software. The app wraps LocusZoom so that it works with outputs of the GENESIS association pipelines. Its main goal is to visualize results of the GENESIS Single Variant Association Test; however, regions from sliding window or aggregate tests with p-values below a certain threshold can be displayed in a separate track.

SBG Loci Snapshoter generates screenshots of specific regions across all aligned files provided as inputs. It uses the IGV batch functionality to create PNG images of desired loci across multiple samples. The tool was primarily developed for post-association visualization of associated variants across a subset of the CRAM files used to obtain those variants.
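To reproduce the idea locally, the sketch below writes an IGV batch script that loads several alignment files and takes one PNG snapshot per locus. It uses standard IGV batch commands (`new`, `genome`, `snapshotDirectory`, `load`, `goto`, `snapshot`, `exit`); the function name, file names, and snapshot naming scheme are hypothetical and not taken from SBG Loci Snapshoter.

```python
def igv_batch_script(alignments: list[str], loci: list[str],
                     genome: str = "hg38", out_dir: str = "snapshots") -> str:
    """Build an IGV batch script that snapshots each locus for all alignments."""
    lines = ["new", f"genome {genome}", f"snapshotDirectory {out_dir}"]
    # Load every alignment once; each snapshot then shows all samples together.
    lines += [f"load {path}" for path in alignments]
    for locus in loci:
        lines.append(f"goto {locus}")
        # Derive a filesystem-friendly PNG name from the locus string.
        safe_name = locus.replace(":", "_").replace("-", "_")
        lines.append(f"snapshot {safe_name}.png")
    lines.append("exit")
    return "\n".join(lines) + "\n"
```

The resulting text can be saved to a file and passed to IGV's batch mode, for example `igv.sh -b batch.txt`.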

Divya Sain

Release notes

CDS integration on the CGC

The Cancer Data Service (CDS) is a data repository within the NCI's Cancer Research Data Commons (CRDC) infrastructure for storing cancer research data generated by NCI-funded programs. Its data is stored in the Database of Genotypes and Phenotypes (dbGaP), provided by the National Center for Biotechnology Information (NCBI). CDS hosts datasets that contain controlled-access data, with access permissions controlled by dbGaP. This release brings three CDS datasets to the CGC (GECCO, PPTC, and LCCC-1108), enabling researchers to easily use CDS data on the CGC.

Divya Sain

Release notes

Data Cruncher Stability and Usability Improvements

Your experience with Data Cruncher just got better thanks to the following improvements:

  • Use the full potential of RStudio, which is now officially out of beta, with its stable release available in Data Cruncher.

  • Maintain full control over your workspace integration capabilities in a more secure environment: your Data Cruncher sessions now run on a separate domain, providing better security isolation and privacy control for your favorite third-party integrated development environments.

  • Get better insight into your session initialization phase with a more informative loading experience.

Divya Sain

Release notes

Recently published apps

The Strelka2 Somatic workflow and Strelka2 Germline tool have been published to the CGC in CWL1.0. Strelka2 is a fast and accurate small variant caller optimized for analysis of germline variation in small cohorts and somatic variation in tumor/normal sample pairs. For better calling results, the structural variant caller Manta has been added to the Somatic workflow.

Divya Sain

Release notes

Upload files directly through the visual interface

Uploads just became way easier! To enable easy and convenient small-scale file uploads, we have added new functionality that allows you to upload files directly through the CGC's visual interface. To upload files from your local storage, navigate to the desired project and click the Add files button. There you will find a new tab called Your Computer which, besides the upload itself, provides all standard upload-related features such as naming conflict resolution, file tagging, and upload progress tracking. Learn more.

Divya Sain

Release notes

GA4GH WES and DRS support

Through its engagement in GA4GH (the Global Alliance for Genomics and Health), Seven Bridges actively works with platform development partners and industry leaders to develop standards that will facilitate interoperability.

The GA4GH Cloud Work Stream helps the genomics and health communities take full advantage of modern cloud environments. Its initial focus is on 'bringing the algorithms to the data', by creating standards for defining, sharing, and executing portable workflows. Standards under discussion include workflow definition languages, tool encapsulation, cloud-based task and workflow execution, and cloud-agnostic abstraction of data access.

The CGC supports the following standards:

WES API

The Workflow Execution Service (WES) API describes a standard programmatic way to run and manage workflows. Having this standard API supported by multiple execution engines lets people run the same workflow on various execution platforms across different clouds and environments.

The following API paths are available as part of the Seven Bridges implementation of WES API:

Learn more
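A WES client talks to the service through a small set of REST paths defined by the GA4GH WES v1 specification. The Python sketch below builds a run-status request and checks whether a returned state is final. The base URL is a placeholder, which of the specification's paths the CGC exposes should be confirmed in its documentation, and sending the `X-SBG-Auth-Token` header (documented for the DRS API) to WES is an assumption.

```python
import urllib.request
from urllib.parse import quote

# Final run states in the GA4GH WES v1.0 State enum.
TERMINAL_STATES = {"COMPLETE", "EXECUTOR_ERROR", "SYSTEM_ERROR", "CANCELED"}


def wes_status_request(base_url: str, run_id: str, token: str) -> urllib.request.Request:
    """Build GET {base_url}/runs/{run_id}/status, following the WES v1 path layout."""
    url = f"{base_url.rstrip('/')}/runs/{quote(run_id, safe='')}/status"
    # Assumption: the WES endpoint accepts the same auth header as the DRS API.
    return urllib.request.Request(url, headers={"X-SBG-Auth-Token": token})


def is_finished(run_status: dict) -> bool:
    """True if a RunStatus object ({"run_id": ..., "state": ...}) is in a final state."""
    return run_status.get("state") in TERMINAL_STATES
```

A caller would send the request with `urllib.request.urlopen`, decode the JSON body, and poll until `is_finished` returns `True`.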

DRS API - AuthN/Z Update

The Data Repository Service (DRS) API provides a generic interface to data repositories so data consumers, including workflow systems, can access data in a single, standard way regardless of where it's stored and how it's managed.

With this release, the authentication and authorization (authN/Z) method has changed to better reflect the specification's recommendations:

All API requests must include the HTTP header X-SBG-Auth-Token, set to your authentication token.
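As an illustration of the header requirement, the sketch below builds a DRS "get object" request with Python's standard library. The `/objects/{object_id}` path comes from the GA4GH DRS v1 specification; the base URL, object ID, and function names are placeholders, so check the CGC DRS documentation for the actual endpoint.

```python
import json
import urllib.request
from urllib.parse import quote


def drs_object_request(base_url: str, object_id: str, token: str) -> urllib.request.Request:
    """Build GET {base_url}/objects/{object_id} carrying the CGC auth header."""
    url = f"{base_url.rstrip('/')}/objects/{quote(object_id, safe='')}"
    # Header name as required by the release note; the value is your auth token.
    return urllib.request.Request(url, headers={"X-SBG-Auth-Token": token})


def get_drs_object(base_url: str, object_id: str, token: str) -> dict:
    """Fetch and decode the DRS object JSON (id, size, checksums, access_methods, ...)."""
    with urllib.request.urlopen(drs_object_request(base_url, object_id, token)) as resp:
        return json.load(resp)
```

Keeping request construction separate from the network call makes it easy to inspect the URL and headers before sending anything.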

The following API paths are available as part of the Seven Bridges implementation of DRS API:

Learn more

Recently published apps

The updated GENESIS apps are now available in our public apps gallery. The new release includes:

  • New Docker image v.2.8.1.

  • Updated input and output descriptions.

  • Comprehensive benchmarking included in the app descriptions.

  • Standard output included in the task logs.

  • Other minor changes.
