Release notes

Recently published apps

We have just published the updated 0.7.17 version of BWA MEM Bundle, a well-known tool for aligning sequence reads to a large reference genome, and BWA INDEX, which indexes the reference sequence, a step required before running BWA MEM Bundle. Both tools are published in CWL1.2.

Divya Sain

Recently published apps

We have published the following apps in our Public Apps gallery:

  • Cyrius (v1.1.1, CWL1.2), a tool that genotypes CYP2D6 in WGS data. It takes WGS BAM or CRAM files and creates a TSV report with CYP2D6 alleles.

  • Two PharmCAT (v1.6.0, CWL1.2) tools:

    • PharmCAT VCF Preprocess, a tool that takes a VCF file and prepares it for downstream processing with PharmCAT, and

    • PharmCAT, a tool for interpreting guideline variants in VCF files.

  • Two Biobambam2 (v2.0.183, CWL1.2) tools:

    • Biobambam2 Bamtofastq, a tool that converts BAM/CRAM/SAM files to FASTQ format, and

    • Biobambam2 Bamseqchksum, a tool that calculates hashes for the contents of the provided alignment file.

Recently published apps

We have just published the following apps:

  • An updated version of the SRA Download and Set Metadata workflow (SRA Toolkit 3.0.0), which downloads the metadata associated with an SRA accession via SRA Run Info CGI, downloads the FASTQ files (on an on-demand instance), and sets the corresponding metadata.

  • OptiType (v1.3.5, CWL1.2), a tool designed for precision HLA typing from next-generation sequencing data. It is based on the assumption that the correct HLA genotype explains the highest number of mapped reads.

  • fastENLOC (v1.0, CWL1.2), a tool that enables integrative genetic association analysis of molecular QTL data and GWAS data.
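The assumption behind OptiType mentioned above (the correct genotype is the one that explains the most mapped reads) can be illustrated with a toy exhaustive search. This is only a sketch of the selection principle, not OptiType's actual method or interface; the function name, read IDs, and allele assignments are made up for the example.

```python
from itertools import combinations_with_replacement


def best_genotype(read_hits, alleles):
    """Return the allele pair that explains the most reads, and that count.

    read_hits maps a read ID to the set of alleles the read aligns to.
    A read counts as explained if it aligns to at least one allele in the pair.
    """
    best_pair, best_count = None, -1
    for pair in combinations_with_replacement(sorted(alleles), 2):
        pair_set = set(pair)
        explained = sum(1 for hits in read_hits.values() if hits & pair_set)
        if explained > best_count:
            best_pair, best_count = pair, explained
    return best_pair, best_count


# Hypothetical toy data: seven reads and three candidate alleles of one locus.
read_hits = {
    "r1": {"A*01:01"},
    "r2": {"A*01:01", "A*02:01"},
    "r3": {"A*02:01"},
    "r4": {"A*03:01"},
    "r5": {"A*02:01", "A*03:01"},
    "r6": {"A*03:01"},
    "r7": {"A*02:01"},
}
alleles = {"A*01:01", "A*02:01", "A*03:01"}

pair, count = best_genotype(read_hits, alleles)
print(pair, count)
```

Exhaustive search is only practical for a handful of candidate alleles; it is used here to keep the principle visible.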

Recently published apps

We have just published the following apps in our Public Apps gallery:

  • TwoSampleMR, a tool that performs Mendelian randomization testing for a given exposure-outcome pair. It is a wrapper around the TwoSampleMR R package and uses summary statistics data for making causal inference.

  • CCS, a tool that combines multiple subreads of the same SMRTbell molecule and outputs one highly accurate consensus sequence.

  • lima, a tool used with PacBio single-molecule sequencing data to identify barcode and primer sequences.

  • PacBio Flowcell Data Processing, a workflow that can be used to process PacBio CCS or CLR data in preparation for variant calling.

  • PacBio CCS or CLR WGS Variant Calling, a workflow that can be used to call structural variants in PacBio CCS or CLR data. The workflow can also call small variants in CCS data using Clair3.
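To give a sense of what causal inference from summary statistics looks like for the TwoSampleMR app listed above, the following sketch computes per-variant Wald ratios and combines them with fixed-effect inverse-variance weighting. It is written from the standard textbook formulas and does not call the TwoSampleMR R package or reproduce its interface; the function names and input numbers are illustrative assumptions.

```python
import math


def wald_ratio(beta_exposure, beta_outcome, se_outcome):
    """Per-variant causal estimate with a first-order standard error."""
    estimate = beta_outcome / beta_exposure
    std_error = se_outcome / abs(beta_exposure)
    return estimate, std_error


def ivw_estimate(variants):
    """Fixed-effect inverse-variance weighted combination of Wald ratios.

    variants is an iterable of (beta_exposure, beta_outcome, se_outcome).
    """
    weighted_sum = 0.0
    weight_total = 0.0
    for beta_exposure, beta_outcome, se_outcome in variants:
        estimate, std_error = wald_ratio(beta_exposure, beta_outcome, se_outcome)
        weight = 1.0 / std_error ** 2
        weighted_sum += weight * estimate
        weight_total += weight
    return weighted_sum / weight_total, math.sqrt(1.0 / weight_total)


# Made-up summary statistics for two genetic variants.
variants = [(0.1, 0.05, 0.01), (0.2, 0.1, 0.02)]
causal_effect, causal_se = ivw_estimate(variants)
print(causal_effect, causal_se)
```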

Recently published apps

We’ve just published AnnotationDbi select and mapIds, a tool that maps one type of ID to another. It is based on Bioconductor annotation data packages.

Recently published apps

New apps have been added to the CGC:

  • Two tools from the Samplot toolkit:

    • Samplot Plot takes alignment files and coordinates for a region containing the SV call of interest (Chromosome, Start position, and End position) and creates a plot of the SV region.

    • Samplot Vcf can be used to create visualizations of structural variant calls from a VCF file.

  • Seven tools from the Smoove toolkit:

    • Smoove Annotate annotates SV calls with SV quality and gene information from GFF3 files.

    • Smoove Call calls structural variants with Lumpy and optionally calls svtyper.

    • Smoove Duphold annotates SV calls in the file based on information from the provided alignment files.

    • Smoove Genotype runs svtyper in parallel on provided SV inputs.

    • Smoove Merge merges SV calls from individual files with SV calls and sorts them using svtools.

    • Smoove Paste squares matching SV calls from individual files to a single joint file with final calls.

    • Smoove Plot-counts takes a VCF file created by other Smoove tools and plots counts of split and discordant reads before and after filtering.

  • Four Sambamba tools upgraded to 0.8.1 (and CWL 1.2), plus three new tools:

    • Sambamba Flagstat generates statistics from read flags in a BAM file.

    • Sambamba Index creates a BAI or FAI index for the provided input.

    • Sambamba Markdup can be used to mark or remove duplicate reads from an input BAM file.

    • Sambamba Merge merges alignments in BAM format.

    • Sambamba Slice can be used to copy a slice (region) of the coordinate sorted and indexed input file in BAM or FASTA format.

    • Sambamba Sort sorts alignments in BAM format.

    • Sambamba View accepts alignments in BAM or SAM format and outputs data in a user-specified format.
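As an illustration of what "statistics from read flags" means for a tool like Sambamba Flagstat, the sketch below tallies a list of SAM FLAG values by the bits defined in the SAM format specification. It is not Sambamba's implementation and does not reproduce its output format; the category names and the example flag values are chosen for this example only.

```python
from collections import Counter

# Bit values as defined by the SAM format specification.
FLAG_BITS = {
    "paired": 0x1,
    "properly_paired": 0x2,
    "unmapped": 0x4,
    "secondary": 0x100,
    "qc_failed": 0x200,
    "duplicate": 0x400,
    "supplementary": 0x800,
}


def flag_summary(flags):
    """Count how many records have each flag bit set."""
    counts = Counter(total=0)
    for flag in flags:
        counts["total"] += 1
        for name, bit in FLAG_BITS.items():
            if flag & bit:
                counts[name] += 1
    return counts


# Hypothetical FLAG column values taken from five alignment records.
example_flags = [99, 147, 4, 1107, 2048]
summary = flag_summary(example_flags)
for name in ["total", *FLAG_BITS]:
    print(name, summary[name])
```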

GDC Datasets version update

As of March 11, 2022, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 31.

Recently published apps

We have added four apps to our public apps gallery:

  • Single cell RNA-seq velocity analysis with scVelo 0.2.4, a workflow that performs preprocessing, marker gene analysis, and velocity analysis of single-cell expression data. It is based on SingleCellExperiment, Seurat, scran, scater, AnnotationHub, scuttle, and scVelo.

  • Velocyto.py, a command-line tool that generates spliced/unspliced count matrices. It is distributed with Velocyto 0.17.17, a package for the analysis of expression dynamics in single-cell RNA-seq data. In particular, the package enables estimation of RNA velocities of single cells by distinguishing unspliced and spliced mRNAs in standard single-cell RNA sequencing protocols.

  • SBG single cell object convertor, a tool that converts single-cell data objects between commonly used formats: Seurat, AnnotatedData, and SingleCellExperiment.

  • Single cell RNA-seq trajectory analysis with slingshot and tradeSeq, a tool that performs single cell trajectory analysis with slingshot 2.0.0, and differential expression testing on inferred trajectories with tradeSeq 1.6.0. Slingshot takes advantage of single cell data principal components analysis (PCA) and clustering to infer probable paths of cell development.

Support for Nextflow and WDL workflows available on the CGC

Seven Bridges team members have contributed significantly to the development of the Common Workflow Language (CWL), which is extensively implemented on the CGC. We are now taking a step further and providing support for two more workflow description languages, Nextflow and WDL. This reduces the time needed to bring your apps to the CGC and eliminates the need to convert your Nextflow or WDL code, while still allowing you to use a better interface for running workflows and all other out-of-the-box features in the Seven Bridges ecosystem.

CDS data import updates

The latest update of the CDS data import functionality on the CGC removes the requirement to use a controlled data project as the target project for CDS data import. Controlled data projects are still required for importing controlled data from the CDS, but open access CDS data can now be freely imported into open data projects on the CGC.

AWS i3 instances available on all environments

With this update, you can use the newest Amazon EC2 I3 instances, which are designed for data-intensive, high-transaction, low-latency workloads. They offer the best price per I/O performance (I3) and the lowest price per GB of SSD instance storage on Amazon EC2 (I3en).

Recently published apps

We have published GATK RNAseq short variant discovery 4.2.0.0 workflow, which represents a CWL implementation of the official GATK best practices workflow given in WDL for RNASeq variant discovery. Starting from an unmapped BAM file, the workflow performs alignment to the reference genome, followed by marking of duplicates, reassigning of mapping qualities, base recalibration, variant calling, and variant filtering.

Recently published apps

We have published 10 tools from the GRIDSS module software suite (toolkit) containing tools useful for the detection of genomic rearrangements:

  • GRIDSS tool, a structural variation caller for Illumina sequencing data. It calls variants based on alignment-guided positional de Bruijn graph genome-wide break-end assembly, split read, and read pair evidence.

  • GRIDSS Extract Overlapping Fragments is used to extract reads of interest for targeted GRIDSS variant calling.

  • GRIDSS Annotate VCF Kraken2 adds Kraken2 classifications to single breakend and breakpoint inserted sequences.

  • GRIDSS Annotate VCF RepeatMasker adds RepeatMasker classifications to inserted sequences.

  • GRIDSS GeneratePonBedpe aggregates variants from multiple VCFs and counts the number of samples supporting each.

Read More

Recently published apps

We’ve just published OlinkAnalyze DE, a tool that performs differential expression analysis on Olink Normalized Protein eXpression (NPX) data, and OlinkAnalyze QC, a tool that generates a quality control and exploratory analysis report on Olink NPX data.

SBFS support for macFUSE 4.x

SBFS is a command-line tool that lets you work with CGC project files mounted as a local file system. SBFS requires the FUSE component to be installed. FUSE is a part of the Linux kernel, but on macOS you need to install macFUSE (formerly "FUSE for macOS"; the name changed when macFUSE 4.0.0 was released in October 2020). We are now adding support for macFUSE version 4.x, whose latest version is macFUSE 4.2.4. Please note that SBFS is available as a BETA tool, and only for Linux and macOS, not for the Windows operating system.

Data Cruncher default environment update

The default environment for Data Cruncher interactive analyses has been updated to include more up-to-date versions of Python (upgraded to 3.9) and R (upgraded to 4.1).

GDC Datasets version update

As of December 17, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 30.0.

Recently published apps

  • GRIDSS/PURPLE/LINX Workflow, used for somatic genomic rearrangement detection and classification on WGS data. This workflow takes a pair of matched tumor/normal BAM files and produces allele-specific copy number of every base of the genome, overall sample purity and ploidy, annotated SV clusters and gene fusion predictions. Moreover, it outputs detailed visualisations of the rearrangements in the tumor genome via integrated Circos plots showing copy number changes, clustered SVs, derivative chromosome predictions and impacted genes.

  • PURPLE CNV Calling Workflow, used for somatic CNV calling and purity and ploidy estimation on WGS data. It is based on PURPLE 2.51 and includes two additional tools, AMBER and COBALT. The workflow first calculates B-allele frequency (BAF) with AMBER and read depth ratios with COBALT, which are then used by PURPLE to estimate the purity, ploidy and copy number profile of a tumor sample.

Metadata editing using manifest files just got easier

CGC provides the capability to modify metadata for multiple files in a project by using the Export metadata manifest and Edit metadata with manifest options in the File Browser. This release brings some major improvements to this feature:

  • Support for different manifest file formats. Besides CSV, we have added support for the TSV file format.

  • Use either file name or ID to identify a file. Files whose metadata is being edited can be specified using only file ID or file name (along with path) in the manifest file used with the Edit metadata with manifest option.

  • Support for folders. The name column can contain file path within the project (along with the file name) if the file is in a folder instead of the project root.
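A manifest of the kind described in the list above can be assembled with a few lines of Python. The sketch below writes a tab-separated manifest in which the name column carries the path inside the project for a file stored in a folder. Only the name column comes from the description above; the metadata column names (sample_id, case_id), the file names, and the helper function are hypothetical, so take the real column layout from a manifest exported with Export metadata manifest.

```python
import csv
import io


def build_manifest(rows, fieldnames, delimiter="\t"):
    """Serialize metadata rows into manifest text (TSV by default)."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer, fieldnames=fieldnames, delimiter=delimiter, lineterminator="\n"
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical columns and values, for illustration only.
fieldnames = ["name", "sample_id", "case_id"]
rows = [
    # File in the project root: the file name alone identifies it.
    {"name": "sample1.bam", "sample_id": "S1", "case_id": "C1"},
    # File inside a folder: the name column includes the path within the project.
    {"name": "batch2/sample2.bam", "sample_id": "S2", "case_id": "C1"},
]

manifest_tsv = build_manifest(rows, fieldnames)
print(manifest_tsv)
```

Passing delimiter="," to the same helper produces the CSV variant.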

Click Read more below to see the full list of improvements.

Recently published apps

We have just published upgraded versions (from 2.17 to 2.22) of minimap2, a sequence alignment program that aligns DNA or mRNA sequences against a reference database, and minimap2 build index, a reference indexer for the minimap2 aligner.

This week’s publishing streak also includes METAL, a tool for meta-analysis of genome-wide association scans. METAL can combine either (a) test statistics and standard errors or (b) p-values across studies (taking sample size and direction of effect into account). A METAL analysis is a convenient alternative to a direct analysis of merged data from multiple studies.
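The two ways of combining studies mentioned above correspond to two standard meta-analysis formulas: inverse-variance weighting of effect sizes, and sample-size weighting of signed Z-scores derived from p-values. The sketch below implements those textbook formulas for a single variant. It does not invoke METAL or reproduce its input format, and the function names and example numbers are illustrative assumptions.

```python
import math
from statistics import NormalDist


def inverse_variance_meta(effects, std_errors):
    """Option (a): combine effect sizes weighted by 1 / SE^2."""
    weights = [1.0 / se ** 2 for se in std_errors]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, effects)) / total
    return beta, math.sqrt(1.0 / total)


def sample_size_meta(p_values, directions, sample_sizes):
    """Option (b): combine signed Z-scores weighted by sqrt(sample size).

    directions holds +1 or -1 for the direction of effect in each study.
    """
    norm = NormalDist()
    z_scores = [d * norm.inv_cdf(1.0 - p / 2.0) for p, d in zip(p_values, directions)]
    weights = [math.sqrt(n) for n in sample_sizes]
    z = sum(w * zi for w, zi in zip(weights, z_scores)) / math.sqrt(
        sum(w ** 2 for w in weights)
    )
    p = 2.0 * (1.0 - norm.cdf(abs(z)))
    return z, p


# Made-up results for one variant in two studies.
beta, se = inverse_variance_meta([0.2, 0.4], [0.1, 0.1])
z, p = sample_size_meta([0.05, 0.05], [1, 1], [100, 100])
print(beta, se, z, p)
```

With equal weights and opposite directions of effect, the combined Z-score in the second scheme cancels to zero, which is why the direction of effect has to be supplied alongside the p-values.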

Read More

Recently published apps

We have just published Picard FastqToSam, a tool that converts FASTQ files to an unaligned SAM or BAM file, and a set of seven Delly tools:

  • Delly CNV for calling copy-number variants

  • Delly Call, a structural variants caller

  • Delly LR, a structural variants caller for long-read data

  • Delly Sansa Annotate for annotating structural variants

  • Delly Classify for classifying somatic or germline copy-number variants

  • Delly Filter, a tool that filters structural variants

  • Delly Merge for merging of structural variants in BCF format

Recently published apps

We have just published the following apps:

  • CrossMap, a tool that converts genomic coordinates between different assemblies, and CrossMap Viewchain, a tool that prints the chain file for two assemblies in a human-readable format.

  • VerifyBamID2, a tool that estimates contamination of DNA samples from read data, accounting for ancestry information.

Recently published apps

We have just published DRAGMAP, the open source DRAGEN mapper/aligner that can be used to align single or paired-end reads (FASTQ) or an input BAM file. The app is available in the Public Apps gallery.

Recently published apps

We have just updated the content of our public app galleries with new GATK releases:

  • GATK Pre-Processing For Variant Discovery 4.2.0.0 workflow is used to prepare data for variant calling analysis. The workflow consists of two major segments: alignment to reference genome and data cleanup operations that correct technical biases. Resulting BAM files are ready for variant calling analysis and can be further processed by other BROAD best practice pipelines, like Generic Germline Short Variant Per-Sample Calling workflow, Somatic CNVs workflow, and Somatic SNVs + INDELs workflow.

  • GATK Generic Germline Short Variant Per-Sample Calling 4.2.0.0, a workflow that calls germline variants in a WGS sample with GATK HaplotypeCaller, starting from an analysis-ready BAM file.

And six GATK 4.2.0.0 tools:

  • GATK GatherBQSRReports, a tool that gathers scattered BQSR recalibration reports into a single file.

  • GATK BaseRecalibrator, a tool that generates a recalibration table based on various covariates for input mapped read data.

  • GATK ApplyBQSR, a tool that recalibrates the base quality scores of an input BAM or CRAM file containing reads.

  • GATK HaplotypeCaller, a tool for calling germline SNPs and indels from input BAM file(s) via local re-assembly of haplotypes.

  • GATK VariantFiltration, a tool for filtering variants in a VCF file based on INFO and/or FORMAT annotations.

  • GATK MergeVcfs, a tool for combining multiple variant files.

Cancer Data Service Explorer for CDS data now available through the CGC's visual interface

Cancer Data Service Explorer is an integrated dataset file explorer on the CGC that allows you to filter and select the exact data that you want to analyze further, and then perform seamless import into a controlled project on the CGC. The explorer currently works with data available through CDS and is accessed by clicking Data > Cancer Data Service Explorer on the main menu bar, while on the CGC's main dashboard. Learn more about how to search and import CDS data to the CGC.
