Release notes
Data Cruncher and Interactive Analysis become Data Studio and Interactive Browsers
Data Studio (previously Data Cruncher) is an interactive analysis tool that lets you explore and visualize data in environments such as JupyterLab and RStudio. It was previously located under the Interactive Analysis tab and now has a more prominent place in the project navigation: its own tab, next to Tasks. Because Data Studio is no longer part of the Interactive Analysis tab, that tab has been renamed Interactive Browsers to better reflect its contents.
Recently published apps
We have just published updated versions (4.2.5.0) of two Mutect2 workflows:
GATK Somatic SNVs and INDELs (Mutect2) 4.2.5.0, a workflow for somatic short variant calling. It runs on a single tumor-normal pair or on a single tumor sample, and also performs filtering and functional annotation, and
GATK Create Mutect2 Panel of Normals 4.2.5.0, which creates a panel of normals for use in other GATK workflows. The workflow passes multiple normal samples to GATK Somatic SNVs and INDELs (Mutect2) 4.2.5.0 in tumor-only mode (despite the mode's name, the inputs are normal samples) and then collates the sites present in two or more samples into a sites-only VCF.
Three apps from the MetaXcan toolkit:
S-PrediXcan for computing associations between omic features and a complex trait starting from GWAS summary statistics.
S-MultiXcan for computing associations between predicted gene expression and a trait, using multiple studies for each gene.
MetaMany for serially performing multiple MetaXcan runs on GWAS summary statistics using multiple tissues.
The MetaXcan Workflow for computing associations between omic features and complex traits across multiple tissues. The workflow includes two tools from the MetaXcan framework, MetaMany and S-MultiXcan, and uses GWAS summary statistics together with multiple models that predict expression or splicing quantification.
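The site-collation step of the panel-of-normals workflow can be illustrated with a short Python sketch. It assumes each normal sample's calls are already available as normalized (chrom, pos, ref, alt) tuples, counts each site at most once per sample, keeps the sites seen in at least two samples, and writes them as a sites-only VCF (the eight fixed columns, with no FORMAT or sample columns). The function names and toy data are hypothetical; this is not the workflow's actual implementation, which relies on GATK tools.

```python
from collections import Counter


def collate_sites(sample_sites, min_samples=2):
    """Return sites observed in at least `min_samples` samples, sorted."""
    counts = Counter()
    for sites in sample_sites:
        counts.update(set(sites))  # count each site at most once per sample
    return sorted(site for site, n in counts.items() if n >= min_samples)


def to_sites_only_vcf(sites):
    """Render sites as a minimal sites-only VCF (no FORMAT/sample columns).

    A production VCF would also carry ##contig and other meta-information lines.
    """
    lines = [
        "##fileformat=VCFv4.2",
        "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    ]
    for chrom, pos, ref, alt in sites:
        lines.append(f"{chrom}\t{pos}\t.\t{ref}\t{alt}\t.\t.\t.")
    return "\n".join(lines) + "\n"


# Hypothetical calls from three normal samples.
normals = [
    [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")],
    [("chr1", 100, "A", "T")],
    [("chr2", 50, "C", "G"), ("chr1", 200, "G", "C")],
]
print(to_sites_only_vcf(collate_sites(normals)), end="")
```

Here chr1:100 and chr1:200 each appear in two samples and are kept, while chr2:50 appears in only one and is dropped.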
Release notes
Recently published apps
We have just published the V-pipe 2.99.2 for SARS-CoV-2 workflow for analyzing high-throughput SARS-CoV-2 sequencing data. V-pipe integrates several tools for the analysis of viral high-throughput sequencing data and lets you assess viral diversity at the level of SNVs, short variant sequences (local haplotypes), and long-range haplotypes (global haplotypes).
Release notes
Recently published apps
We have just published updated versions (0.7.17) of BWA MEM Bundle, a well-known tool for aligning sequence reads to a large reference genome, and BWA INDEX, which indexes the reference sequence, a step required before running BWA MEM Bundle. Both tools are published in CWL1.2.
Release notes
Recently published apps
We have published the following apps in our Public Apps gallery:
Cyrius (v1.1.1, CWL1.2), a tool that genotypes CYP2D6 in WGS data. It takes WGS BAM or CRAM files and creates a TSV report with CYP2D6 alleles.
Two PharmCAT (v1.6.0, CWL1.2) tools:
PharmCAT VCF Preprocess, a tool that takes a VCF file and prepares it for downstream processing with PharmCAT, and
PharmCAT, a tool for interpreting guideline variants in VCF files.
Two Biobambam2 (v2.0.183, CWL1.2) tools:
Biobambam2 Bamtofastq, which converts BAM/CRAM/SAM files to FASTQ format, and
Biobambam2 Bamseqchksum, a tool that calculates hashes of the contents of the provided alignment file.
Release notes
Recently published apps
We have just published the following apps:
An updated version of the SRA Download and Set Metadata workflow (SRA Toolkit 3.0.0), which downloads the metadata associated with an SRA accession via the SRA Run Info CGI, downloads the FASTQ files (on an on-demand instance), and sets the corresponding metadata.
OptiType (v1.3.5, CWL1.2), a tool designed for precision HLA typing from next-generation sequencing data. It is based on the assumption that the correct HLA genotype explains the highest number of mapped reads.
fastENLOC (v1.0, CWL1.2), a tool that enables integrative genetic association analysis of molecular QTL data and GWAS data.
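The assumption behind OptiType can be made concrete with a brute-force sketch for a single locus: try every candidate allele pair (homozygous pairs included) and keep the one that explains the most reads, where a read counts as explained if it maps to either allele of the pair. OptiType itself formulates the problem as an integer linear program over all loci, with further refinements; the function and the toy read-to-allele mapping below are hypothetical illustrations only.

```python
from itertools import combinations_with_replacement


def best_genotype(read_hits, alleles):
    """Return the allele pair explaining the most reads, and that read count.

    read_hits maps each read ID to the set of alleles it aligns to.
    A read is explained if it aligns to at least one allele of the pair.
    """
    best_pair, best_count = None, -1
    # combinations_with_replacement includes homozygous pairs such as (a, a).
    for pair in combinations_with_replacement(sorted(alleles), 2):
        chosen = set(pair)
        explained = sum(1 for hits in read_hits.values() if hits & chosen)
        if explained > best_count:  # ties keep the first pair in sorted order
            best_pair, best_count = pair, explained
    return best_pair, best_count


# Hypothetical mapping of six reads to three candidate HLA-A alleles.
read_hits = {
    "r1": {"A*01:01"},
    "r2": {"A*01:01", "A*02:01"},
    "r3": {"A*02:01"},
    "r4": {"A*03:01"},
    "r5": {"A*02:01"},
    "r6": {"A*01:01"},
}
print(best_genotype(read_hits, {"A*01:01", "A*02:01", "A*03:01"}))
```

For this toy input, the pair A*01:01/A*02:01 explains five of the six reads, more than any other pair.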
Release notes
Recently published apps
We have just published the following apps in our Public Apps gallery:
TwoSampleMR, a tool that performs Mendelian randomization testing for a given exposure-outcome pair. It is a wrapper around the TwoSampleMR R package and uses summary statistics to make causal inferences.
CCS, a tool that combines multiple subreads of the same SMRTbell molecule and outputs one highly accurate consensus sequence.
lima, a tool for identifying barcode and primer sequences in PacBio single-molecule sequencing data.
PacBio Flowcell Data Processing, a workflow that can be used to process PacBio CCS or CLR data in preparation for variant calling.
PacBio CCS or CLR WGS Variant Calling, a workflow that can be used to call structural variants in PacBio CCS or CLR data. The workflow can also call small variants in CCS data using Clair3.
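As background to the TwoSampleMR entry, one common estimator in two-sample Mendelian randomization is the fixed-effect inverse-variance weighted (IVW) estimate, which combines per-variant Wald ratios (outcome effect divided by exposure effect) using first-order weights of 1/se(outcome)^2. The Python sketch below assumes harmonized per-variant summary statistics and uses hypothetical function names and toy values. The TwoSampleMR package also provides other methods, such as MR-Egger and weighted median, and its default IVW standard error may be computed differently, so treat this as an illustration rather than a reimplementation.

```python
from math import sqrt


def wald_ratios(beta_exposure, beta_outcome):
    """Per-variant causal estimates: outcome effect / exposure effect."""
    return [by / bx for bx, by in zip(beta_exposure, beta_outcome)]


def ivw_fixed_effect(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect IVW estimate and its standard error (first-order weights)."""
    num = sum(
        bx * by / se ** 2
        for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    den = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    return num / den, sqrt(1.0 / den)


# Hypothetical harmonized summary statistics for three variants.
bx = [0.10, 0.20, 0.15]    # SNP effects on the exposure
by = [0.05, 0.10, 0.075]   # SNP effects on the outcome
se_y = [0.01, 0.01, 0.02]  # standard errors of the outcome effects
print(wald_ratios(bx, by))
print(ivw_fixed_effect(bx, by, se_y))
```

Because every variant in this toy input has a Wald ratio of 0.5, the IVW estimate is also 0.5; with real data, the weighting gives more influence to variants with strong exposure effects and precise outcome estimates.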
Release notes
Recently published apps
We’ve just published AnnotationDbi select and mapIds, a tool that maps identifiers of one type to another. It is based on Bioconductor annotation data packages.
Release notes
Recently published apps
New apps have been added to the CGC:
Two tools from the Samplot toolkit:
Samplot Plot takes alignment files and the coordinates of a region containing the SV call of interest (chromosome, start position, and end position) and creates a plot of the SV region.
Samplot Vcf can be used to create visualizations of structural variant calls from a VCF file.
Seven tools from the Smoove toolkit:
Smoove Annotate annotates SV calls with SV quality and gene information from GFF3 files.
Smoove Call calls structural variants with Lumpy and optionally genotypes them with svtyper.
Smoove Duphold annotates the SV calls in a file using information from the provided alignment files.
Smoove Genotype runs svtyper in parallel on provided SV inputs.
Smoove Merge merges SV calls from individual files and sorts them using svtools.
Smoove Paste squares matching SV calls from individual files into a single joint file with the final calls.
Smoove Plot-counts takes a VCF file created by other Smoove tools and plots counts of split and discordant reads before and after filtering.
Seven Sambamba tools, four of them upgraded to version 0.8.1 (and CWL 1.2) and three newly added:
Sambamba Flagstat generates statistics from read flags in a BAM file.
Sambamba Index creates a BAI or FAI index for the provided input.
Sambamba Markdup can be used to mark or remove duplicate reads from an input BAM file.
Sambamba Merge merges alignments in BAM format.
Sambamba Slice copies a slice (region) of a coordinate-sorted and indexed input file in BAM or FASTA format.
Sambamba Sort sorts alignments in BAM format.
Sambamba View accepts alignments in BAM or SAM format and outputs data in a user-specified format.
Release notes
GDC Datasets version update
As of March 11, 2022, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 31.
Recently published apps
We have added four apps to our Public Apps gallery:
Single cell RNA-seq velocity analysis with scVelo 0.2.4, a workflow that performs preprocessing, marker gene analysis, and velocity analysis of single-cell expression data. It is based on SingleCellExperiment, Seurat, scran, scater, AnnotationHub, scuttle, and scVelo.
Velocyto.py: Velocyto 0.17.17 is a package for analyzing expression dynamics in single-cell RNA-seq data. In particular, it estimates the RNA velocities of single cells by distinguishing unspliced from spliced mRNAs in standard single-cell RNA sequencing protocols. Velocyto.py is the command-line tool distributed with the package, used to generate spliced/unspliced count matrices.
SBG single cell object convertor, a tool that converts single-cell data objects between commonly used formats: Seurat, AnnotatedData, and SingleCellExperiment.
Single cell RNA-seq trajectory analysis with slingshot and tradeSeq, a tool that performs single-cell trajectory analysis with slingshot 2.0.0 and differential expression testing on the inferred trajectories with tradeSeq 1.6.0. Slingshot uses principal component analysis (PCA) and clustering of single-cell data to infer probable paths of cell development.
Release notes
Support for Nextflow and WDL workflows available on the CGC
In addition to the significant contributions of Seven Bridges team members to the development of the Common Workflow Language (CWL) and its extensive implementation on the CGC, we are now going a step further by supporting two more workflow description languages: Nextflow and WDL. This reduces the time needed to bring your apps to the CGC and removes the need to convert your Nextflow or WDL code, while still letting you use the CGC interface for running workflows and all other out-of-the-box features of the Seven Bridges ecosystem.
CDS data import updates
The latest update to CDS data import on the CGC removes the requirement to use a controlled data project as the target project for CDS data import. Controlled data from the CDS must still be imported into controlled data projects, but open-access CDS data can now be imported into open data projects on the CGC.
Release notes
AWS I3 instances available on all environments
With this update, you can use the newest Amazon EC2 I3 and I3en instances, designed for data-intensive, high-transaction, low-latency workloads. I3 instances offer the best price per I/O performance, and I3en instances offer the lowest price per GB of SSD instance storage on Amazon EC2.
Recently published apps
We have published the GATK RNAseq short variant discovery 4.2.0.0 workflow, a CWL implementation of the official GATK best practices WDL workflow for RNA-seq variant discovery. Starting from an unmapped BAM file, the workflow aligns reads to the reference genome, marks duplicates, reassigns mapping qualities, performs base recalibration, calls variants, and filters them.
Release notes
Recently published apps
We have published 10 tools from the GRIDSS software suite for detecting genomic rearrangements, including:
GRIDSS, a structural variant caller for Illumina sequencing data. It calls variants based on genome-wide break-end assembly using alignment-guided positional de Bruijn graphs, together with split-read and read-pair evidence.
GRIDSS Extract Overlapping Fragments extracts reads of interest for targeted GRIDSS variant calling.
GRIDSS Annotate VCF Kraken2 adds Kraken2 classifications to single breakend and breakpoint inserted sequences.
GRIDSS Annotate VCF RepeatMasker adds RepeatMasker classifications to inserted sequences.
GRIDSS GeneratePonBedpe aggregates variants from multiple VCFs and counts the number of samples supporting each.
Release notes
Recently published apps
We’ve just published OlinkAnalyze DE, a tool that performs differential expression analysis on Olink Normalized Protein eXpression (NPX) data, and OlinkAnalyze QC, which generates a quality control and exploratory analysis report for Olink NPX data.
Release notes
SBFS support for macFUSE 4.x
SBFS is a command-line tool that lets you work with CGC project files mounted as a local file system. SBFS requires FUSE. On Linux, FUSE is part of the kernel; on macOS, you need to install FUSE for macOS, which is now called macFUSE. This release adds support for macFUSE 4.x. (macFUSE 4.0.0 was released in October 2020, when the name changed from “FUSE for macOS” to “macFUSE”; the latest version is macFUSE 4.2.4.) Please note that SBFS is available as a BETA tool, and only for Linux and macOS; it is not available for Windows.
Release notes
Data Cruncher default environment update
The default environment for Data Cruncher interactive analyses has been updated to newer versions of Python (3.9) and R (4.1).
GDC Datasets version update
As of December 17, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 30.0.
Release notes
Recently published apps
GRIDSS/PURPLE/LINX Workflow, used for somatic genomic rearrangement detection and classification on WGS data. This workflow takes a pair of matched tumor/normal BAM files and produces the allele-specific copy number of every base of the genome, overall sample purity and ploidy, annotated SV clusters, and gene fusion predictions. It also outputs detailed visualizations of the rearrangements in the tumor genome as integrated Circos plots showing copy number changes, clustered SVs, derivative chromosome predictions, and impacted genes.
PURPLE CNV Calling Workflow, used for somatic CNV calling and purity and ploidy estimation on WGS data. It is based on PURPLE 2.51 and includes two additional tools, AMBER and COBALT. The workflow first calculates B-allele frequencies (BAF) with AMBER and read depth ratios with COBALT; PURPLE then uses these to estimate the purity, ploidy, and copy number profile of the tumor sample.
Release notes
Metadata editing using manifest files just got easier
The CGC lets you modify metadata for multiple files in a project using the Export metadata manifest and Edit metadata with manifest options in the File Browser. This release brings some major improvements to this feature:
Support for different manifest file formats. Besides CSV, we have added support for the TSV file format.
Use either the file name or the file ID to identify a file. In the manifest file used with the Edit metadata with manifest option, files whose metadata is being edited can be identified by file ID alone or by file name (along with its path).
Support for folders. If a file is in a folder rather than the project root, the name column can contain the file's path within the project together with the file name.
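To illustrate the TSV support and folder paths, the Python sketch below writes a tab-separated manifest in which the name column carries each file's path within the project. The sample_id metadata column, the file paths, and the helper name are hypothetical; the safest way to get the exact columns the CGC expects is to start from a manifest produced by Export metadata manifest and edit that.

```python
import csv
import io


def build_tsv_manifest(rows, metadata_fields):
    """Return a TSV manifest with a `name` column followed by metadata columns."""
    buffer = io.StringIO()
    writer = csv.DictWriter(
        buffer,
        fieldnames=["name", *metadata_fields],
        delimiter="\t",       # tab-separated (TSV) instead of comma-separated (CSV)
        lineterminator="\n",
    )
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


rows = [
    # File inside a folder: the path within the project is part of the name.
    {"name": "fastq/sample1_R1.fastq.gz", "sample_id": "S1"},
    # File in the project root: the file name alone.
    {"name": "sample2.bam", "sample_id": "S2"},
]
print(build_tsv_manifest(rows, ["sample_id"]), end="")
```

Using csv.DictWriter rather than joining strings by hand keeps the column order fixed and rejects rows that contain unexpected fields.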
Recently published apps
We have just published upgraded versions (from 2.17 to 2.22) of minimap2, a sequence alignment program that aligns DNA or mRNA sequences against a reference database, and minimap2 build index, a reference indexer for the minimap2 aligner.
This week’s publishing streak also includes METAL, a tool for meta-analysis of genome-wide association scans. METAL can combine either (a) test statistics and standard errors or (b) p-values across studies (taking sample size and direction of effect into account). A METAL analysis is a convenient alternative to a direct analysis of merged data from multiple studies.
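The two combination schemes mentioned for METAL can be sketched in Python with only the standard library. The inverse-variance scheme weights each study's effect estimate by 1/SE^2; the sample-size scheme converts each two-sided p-value to a z-score, gives it the sign of the study's direction of effect, and weights it by the square root of the study's sample size. This is a simplified sketch of these standard formulas (no genomic control, heterogeneity testing, or allele alignment), not METAL's code, and the function names are hypothetical.

```python
from math import sqrt
from statistics import NormalDist

STANDARD_NORMAL = NormalDist()  # mean 0, standard deviation 1


def inverse_variance_meta(betas, ses):
    """Scheme (a): combine per-study estimates, weighting each by 1 / SE^2.

    Returns (combined estimate, combined SE, two-sided p-value).
    """
    weights = [1.0 / se ** 2 for se in ses]
    total = sum(weights)
    beta = sum(w * b for w, b in zip(weights, betas)) / total
    se = sqrt(1.0 / total)
    z = beta / se
    return beta, se, 2.0 * STANDARD_NORMAL.cdf(-abs(z))


def sample_size_meta(p_values, directions, sample_sizes):
    """Scheme (b): combine two-sided p-values signed by effect direction.

    directions holds +1 or -1 per study; each z-score is weighted by sqrt(N).
    Returns (combined z-score, two-sided p-value).
    """
    z_sum = 0.0
    for p, direction, n in zip(p_values, directions, sample_sizes):
        z = STANDARD_NORMAL.inv_cdf(1.0 - p / 2.0)  # |z| for a two-sided p (p > 0)
        z_sum += sqrt(n) * (z if direction > 0 else -z)
    z_combined = z_sum / sqrt(sum(sample_sizes))
    return z_combined, 2.0 * STANDARD_NORMAL.cdf(-abs(z_combined))


# Hypothetical results from two studies.
print(inverse_variance_meta([0.2, 0.4], [0.1, 0.1]))
print(sample_size_meta([0.05, 0.05], [+1, +1], [100, 100]))
```

With equal standard errors, the inverse-variance estimate is simply the mean of the two effects (0.3). In the sample-size scheme, two studies with the same p-value but opposite directions of effect cancel out, which is why the direction of effect matters.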
Release notes
Recently published apps
We have just published Picard FastqToSam, a tool that converts FASTQ files to an unaligned SAM or BAM file, and a set of seven Delly tools:
Delly CNV for calling copy-number variants
Delly Call, a structural variants caller
Delly LR, a structural variants caller for long reads data
Delly Sansa Annotate for annotating structural variants
Delly Classify for classifying somatic or germline copy-number variants
Delly Filter, a tool that filters structural variants
Delly Merge for merging of structural variants in BCF format
Release notes
Recently published apps
We have just published the following apps:
CrossMap, a tool that converts genomic coordinates between different assemblies, and CrossMap Viewchain that prints the chain file for two assemblies in a human-readable format.
VerifyBamID2, a tool that estimates contamination of DNA samples from read data while accounting for ancestry information.
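The coordinate conversion that CrossMap performs is driven by chain files. As a simplified sketch, assuming a single chain with both assemblies on the + strand, a chain consists of a header line followed by aligned blocks written as "size dt dq" (the last block has only "size"). A position inside a block maps to the other assembly by a constant offset, and a position in a gap has no mapping. CrossMap itself handles many chains, strand differences, and whole intervals; the functions and the toy chain below are hypothetical.

```python
def parse_chain(text):
    """Parse one UCSC-style chain into (target name, query name, blocks).

    Each block is (target_start, query_start, size), using 0-based coordinates.
    """
    lines = [line.split() for line in text.strip().splitlines() if line.strip()]
    header = lines[0]
    if header[0] != "chain":
        raise ValueError("expected a 'chain' header line")
    # Header: chain score tName tSize tStrand tStart tEnd qName qSize qStrand qStart qEnd id
    t_name, t_pos = header[2], int(header[5])
    q_name, q_pos = header[7], int(header[10])
    blocks = []
    for fields in lines[1:]:
        size = int(fields[0])
        blocks.append((t_pos, q_pos, size))
        if len(fields) == 3:  # gap before the next block: dt on target, dq on query
            t_pos += size + int(fields[1])
            q_pos += size + int(fields[2])
    return t_name, q_name, blocks


def map_position(chain, pos):
    """Map a 0-based target position to (query name, position), or None in a gap."""
    _, q_name, blocks = chain
    for t_start, q_start, size in blocks:
        if t_start <= pos < t_start + size:
            return q_name, q_start + (pos - t_start)
    return None


# Hypothetical chain: two aligned blocks separated by a 10 bp / 20 bp gap.
toy_chain = """chain 1000 chr1 1000 + 100 300 chr1 1000 + 150 360 1
50 10 20
140
"""
chain = parse_chain(toy_chain)
print(map_position(chain, 120))
print(map_position(chain, 155))
```

Position 120 falls in the first block and shifts by the block's offset of 50, while position 155 lies in the gap between blocks and is reported as unmapped.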
Release notes
Recently published apps
We have just published DRAGMAP, the open-source DRAGEN mapper/aligner, which can be used to align single-end or paired-end reads (FASTQ) or an input BAM file. The app is available in the Public Apps gallery.