Release notes

Recently published apps 

We’ve published the following new apps on the CGC:

  • FusionInspector (v2.8.0), a tool that validates fusion transcript predictions. FusionInspector is part of the Trinity Cancer Transcriptome Analysis Toolkit (CTAT). It takes a list of candidate fusion genes (as predicted by any fusion transcript prediction tool), extracts the genomic regions corresponding to the fusion partners, and creates mini-fusion-contigs that hold each gene pair in the proposed fused orientation. The original reads are then aligned to these putative fusion contigs. In this fusion-gene context, fusion-supporting reads that would typically align as split reads or discordant pairs should align as concordant ‘normal’ reads. Junction reads (reads that span the fusion breakpoint) and spanning fragments (read pairs whose mates fall on either side of the breakpoint) that support each fusion are identified, reported, and scored accordingly. 

  • Arriba (v2.4.0), a tool for detecting gene fusions from RNA-Seq data. Arriba is designed to work with alignments produced by the STAR aligner, and its post-alignment runtime is typically only a few minutes. Unlike many other STAR-based fusion detection methods, Arriba does not require reducing STAR's --alignIntronMax parameter to identify fusions resulting from focal deletions. It was designed for clinical research, so high sensitivity and fast runtimes were crucial design requirements. Beyond gene fusions, Arriba can identify other structural rearrangements that may have clinical significance, including viral integration sites, internal tandem duplications, whole exon duplications, and gene truncations (i.e., breakpoints in introns and intergenic regions). 
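To make the mini-fusion-contig idea behind FusionInspector more concrete, here is a minimal Python sketch. It assumes each fusion partner is given as a hypothetical (chrom, start, end, strand) tuple with 1-based inclusive coordinates and that the genome fits in an in-memory dict; the function names and the optional N spacer are illustrative assumptions, not FusionInspector's actual implementation.

```python
# Illustrative sketch only: builds a fusion contig with the 5' partner (gene A)
# upstream of the 3' partner (gene B), each oriented on its transcribed strand.
# The input layout and the optional N spacer are assumptions, not FusionInspector code.

COMPLEMENT = str.maketrans("ACGTNacgtn", "TGCANtgcan")


def revcomp(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(COMPLEMENT)[::-1]


def extract_region(genome, chrom, start, end, strand):
    """Extract a 1-based, inclusive region and orient it 5'->3' on its strand."""
    seq = genome[chrom][start - 1:end]
    return revcomp(seq) if strand == "-" else seq


def build_fusion_contig(genome, gene_a, gene_b, spacer_len=0):
    """Join gene A and gene B (each a (chrom, start, end, strand) tuple) in fused order."""
    seq_a = extract_region(genome, *gene_a)
    seq_b = extract_region(genome, *gene_b)
    return seq_a + "N" * spacer_len + seq_b


genome = {"chr1": "AAACCCGGGTTT", "chr2": "ACGTACGT"}
print(build_fusion_contig(genome, ("chr1", 1, 6, "+"), ("chr2", 1, 3, "-")))  # AAACCCCGT
```

A read that crosses the A/B junction of such a contig can align to it contiguously, which is why evidence that would otherwise appear as split or discordant alignments shows up as concordant alignments.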

Recently updated apps 

 

We also updated the following apps: 

  • DESeq2 (v1.40.1), a tool that performs differential gene expression analysis across two or more study conditions using negative binomial generalized linear models. It analyzes estimated read counts from multiple samples, each belonging to one of the conditions under study, and searches for systematic changes between conditions relative to within-condition variability. 
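DESeq2 models each gene's count in each sample with a negative binomial distribution whose variance grows faster than its mean: Var = mu + alpha * mu^2, where the dispersion alpha captures within-condition variability. The Python sketch below only evaluates that distribution; it does not reproduce DESeq2's normalization, dispersion shrinkage, or GLM fitting, and the function names are hypothetical.

```python
import math


def nb_variance(mu, alpha):
    """Negative binomial variance in the mean/dispersion form: mu + alpha * mu^2."""
    return mu + alpha * mu ** 2


def nb_log_pmf(k, mu, alpha):
    """Log-probability of count k for a negative binomial with mean mu and dispersion alpha > 0."""
    r = 1.0 / alpha          # size parameter of the equivalent (r, p) form
    p = r / (r + mu)         # chosen so that the mean r * (1 - p) / p equals mu
    return (math.lgamma(k + r) - math.lgamma(r) - math.lgamma(k + 1)
            + r * math.log(p) + k * math.log(1.0 - p))


# With mean 10, a dispersion of 0.1 doubles the variance relative to Poisson (variance 10).
print(nb_variance(10, 0.1))
```

Because the variance exceeds the mean, a between-condition difference has to be large relative to this extra, dispersion-driven spread before it looks systematic.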

Marko Marinkovic
Release notes

Cloud Cost Estimator now available on the CGC

We have added the ability to predict and understand the cost of an analysis before running it, for a selected set of apps available in the Public Apps gallery. The estimate is based on the following parameters:

  • Use of spot instances. Prices of using spot and on-demand instances differ and affect the final task price.

  • Total input file size. The size of input files affects task running time, which impacts the total task cost.

  • Type of instance that is used to run the task. The cost of an instance depends on its type and resources (available compute power, memory, etc.). Note that estimates are available only for instances with the default resource configuration, as defined by the cloud providers, and won't be available if the default resource values are changed. See the available Amazon Web Services and Google Cloud Platform instances on the CGC.

Please note that the estimated costs are AN ESTIMATE ONLY AND NOT A COST GUARANTEE. The costs shown in the estimates are only an approximation. Final costs may change after all the task elements have been accounted for.
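The notes above do not say how the estimator combines these parameters, so the following is only a toy Python sketch, assuming a hypothetical linear model in which runtime grows with total input size and cost is runtime multiplied by the hourly price of the chosen instance (spot or on-demand). All names, coefficients, and prices are made up for illustration and are not the CGC's model.

```python
from dataclasses import dataclass


@dataclass
class InstanceType:
    """Hypothetical instance description; prices are placeholders, not real cloud prices."""
    name: str
    on_demand_hourly: float  # price per hour when run on demand
    spot_hourly: float       # price per hour when run on a spot instance


def estimate_task_cost(instance, input_size_gb, hours_per_gb, use_spot, fixed_hours=0.0):
    """Toy model: runtime grows linearly with input size; cost = runtime x hourly price.

    hours_per_gb and fixed_hours are assumed per-app constants, not published values.
    """
    runtime_hours = fixed_hours + hours_per_gb * input_size_gb
    hourly_price = instance.spot_hourly if use_spot else instance.on_demand_hourly
    return runtime_hours * hourly_price


example = InstanceType("example.large", on_demand_hourly=1.0, spot_hourly=0.3)
print(estimate_task_cost(example, input_size_gb=10, hours_per_gb=0.5, use_spot=False))
print(estimate_task_cost(example, input_size_gb=10, hours_per_gb=0.5, use_spot=True))
```

Even in this toy form, the three listed parameters each enter the result: instance type and spot usage set the hourly price, and input size sets the runtime.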

New public project on the CGC  

To further increase the versatility and usability of available analyses, we also published the Vitessce Demo Notebook as part of the Integrative Single-cell Data Visualization with Vitessce: User Guide public project on the CGC. This project serves as a comprehensive tutorial for users interested in leveraging Vitessce for the visualization and analysis of single-cell data. It features one Data Studio interactive analysis, written in Python, with step-by-step demonstrations and examples showcasing the integrative capabilities of the Vitessce Python API. 

Recently published apps

We have also published new workflows for processing Nanopore data:

  • ONT Flowcell Processing - aligns (Minimap2), sorts (Samtools) and quality checks (NanoPlot, Samtools Flagstat, Mosdepth, GATK ComputeLongReadMetrics) input Nanopore data from a single flowcell. 

  • ONT WGS Variant Calling - merges (Sambamba), calls variants (Clair3, Sniffles2) and quality checks (Mosdepth, NanoPlot) input BAM files from Nanopore data. 

Marko Marinkovic
Release notes

Recently updated apps

We have updated the following tools on the CGC: 

  • Exomiser 13.3.0 – used to identify candidate causative variants from patient WES or WGS VCF data and HPO phenotype terms. 

  • PharmCAT 2.8.3 toolkit: 

    • PharmCAT VCF Preprocess – prepares an input VCF file for PharmCAT. 

    • PharmCAT – takes a single-sample VCF file and returns a report with guideline variants. 

  • Sambamba 1.0.1 toolkit: 

    • Sambamba Index – creates a BAI or FAI index for the provided BAM/FASTA file. 

    • Sambamba Slice – copies a slice (region) of the coordinate sorted and indexed input file in BAM or FASTA format. 

    • Sambamba Sort – sorts alignments in BAM format. 

    • Sambamba Markdup – marks or removes duplicate reads from an input BAM file. 

    • Sambamba Flagstat – creates read flag statistics from a BAM file. 

    • Sambamba Merge – merges alignments in BAM format. 

    • Sambamba View – inspects and filters alignments in SAM/BAM format. 

  • Clair3 1.0.4 – calls small germline variants from data generated by Nanopore, PacBio or Illumina sequencing technologies. 
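Tools such as Sambamba Flagstat summarize reads by the bits set in each record's SAM FLAG field. The Python sketch below shows the idea on plain FLAG integers, using bit values from the SAM specification; it is a simplification (real flagstat output also splits counts into QC-passed and QC-failed reads and reports more categories), and the function name is hypothetical.

```python
from collections import Counter

# Flag bits defined by the SAM specification.
PAIRED = 0x1
PROPER_PAIR = 0x2
UNMAPPED = 0x4
SECONDARY = 0x100
QC_FAIL = 0x200
DUPLICATE = 0x400
SUPPLEMENTARY = 0x800


def flag_stats(flags):
    """Tally simplified flagstat-style counts from an iterable of SAM FLAG integers."""
    stats = Counter()
    for flag in flags:
        stats["total"] += 1
        if flag & SECONDARY:
            stats["secondary"] += 1
        if flag & SUPPLEMENTARY:
            stats["supplementary"] += 1
        if flag & DUPLICATE:
            stats["duplicates"] += 1
        if flag & QC_FAIL:
            stats["qc_failed"] += 1
        if not (flag & UNMAPPED):
            stats["mapped"] += 1
        if flag & PAIRED:
            stats["paired"] += 1
            if flag & PROPER_PAIR:
                stats["properly_paired"] += 1
    return stats


print(flag_stats([99, 147, 4, 1123, 256]))
```

In practice the FLAG values would come from a BAM parser; taking plain integers keeps the sketch self-contained.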

Marko Marinkovic
Release notes

Azure now available on the CGC

To help you reduce costs and run analyses more effectively and efficiently in compute locations that are closer to your data, we have introduced a new Azure region on the Cancer Genomics Cloud (CGC). The Azure South Central US region can now be selected as the project location when creating a new project on the CGC, meaning that data in such projects will be stored and processed using Azure cloud resources. We have also added support for attaching Azure storage buckets as volumes on the CGC.

Single and Global logout flows defined by SAML protocol are now available for SSO

Users who access the CGC through Single Sign-On (SSO) can now perform Single (IdP-initiated) logout to log out of multiple SSO sessions with a single click. It is also now possible to initiate the Global (SP-initiated) logout flow from the CGC.

Recently published apps 

We have published the following tools in our Public Apps gallery: 

  • Tximport, a tool based on the tximport R/Bioconductor package that imports and summarizes transcript-level estimates for transcript- and gene-level analysis. It is designed to simplify importing transcript-level abundances, estimated counts, and effective lengths from a variety of upstream tools for downstream transcript-level or gene-level analysis. 

  • Three tools from the SplAdder (3.0.4) toolkit.

  • Five tools from the Qualimap 2.3 toolkit.

  • Six tools from the RSeQC 5.0.1 toolkit.

  • The Tidyproteomics 1.5.2 toolkit.
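The gene-level summarization that Tximport performs can be sketched in a few lines of Python. This assumes the default behavior as we understand it: estimated counts and abundances (TPM) are summed over a gene's transcripts, and the gene length is the abundance-weighted average of transcript effective lengths. The input layout and function name are hypothetical; the real package supports many input formats and options (such as countsFromAbundance) that this sketch omits.

```python
from collections import defaultdict


def summarize_to_gene(tx_estimates, tx2gene):
    """Collapse transcript-level estimates to gene level.

    tx_estimates: {transcript_id: (estimated_count, tpm, effective_length)}  (assumed layout)
    tx2gene: {transcript_id: gene_id}
    """
    counts = defaultdict(float)
    abundance = defaultdict(float)
    weighted_length = defaultdict(float)
    for tx, (count, tpm, eff_len) in tx_estimates.items():
        gene = tx2gene[tx]
        counts[gene] += count                 # counts are summed
        abundance[gene] += tpm                # abundances are summed
        weighted_length[gene] += tpm * eff_len
    result = {}
    for gene in counts:
        # Abundance-weighted mean effective length; undefined if the gene has zero abundance.
        length = weighted_length[gene] / abundance[gene] if abundance[gene] > 0 else float("nan")
        result[gene] = {"count": counts[gene], "abundance": abundance[gene], "length": length}
    return result


estimates = {"t1": (10, 2.0, 1000.0), "t2": (30, 6.0, 2000.0), "t3": (5, 1.0, 500.0)}
print(summarize_to_gene(estimates, {"t1": "g1", "t2": "g1", "t3": "g2"}))
```

Weighting lengths by abundance lets the gene length reflect which isoforms are actually expressed, rather than a plain average over all annotated transcripts.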

Recently updated apps 

TopHat2, a tool that aligns RNA-Seq reads to a genome to identify exon-exon splice junctions, has been updated to version 2.2.1 and upgraded to CWL version 1.2 (it was previously available in CWL draft-2).

Marko Marinkovic
Release notes

Improved error messages for volume imports

To provide you with more detailed information about each import from an attached volume and enable you to resolve import issues independently, we have improved notifications in the recently introduced Activity center, available by clicking Open activity center in the Activity feed. If any item in an import fails, you will see an error message and a corresponding error code for each failed item, helping you understand and try to fix the issue. In addition, a description and a link to the relevant documentation will be provided for each import from a volume.

Recently published apps

The Change-O 1.3.0 toolkit is the latest new toolkit addition in our Public Apps gallery. It includes the following apps: 

  • DefineClones - assigns Ig sequences into clonal groups. 

  • BuildTrees - creates IgPhyML input files. 

  • ParseDb - parses and updates input database files. 

  • AlignRecords - performs multiple alignment of sequence fields. 

  • AssignGenes - assigns V(D)J gene annotations. 

  • MakeDb - creates standardized database files from input germline alignment results. 

  • CreateGermlines - reconstructs germline V(D)J sequences for alignment data. 

  • ConvertDb - parses input tab-delimited database files and converts them to different output formats. 

Recently updated apps

We updated the Broad Institute’s best practices workflows for somatic copy number variant discovery to version 4.2.5.0 in our Public Apps gallery: 

  • GATK Somatic CNV Panel Workflow 4.2.5.0 - used for creating a panel of normals (PON) given a set of normal samples. 

  • GATK Somatic CNV Pair Workflow 4.2.5.0 - used for detecting copy number variants (CNVs) from WES/WGS single sample data in tumor-only or matched-normal mode. 

Marko Marinkovic
Release notes

Recently published apps

The pRESTO 0.7.1 toolkit is the latest new toolkit addition in our Public Apps gallery. It includes the following apps: 

  • ParseLog - Parses pRESTO log records and outputs values in TAB-separated tables. 

  • BuildConsensus - Builds consensus sequences. 

  • ClusterSets - Clusters sequences into groups. 

  • CollapseSeq - Removes duplicate sequences from input FASTA/FASTQ files. 

  • PairSeq - Sorts and matches sequences across input files. 

  • ConvertHeaders - Converts sequence headers to pRESTO format. 

  • AlignSets - Aligns sequences using different methods. 

  • FilterSeq - Filters input sequences. 

  • ParseHeaders - Manipulates sequence headers. 

  • SplitSeq - Splits and samples sequence files. 

  • UnifyHeaders - Reassigns or deletes sequence header fields. 

  • AssemblePairs - Assembles paired-end reads to a single sequence. 

  • MaskPrimers - Removes primers and annotates sequences with primers and barcodes. 

  • EstimateError - Estimates annotation set error rates.  
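The duplicate removal that CollapseSeq performs can be illustrated with a minimal Python sketch that collapses exact (case-insensitive) duplicate sequences and records how many copies each unique sequence had. The record layout and function name are hypothetical, and the real tool has further options that this sketch does not model.

```python
def collapse_sequences(records):
    """Collapse exact duplicate sequences (case-insensitive).

    records: iterable of (header, sequence) pairs.
    Returns (header, sequence, copy_count) tuples, keeping the first header seen
    for each unique sequence, in order of first appearance.
    """
    unique = {}  # dicts preserve insertion order (Python 3.7+)
    for header, seq in records:
        key = seq.upper()
        if key in unique:
            unique[key][2] += 1
        else:
            unique[key] = [header, seq, 1]
    return [tuple(entry) for entry in unique.values()]


reads = [("r1", "ACGT"), ("r2", "acgt"), ("r3", "TTTT")]
print(collapse_sequences(reads))
```

Keeping a copy count rather than simply discarding duplicates preserves how often each sequence was observed, which downstream steps may still need.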

We also published the following new tools: 

  • ComBat-seq (sva 3.35.2), an R tool used for batch effect adjustment in bulk RNA-seq data. We also made additional improvements to the tool wrapper, such as removing more than one batch per dataset and adapting the outputs to be compatible with downstream analyses (DESeq). 

  • GffRead (0.12.7), a GFF/GTF utility tool providing format conversions, filtering, FASTA sequence extraction, and more. 

Recently updated apps

We published the following updates in our Public Apps gallery: 

  • RNA-seq alignment - STAR (2.7.10a), a workflow that performs the first step of RNA-seq analysis - alignment of the reads to a reference genome. It is used to generate aligned BAM files (in genome and transcriptome coordinates) from RNA-seq data, which can later be used in further RNA studies, like gene expression analysis. 

  • Trim Galore! (0.6.10) is a wrapper around adapter trimming and quality control tools Cutadapt and FastQC with extra functionality for RRBS data.

Marko Marinkovic
Release notes

Recently published apps

We published the Immcantation 4.4.0 toolkit in our Public Apps gallery. The toolkit consists of a set of pipeline scripts, which are wrapped as the following tools: 

  • preprocess-phix - removes reads which align to phiX174 from the input sequence file. 

  • presto-abseq - runs pRESTO tools for pre-processing of NEBNext / ABSeq immune sequencing data. 

  • presto-clontech - uses pRESTO tools for analyzing Takara Bio/Clontech SMARTer v1 immune sequencing kit data. 

  • presto-clontech-umi - uses pRESTO tools for analyzing Takara Bio/Clontech SMARTer v2 (UMI) immune sequencing kit data. 

  • changeo-10x - annotates and infers clonal relationships in Cell Ranger 10x Genomics single-cell V(D)J data. 

  • changeo-igblast - performs V(D)J alignment using IgBLAST. 

  • tigger-genotype - performs TIgGER polymorphism detection and genotyping. 

  • shazam-threshold - calculates clonal assignment threshold. 

  • changeo-clone - runs Change-O cloning and germline reconstruction. 

We also published Nirvana 3.18.1. Nirvana annotates variants from an input VCF file and generates a JSON file with the results. 

Marko Marinkovic
Release notes

Recently published apps

We published the following apps in our Public Apps gallery:

  • RNA-SeQC 2.4.2, a tool that computes post-alignment quality control metrics for RNA-Seq data. It takes aligned reads in BAM/SAM or CRAM format and an annotation file as inputs, and outputs various alignment metrics files.

  • scCODA 0.1.9, a Python-based tool that performs differential analysis of cell populations.

Marko Marinkovic
Release notes

Recently published apps

We have just published the following tools from the BBTools 39.01 toolkit:

  • BBDuk: used for trimming, filtering, and masking of input reads.

  • Reformat: used for generic read-processing tasks (changing ASCII quality encoding, interleaving, file format, compression).

  • BBMap: used for splice-aware read alignment.

  • Dedupe: used for removing duplicates from input sequences.

  • SplitNextera: used for splitting Nextera long-mate-pair reads.

  • CalcUniqueness: used for determining library complexity and the need for additional sequencing by generating a k-mer uniqueness histogram.

  • Taxonomy: used for printing taxonomy information for provided organism identifiers.

  • Repair: used to correct disordered reads and reads whose mates have been lost.

  • Seal: used for alignment-free sequence quantification.

  • BBMerge: used for merging overlapping paired end reads.

  • BBMask: used for masking low-complexity, tandem repeats or SAM mapped regions.

  • Tadpole: used as a kmer-based assembler.

  • Statistics: used for calculating assembly statistics.

  • BBNorm: used for normalizing read depth based on kmer counts.
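The Statistics tool above calculates assembly statistics. N50 and L50 are among the most widely reported such metrics, and the Python sketch below computes them alongside contig count, total length, and largest contig. It is a generic illustration with a hypothetical function name; the exact set of metrics the BBTools tool reports may differ.

```python
def assembly_stats(contig_lengths):
    """Compute basic assembly statistics from a list of contig lengths.

    N50: length of the contig at which the running total (longest first) reaches
    at least half of the total assembly length. L50: how many contigs that took.
    """
    lengths = sorted(contig_lengths, reverse=True)
    total = sum(lengths)
    stats = {"contigs": len(lengths), "total_length": total,
             "largest": lengths[0] if lengths else 0, "n50": 0, "l50": 0}
    running = 0
    for index, length in enumerate(lengths, start=1):
        running += length
        if 2 * running >= total:
            stats["n50"] = length
            stats["l50"] = index
            break
    return stats


print(assembly_stats([2, 3, 4, 5, 6]))
```

Unlike mean contig length, N50 is not pulled down by many tiny contigs, which is why it is a common headline number for assembly contiguity.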

Marko Marinkovic
Release notes

DRS notification improvements and the brand new Activity center

To provide you with more detailed information about each DRS import operation and enable you to resolve import issues independently, we have improved DRS-related notifications and implemented the Activity center, available by clicking Open activity center in the Activity feed.

Galaxy and OHIF Viewer now available in Data Studio on the CGC

The Cancer Genomics Cloud just became more versatile with two new interactive tools available as Data Studio environments: Galaxy and OHIF Viewer.

Galaxy is an open-source platform for FAIR data analysis that enables you to use tools from various domains and plug them into workflows through its graphical web interface.

The OHIF Viewer is a medical image viewer provided by the Open Health Imaging Foundation (OHIF). It is a web application designed to load large radiology studies as quickly as possible.

Marko Marinkovic
Release notes

Recently published apps

We published Giraffe-DeepVariant workflow 1.0, Cramino 0.9.7 and kyber 0.4.0 tools from the NanoPack2 toolkit, as well as Pisces 5.3.0.0 tool, PureCN NormalDB workflow 2.6.4, PureCN workflow 2.6.4, zUMIs 2.9.7 tool, and AlphaFold 2.3.2 tool.

Divya Sain
Release notes

Recently published apps

We published the following apps in our Public Apps gallery:

  • RADx-rad v0.2 Workflow, which is used for metagenomic data analysis of SARS-CoV-2 from wastewater samples. The workflow was developed and ported to CWL as part of RADx (Rapid Acceleration of Diagnostics), an initiative launched by the US National Institutes of Health (NIH) to speed innovation in the development, commercialization, and implementation of technologies for COVID-19 testing.

  • CNVPanelizer 1.32.0, which generates a report table and visualization of detected CNVs from targeted sequencing data.

  • Control-FREEC 11.6, which can be used for somatic copy number analysis of WGS, WES and targeted data.

Divya Sain
Release notes

Recently published apps

We have published the following apps in our Public Apps gallery:

  • VEP Slivar Trios Rare Diseases Analysis, which includes VEP 109.3 and Slivar 0.3.0. This analysis is used for preprocessing and analyzing variants from related individuals (trios or families; WES or WGS).

  • STAR-Fusion (v1.12.0), an app that uses the STAR aligner to identify candidate fusion transcripts supported by Illumina reads.

  • STAR-Fusion Build FusionFilter Dataset (v1.12.0) that creates the required CTAT genome lib archive for STAR-Fusion execution.

  • Cutadapt (v4.4), an app most commonly used for removing adapter sequences. It finds and removes adapter sequences, primers, poly-A tails and other types of unwanted sequences from high-throughput sequencing reads.

  • Seven tools from the kallisto 0.48.0 toolkit:

    • kallisto quant computes equivalence classes for reads and quantifies transcript abundances from RNA-Seq data.

    • kallisto quant-tcc runs the EM algorithm on a supplied TCC matrix file to make transcript-level estimates.

    • kallisto bus produces BUS (Barcode-UMI-Set format) output files from single-cell RNA Seq datasets.

    • kallisto merge merges the results of several batches obtained by kallisto pseudo.

    • kallisto h5dump converts HDF5-formatted results to plaintext.

    • kallisto index builds an index from a transcriptome FASTA formatted file of target sequences.

    • kallisto inspect outputs the target de Bruijn Graph from the kallisto index file in different file formats.
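kallisto quantifies transcripts by running an expectation-maximization (EM) algorithm over equivalence classes, i.e. sets of transcripts that a read is compatible with. The Python sketch below illustrates that idea, assuming equivalence-class counts are given as a dict from a tuple of transcript indices to a read count. For simplicity it ignores transcript effective lengths, which kallisto's likelihood does account for, so it is not a reimplementation of kallisto quant or quant-tcc; the function name is hypothetical.

```python
def em_abundances(eq_classes, num_transcripts, iterations=1000, tol=1e-10):
    """Estimate reads per transcript from equivalence-class counts with EM.

    eq_classes: {(transcript indices...): read count}
    Effective lengths are ignored here (a simplifying assumption).
    """
    alpha = [1.0 / num_transcripts] * num_transcripts  # start from uniform proportions
    total = sum(eq_classes.values())
    for _ in range(iterations):
        # E-step: split each class's reads among its transcripts by current proportions.
        expected = [0.0] * num_transcripts
        for transcripts, count in eq_classes.items():
            denom = sum(alpha[t] for t in transcripts)
            if denom == 0:
                continue
            for t in transcripts:
                expected[t] += count * alpha[t] / denom
        # M-step: new proportions are the expected read shares.
        new_alpha = [e / total for e in expected]
        converged = max(abs(a - b) for a, b in zip(alpha, new_alpha)) < tol
        alpha = new_alpha
        if converged:
            break
    return [a * total for a in alpha]


# 6 reads unique to transcript 0, 2 unique to transcript 1, 8 ambiguous between both.
print(em_abundances({(0,): 6, (1,): 2, (0, 1): 8}, num_transcripts=2))
```

The ambiguous reads end up split in the same 3:1 ratio as the uniquely assigned ones, giving about 12 and 4 reads for the two transcripts.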

Divya Sain
Release notes

Recently published apps

We published the following apps in our Public Apps gallery:

  • Parabricks fq2bam (4.0.0-1) - GPU-accelerated alignment, duplicate marking and optionally BQSR.

  • Parabricks haplotypecaller (4.0.0-1) - GPU-accelerated GATK HaplotypeCaller.

  • Parabricks deepvariant (4.0.0-1) - GPU-accelerated version of DeepVariant.

  • Parabricks Somatic Calling workflow - calls somatic variants from a matched tumor-normal sample pair. It runs accelerated Mutect2 on GPU instances, with or without a panel of normals.

Divya Sain
Release notes

Recently published apps

We published the NanoMod workflow version 1.1. NanoMod is a workflow for detecting RNA modifications using Oxford Nanopore direct long-read sequencing data.

Divya Sain
Release notes

Recently published apps

We published the following tools from the GATK 4.4.0.0 and ensembl-vep 109.3 toolkits:

  • GATK BaseRecalibrator, which generates a recalibration table based on various covariates for input mapped reads.

  • GATK ApplyBQSR, which recalibrates the base quality scores of an input BAM or CRAM file containing reads.

  • GATK GatherBQSRReports, which gathers scattered BQSR recalibration reports into a single file.

  • GATK HaplotypeCaller, which calls germline SNPs and indels from input BAM file(s) via local re-assembly of haplotypes.

  • GATK VariantFiltration, which is used for filtering variants in a VCF file based on INFO and/or FORMAT annotations.

  • Augmented Filter VEP, which is a customized wrapper of the filter_vep script from the ensembl-vep toolkit. The tool is modified to allow GNU parallel-scattered filtering of VEP-annotated VCFs split on chromosomes.

  • Variant Effect Predictor, which predicts functional effects of genomic variants and is used to annotate VCF files.

In addition, the VEP annotation workflow 109.3 is also live and available in the Public Apps gallery. It is used for preprocessing, annotating, and filtering VCF files using the vt toolkit and VEP.

We also published the PURPLE CNV Calling Workflow, used for somatic CNV calling and purity and ploidy estimation on WGS data. It is based on PURPLE 3.7.2 and includes two additional tools, AMBER and COBALT. The workflow first calculates B-allele frequencies (BAF) with AMBER and read depth ratios with COBALT, which PURPLE then uses to estimate the purity, ploidy, and copy number profile of a tumor sample.
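In broad terms, purity and ploidy estimation compares observed BAF and depth ratios with the values expected for candidate combinations of purity, ploidy, and copy number. The Python sketch below computes those expected values under a simple two-component mixture model, assuming a diploid normal component and a germline heterozygous SNP. It illustrates the general idea only and is not PURPLE's actual fitting procedure; the function names are hypothetical.

```python
def expected_depth_ratio(purity, tumor_cn, tumor_ploidy):
    """Expected read depth of a region relative to the genome-wide average.

    Assumes the normal (non-tumor) fraction of the sample is diploid everywhere.
    """
    region = purity * tumor_cn + (1.0 - purity) * 2
    genome_average = purity * tumor_ploidy + (1.0 - purity) * 2
    return region / genome_average


def expected_baf(purity, tumor_major_cn, tumor_cn):
    """Expected B-allele frequency at a germline heterozygous SNP.

    The normal fraction carries one copy of each allele; the tumor carries
    tumor_major_cn copies of the counted allele out of tumor_cn total copies.
    """
    allele_copies = purity * tumor_major_cn + (1.0 - purity) * 1
    total_copies = purity * tumor_cn + (1.0 - purity) * 2
    return allele_copies / total_copies


# A 4-copy region in a 50%-pure, near-diploid tumor shows 1.5x the average depth.
print(expected_depth_ratio(0.5, 4, 2.0))
# Copy-neutral loss of heterozygosity (2 copies, both the same allele) at 60% purity.
print(expected_baf(0.6, 2, 2))
```

Because purity dilutes both signals in the same way, fitting BAF and depth ratios jointly constrains purity and ploidy better than either measurement alone.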

Divya Sain
Release notes

Recently published apps

We published the following tools from the STAARpipeline 0.9.6 and FAVORannotator 1.0.0 toolkits:

  • STAARpipeline tool, which performs phenotype-genotype association analyses using the STAAR procedure. The app is designed for analyzing whole-genome/whole-exome sequencing data.

  • STAARpipelineSummary VarSet tool, which summarizes results from the STAAR procedure for analyzing WGS and WES data.

  • STAARpipelineSummary IndVar tool, which extracts information on individual variants from a user-specified variant set.

  • FAVORannotator tool, which functionally annotates genotype data in GDS format using the FAVOR database. The resulting file can then facilitate a wide range of functionally informed downstream analyses, for example, phenotype-genotype association analyses using the STAARpipeline toolkit.

Divya Sain
Release notes

New Public Projects view

The new Public Projects gallery view is now available on the CGC. The new interface resembles our Public Apps gallery and provides an overview of each project's purpose and content on a single page, which should make projects more accessible and give you better insight into their usefulness for your specific use cases.

Recently published apps

We have published the following new and updated apps in our Public Apps gallery:

  • ABySS 2.3.5 - a de novo sequence assembler intended for short paired-end reads and genomes of all sizes.

  • Minia 3.2.6 - a short-read assembler based on a de Bruijn graph.

  • IDBA 1.1.3 toolkit:

    • IDBA-Hybrid - a de novo assembler for hybrid sequencing data.

    • IDBA-UD - a short-read-data de novo assembler.

    • fq2fa - used for converting FASTQ format read data to FASTA format suitable for IDBA tools.

  • ABACAS 1.3.1 - used for contiguating reference-based assemblies.

  • Viralrecon Illumina De novo assembly workflow - designed for short-read assembly of amplicon and metagenomics data. It can analyze metagenomics data obtained from shotgun sequencing (e.g. directly from clinical samples) and from enrichment-based library preparation methods (e.g. amplicon-based or probe-capture-based data). It takes Illumina short reads from one or more samples and performs read trimming, host read removal, assembly with one of the five included assemblers, BLAST searches, and calculation of various QC metrics.
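Minia is described above as an assembler based on a de Bruijn graph. As a rough illustration of that data structure, the Python sketch below builds a graph whose nodes are (k-1)-mers and whose edges are the k-mers observed in the reads. It ignores reverse complements, sequencing-error filtering, and the memory-efficient representation a real assembler such as Minia uses; the function name is hypothetical.

```python
from collections import defaultdict


def build_de_bruijn_graph(reads, k):
    """Build a de Bruijn graph: each k-mer adds an edge prefix (k-1)-mer -> suffix (k-1)-mer."""
    graph = defaultdict(set)
    for read in reads:
        for i in range(len(read) - k + 1):
            kmer = read[i:i + k]
            graph[kmer[:-1]].add(kmer[1:])
    return dict(graph)


# Overlapping reads share k-mers, so their paths merge in the graph.
print(build_de_bruijn_graph(["ACGTC", "CGTCA"], k=3))
```

Assembly then amounts to finding long unambiguous paths through this graph, which is why overlapping reads can be joined without aligning them to each other directly.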

Divya Sain
Release notes

Recently published apps

We have just published the following GATK 4.4.0.0 tools:

  • GATK IndexFeatureFile - used for indexing feature files.

  • GATK MergeVcfs - used for combining multiple variant files.

  • GATK VariantEval BETA - used for evaluating variant calls.

  • GATK FilterMutectCalls - used to filter somatic SNVs and indels called by Mutect2.

We have also published Minimac4 4.1.2, a tool for imputing genotypes.

Divya Sain
Release notes

Recently published apps

Metagenomics WGS analysis - Centrifuge 1.0.4

A workflow for analyzing metagenomic samples. It assigns taxonomic labels to DNA sequences, estimates the abundance of each taxonomic category in the sample, creates visualizations that give insight into the sample's taxonomic structure, and produces files suitable for downstream analysis. This allows researchers to assign reads from their samples to a likely species of origin and quantify each species’ abundance.

Reference Index Creation - Centrifuge 1.0.4

A workflow that builds an index from reference sequences downloaded from NCBI databases.

Five tools from the Centrifuge 1.0.4 toolkit:

  • Centrifuge Classifier is the main tool of the Centrifuge toolkit, used for classification of metagenomics reads.

  • Centrifuge Download is a part of the Centrifuge toolkit, used for downloading reference sequences from NCBI.

  • Centrifuge Build is a part of the Centrifuge toolkit, which makes a Centrifuge index from DNA sequences.

  • Centrifuge Kreport is used to make a Kraken-style report from the Centrifuge Classifier output.

  • Centrifuge Inspect is a part of the Centrifuge toolkit that inspects index files.
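A Kraken-style report, like the one Centrifuge Kreport produces, lists for each taxon the percentage and number of reads in its clade alongside the reads assigned to it directly. The key step is rolling direct assignments up the taxonomy tree, sketched below in Python. The input layout and function names are hypothetical; a real report also includes rank codes and scientific names and is ordered by the taxonomy tree rather than by count.

```python
from collections import defaultdict


def clade_counts(direct_counts, parent):
    """Roll reads assigned directly to each taxon up to all of its ancestors.

    direct_counts: {taxid: reads assigned directly to that taxid}
    parent: {taxid: parent taxid}; the root is assumed to be its own parent.
    """
    clade = defaultdict(int)
    for taxid, n_reads in direct_counts.items():
        node = taxid
        while True:
            clade[node] += n_reads
            up = parent[node]
            if up == node:  # reached the root
                break
            node = up
    return dict(clade)


def kraken_style_rows(direct_counts, parent, total_reads):
    """Return (percent of reads in clade, clade reads, direct reads, taxid) rows.

    Rows are sorted by descending clade count (ties broken by taxid) for simplicity.
    """
    clade = clade_counts(direct_counts, parent)
    rows = []
    for taxid, n_clade in sorted(clade.items(), key=lambda item: (-item[1], item[0])):
        percent = round(100.0 * n_clade / total_reads, 2)
        rows.append((percent, n_clade, direct_counts.get(taxid, 0), taxid))
    return rows


# Toy taxonomy: 1 is the root, 2 is its child, 10 and 11 are children of 2; 1 read unclassified.
for row in kraken_style_rows({10: 6, 11: 2, 2: 1}, {1: 1, 2: 1, 10: 2, 11: 2}, total_reads=10):
    print(row)
```

Reporting both clade and direct counts shows how many reads could be resolved only to a higher rank, such as genus, rather than to a species.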

Divya Sain