Release notes

Recently published apps

We have just published HTSeq-count (2.0.2 in CWL 1.2). HTSeq-count is a Python tool for counting how many reads map to each feature. It takes aligned reads together with a list of genomic features as inputs, and outputs a TSV table with counts for each genomic feature.
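As a rough illustration of what "counting how many reads map to each feature" means, here is a minimal Python sketch. It is not HTSeq-count's implementation: the function names, the tuple layout, and the half-open coordinates are assumptions made for this example, and real runs handle strandedness, multiple overlap modes, and alignment quality. Reads that overlap no feature, or more than one, are tallied under special rows.

```python
from collections import Counter

def count_reads_per_feature(reads, features):
    """Count reads per feature (simplified sketch, not HTSeq-count itself).

    reads:    iterable of (chrom, start, end), half-open coordinates
    features: iterable of (name, chrom, start, end), half-open coordinates
    """
    counts = Counter({name: 0 for name, *_ in features})
    for chrom, start, end in reads:
        # Collect every feature on the same chromosome that overlaps the read.
        hits = {
            name
            for name, f_chrom, f_start, f_end in features
            if f_chrom == chrom and start < f_end and f_start < end
        }
        if len(hits) == 1:
            counts[hits.pop()] += 1
        elif not hits:
            counts["__no_feature"] += 1
        else:
            counts["__ambiguous"] += 1
    return counts

def to_tsv(counts):
    """Render the counts as a two-column TSV table, one feature per line."""
    return "\n".join(f"{name}\t{n}" for name, n in counts.items())

# Hypothetical toy data, for illustration only.
features = [("geneA", "chr1", 100, 200), ("geneB", "chr1", 150, 300)]
reads = [("chr1", 90, 120), ("chr1", 160, 180), ("chr1", 250, 260), ("chr2", 10, 50)]
counts = count_reads_per_feature(reads, features)
print(to_tsv(counts))
```

In this toy input, one read falls in each gene, one overlaps both genes, and one lies on a chromosome with no features.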

Divya Sain
Release notes

We have just published five tools from the GraphicsMagick 1.3.38 toolkit, the Swiss Army knife of image processing:

  • GraphicsMagick compare compares two images and reports difference statistics according to specified metrics, and/or outputs an image with a visual representation of the differences.

  • GraphicsMagick composite composites (combines) images to create a new image.

  • GraphicsMagick conjure interprets and executes scripts in the Magick Scripting Language (MSL). MSL will primarily benefit those who want to accomplish custom image processing tasks but do not wish to program.

  • GraphicsMagick convert converts an input image file in one image format to an output file in the same or a different format, while applying an arbitrary number of image transformations.

  • GraphicsMagick montage creates a composite image by combining several separate images.

Release notes

Recently published apps

We have just published two Bowtie2 2.4.5 (CWL 1.2) tools:

  • Bowtie2 Indexer, for building a Bowtie index from a set of DNA sequences.

  • Bowtie2 Aligner, for performing end-to-end read alignment.

On top of that, there are two more additions to our Public Apps gallery:

  • RSeQC - Junction Saturation 5.0.1 (CWL 1.2), a tool for determining whether the sequencing depth is sufficient to perform alternative splicing analysis.

  • GATK IndexFeature 4.2.5.0 tool.

Release notes

Recently published apps

We have just published the following three tools:

  • SPAdes 3.15.5 - an assembly tool containing various assembly pipelines. SPAdes can be used for reads produced by different sequencing technologies, such as Illumina, IonTorrent, PacBio, Oxford Nanopore, and Sanger. SPAdes was tested on small genomes (e.g., bacterial, fungal) and is not intended for larger ones.

  • Unicycler 0.5.0 - a tool for bacterial genome assembly. It can assemble Illumina-sequenced reads, as well as PacBio or Nanopore long-read-only sets. For the best assemblies, it can conduct a hybrid assembly using both Illumina and long reads.

  • QUAST 5.2.0 - a tool for genome assembly evaluation. QUAST implements different methods for analyzing assemblies. By default, it uses Minimap2 for alignment. GeneMarkS, GeneMark-ES, Glimmer, Barrnap, and BUSCO are used for gene prediction, while structural variations are found using BWA, Sambamba, and GRIDSS. Additionally, QUAST uses bedtools for calculating read coverage, which is presented in the Icarus contig alignment viewer.

Release notes

Recently published apps

We have published six tools from the BEDTools 2.30.0 toolkit:

  • BEDTools Coverage - returns the depth and breadth of coverage of features from B on the intervals in A.

  • BEDTools Genomecov - computes histograms of feature coverage for a given genome.

  • BEDTools GetFasta - extracts sequences from a FASTA file for each of the intervals defined in a BED/GFF/VCF file.

  • BEDTools Intersect - screens for overlaps between two sets of genomic features.

  • BEDTools Merge - combines overlapping or “book-ended” features in an interval file into a single feature.

  • BEDTools Sort - sorts a feature file by chromosome and other criteria.
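To make the Merge behaviour above concrete, the following Python sketch combines overlapping or book-ended intervals into single features. It is only an illustration of the idea under stated assumptions (half-open BED-style coordinates, a hypothetical `merge_intervals` helper, plain tuples instead of files) and is not how BEDTools itself is implemented.

```python
def merge_intervals(intervals):
    """Merge overlapping or book-ended (chrom, start, end) intervals.

    Coordinates are assumed to be half-open, so two intervals are
    "book-ended" when one starts exactly where the previous one ends.
    """
    merged = []
    # Sorting by chromosome and start guarantees that any interval that can
    # be merged with the current one is adjacent to it in the iteration.
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            # Extend the last merged interval instead of starting a new one.
            merged[-1][2] = max(merged[-1][2], end)
        else:
            merged.append([chrom, start, end])
    return [tuple(interval) for interval in merged]

# Hypothetical toy input, for illustration only.
example = [
    ("chr1", 150, 250),
    ("chr1", 100, 200),
    ("chr1", 250, 300),  # book-ended with the interval ending at 250
    ("chr1", 400, 500),
    ("chr2", 100, 200),
]
merged_example = merge_intervals(example)
print(merged_example)
```

Sorting first is also why a sorted feature file is the natural input for this kind of operation.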

We have also published the Functional Equivalence Evaluation workflow for comparing the functional equivalence of different WGS/WES processing analyses. The workflow is used to establish whether the results can be used together (compared, merged) in downstream analyses or considered equally valid for drawing conclusions. This is a common scenario in large, multi-center sequencing studies where different institutions use their own analysis protocols.

Release notes

Recently published apps

We have just published the Nextstrain 2.4.0 toolkit:

  • Nextclade dataset list, a tool for listing available Nextclade datasets.

  • Nextclade dataset get, a tool for downloading Nextclade datasets.

  • Nextclade run, a tool for alignment, mutation calling, clade assignment, quality checks, and phylogenetic placement of viral sequences.

  • Nextalign run, a tool for viral genome alignment and translation.

Release notes

DRS export now available on the CGC

To further improve interoperability and allow our users to move their data seamlessly across platforms, we have introduced the DRS export option on the CGC. With the new functionality, users can export links to platform files (DRS URIs), along with the files' metadata, to a manifest file, which can then be used for importing the files and metadata on other platforms. Learn how to generate a DRS manifest file on the CGC.
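For readers unfamiliar with DRS URIs, the sketch below shows how a hostname-based DRS URI maps to an HTTPS object endpoint, following the general pattern in the GA4GH DRS specification. The host name and object ID are hypothetical, the helper name is made up for this example, and the layout of the CGC manifest file is not modeled because it is not described here. Compact-identifier DRS URIs are also out of scope.

```python
from urllib.parse import urlparse

def drs_uri_to_url(drs_uri):
    """Translate a hostname-based DRS URI into its HTTPS object URL.

    Only the drs://<hostname>/<object_id> form is handled in this sketch.
    """
    parsed = urlparse(drs_uri)
    object_id = parsed.path.lstrip("/")
    if parsed.scheme != "drs" or not parsed.netloc or not object_id:
        raise ValueError(f"Not a hostname-based DRS URI: {drs_uri!r}")
    return f"https://{parsed.netloc}/ga4gh/drs/v1/objects/{object_id}"

# Hypothetical host and object ID, for illustration only.
example_url = drs_uri_to_url("drs://drs.example.org/abc123")
print(example_url)
```

A platform importing the manifest would typically query such an endpoint to obtain the object's metadata and access methods.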

Recently published apps

We have published the Bracken 2.7 toolkit:

  • Bracken (Bayesian Reestimation of Abundance with KrakEN) is used for abundance estimation at the species level, the genus level, or above.

  • Bracken Build is used to prepare the reference database for Bracken.

In addition, the Metagenomics Profiling - Kraken2 workflow has been published on the CGC. It is used for metagenomic classification, abundance estimation, and visualization.

Release notes

Recently published apps

We have recently published the following apps:

  • SBG Single-Cell RNA Deep Learning - Training, a single-cell classifier pipeline for human data. It relies on the transfer learning approach, which uses pre-trained gene embeddings as the starting point for building a model adjusted to given single-cell datasets.

  • SBG Single-Cell RNA Deep Learning - Predict, a single-cell classifier pipeline for human data. This app uses the deep learning model generated by the SBG Single-Cell RNA Deep Learning - Training workflow to classify the input dataset.

Release notes

Recently published apps

We have published the CNVkit 0.9.9 toolkit for inferring and visualizing copy number from high-throughput DNA sequencing data. The toolkit includes the following tools:

  • CNVkit breaks lists the targeted genes in which a segmentation breakpoint occurs.

  • CNVkit access calculates the sequence-accessible coordinates in chromosomes from the given reference genome.

  • CNVkit diagram draws copy number or segments on chromosomes as an ideogram.

  • CNVkit export bed converts segments to a BED file.

  • CNVkit export vcf converts segments to a VCF file.

  • CNVkit segmetrics calculates summary statistics of individual segments.

Release notes

Recently published apps

We have published the following apps:

  • SBG Pair FASTQs by Metadata CWL1.2 tool, which accepts a list of FASTQ files and groups them into sub-lists based on the metadata. The sbg:draft-2 version of this tool will also remain available in the Public Apps gallery.

  • An upgraded version of the MultiQC (v1.13, CWL1.2) tool, which aggregates results from bioinformatics analyses across many samples into a single report. This wrapper version of MultiQC can also accept input files produced by the Salmon Workflow (salmon_quant_archive.tar).
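The grouping performed by SBG Pair FASTQs by Metadata can be pictured with a short Python sketch. This is not the tool's code: the dictionary layout, the `group_by_metadata` helper, and the choice of `sample_id` as the grouping key are assumptions made purely for illustration.

```python
from collections import defaultdict

def group_by_metadata(files, keys=("sample_id",)):
    """Group file names into sub-lists that share the same metadata values.

    files: list of dicts with "name" and "metadata" entries (assumed layout)
    keys:  metadata fields that define a group (assumed; defaults to sample_id)
    """
    groups = defaultdict(list)
    for f in files:
        # Files with identical values for all chosen keys land in one group.
        group_key = tuple(f["metadata"].get(k) for k in keys)
        groups[group_key].append(f["name"])
    # Groups are returned in order of first appearance, names sorted within.
    return [sorted(names) for names in groups.values()]

# Hypothetical FASTQ files and metadata, for illustration only.
fastqs = [
    {"name": "A_R1.fastq", "metadata": {"sample_id": "A", "paired_end": "1"}},
    {"name": "B_R1.fastq", "metadata": {"sample_id": "B", "paired_end": "1"}},
    {"name": "A_R2.fastq", "metadata": {"sample_id": "A", "paired_end": "2"}},
    {"name": "B_R2.fastq", "metadata": {"sample_id": "B", "paired_end": "2"}},
]
pairs = group_by_metadata(fastqs)
print(pairs)
```

Each resulting sub-list can then be passed to a downstream tool as one unit, for example one pair of FASTQ files per sample.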

Release notes

Prepare data for submission to dbGaP in a breeze

The dbGaP Submission Form Suite R Shiny app streamlines the way a CGC user can submit their data to dbGaP. The app allows you to import, map, and prepare your data for submission to the database of Genotypes and Phenotypes (dbGaP). You can easily export the metadata from a project into this tool and transform it in the visual interface to match the dbGaP submission guidelines. Once completed, all you have to do is download the produced Excel file and submit it to dbGaP.

The app is available under Interactive Browsers > Custom interactive apps on the CGC. Learn how to use it from the documentation.

Release notes

Interactive Web App Gallery is now live

The Interactive Web Apps page is now available on the CGC. The new page contains all R Shiny apps that we publish and makes them more prominent and accessible in the CGC interface.

Interactive Web Apps are available under Public Apps > Interactive Web Apps on the top navigation bar. With this update, the Public Apps menu item on the CGC has changed from a tab to a dropdown menu which now contains the Workflows and Tools page, where the previous Public Apps page content is located.

OmicCircos plot generation app now available on the CGC

OmicCircos is now available as a custom interactive app on the CGC. The OmicCircos app is an R Shiny application created around the OmicCircos R package for more effective generation of high-quality circular plots for visualizing omics data. Its integration with the Cancer Genomics Cloud (CGC) makes it easy to launch the app from inside the CGC and visualize data that is already present in any of your CGC projects.

The OmicCircos R package that the interactive CGC app is based on was developed by Ying Hu, Chunhua Yan and Xiapeng Bian as a part of Daoud Meerzaman's Computational Genomics and Bioinformatics group at CBIIT/NCI, and it can also be installed via Bioconductor. The data can be gene or chromosome position-based values from mutation, copy number, expression, and methylation analyses.

Find out more about using OmicCircos on the CGC and the OmicCircos R package.

Release notes

Seven Bridges CLI now available for macOS ARM users

To make sure all of our users have a uniform experience and can make the most of the CGC, the Seven Bridges Command Line Interface (SB CLI) is now also available for macOS ARM (M1/M2) users. The new build allows the growing number of users with Apple computers based on M1 and M2 chips to install the SB CLI and interact with the CGC from the command line. The macOS ARM version of the SB CLI is available for download from the Data Tools section and from the related documentation page.

Release notes

Disabled accounts can now be reactivated in a snap

Accounts that are locked or disabled due to inactivity can now be automatically reactivated by their owners. A new, streamlined flow allows you to initiate the process by logging in with your last used credentials and having a reactivation email sent to your email address. By clicking the link in the email and setting a new password on the CGC, you will have unrestricted access to your account and data again.

Recently published apps

We have just updated our Public Apps gallery with the following newly published apps:

  • GATK VariantEval BETA 4.2.5.0, a tool which is used for evaluating variant calls.

  • GATK FilterMutectCalls 4.2.5.0, a tool which is used to filter somatic SNVs and indels called by Mutect2.

  • Picard CreateSequenceDictionary 2.25.7, a tool for creating a DICT index file for a sequence.

  • WARP ExomeGermlineSingleSample 2.4.4, a pipeline for data pre-processing and variant calling in human WES data.

Release notes

DRS import from manifest file now available on the CGC

Expanding on the current feature that enables you to import DRS files by entering DRS URIs, we have now enabled DRS file import using manifest files containing all relevant information to import the files, along with associated metadata. This provides an easy and streamlined way to import a large number of DRS files from different sources such as Seven Bridges academic platforms or external sources.

Recently published apps

We have just published and updated our Public Apps gallery with the BCFtools 1.15.1 toolkit - CWL1.2, containing the following tools:

  • BCFtools Annotate - edits VCF files, adds or removes annotations.

  • BCFtools Call - calls SNPs/indels (former “view”).

  • BCFtools Cnv - calls Copy Number Variations.

  • BCFtools Concat - concatenates VCF/BCF files from the same set of samples.

  • BCFtools Consensus - creates consensus sequence by applying VCF variants.

  • BCFtools Convert - converts VCF/BCF to other formats and back.

Release notes

Recently published apps

We have just published and updated our Public Apps gallery with Regenie 3.1.3, a tool which is used for whole genome regression analysis.

Release notes

Recently published apps

We have published the following apps in our Public Apps gallery:

  • Mosdepth 0.3.3 toolkit: Mosdepth, a tool used for fast depth calculation on WGS, WES, or targeted BAM and CRAM files, and Mosdepth plot_dist, which plots Mosdepth results.

  • Personal Cancer Genome Reporter 1.0.3, which is used for functional annotation and classification of somatic variants.

  • Cancer Predisposition Sequencing Reporter 1.0.0, which analyzes cancer-predisposing germline variants.

We have also published updated versions of tools from the following two toolkits: SRA (v3.0.0, CWL1.2) and Salmon (v1.5.2, CWL1.2). The updated tools include:

  • SRA sam-dump, which converts SRA data into SAM format. With aligned data, NCBI uses Compression by Reference, which only stores the differences in base pairs between sequence data and the segment it aligns to. The process to restore original data, for example as FASTQ, requires fast access to the reference sequences that the original data was aligned to.

  • SRA fasterq-dump, which converts SRA data into FASTQ format while using temporary files and multi-threading to speed up the extraction.

Release notes

Recently published apps

We have published three VarDict (v1.8.3, CWL1.2) tools and one workflow:

  • VarDictJava is the VarDict variant caller Java port. It can be used to call SNPs, MNVs, small indels or complex variants from DNA or RNA alignments. VarDictJava can be used for amplicon-based variant calling and supports both single sample and paired sample analysis.

  • VarDict var2vcf_valid, a CWL tool that takes a VarDict tabular variants file and outputs the variants in VCF format.

  • VarDict var2vcf_paired, a CWL tool that converts VarDict tabular output to VCF.

  • VarDict Variant Calling workflow (also VarDict v1.8.3, CWL1.2), which can be used for single sample and paired sample variant calling using VarDictJava starting from WES, WGS or amplicon data.

We have also published the following workflows and a toolkit:

  • CNVnator Analysis workflow 0.4.1 for CNV calling by doing read-depth (RD) analysis of input BAM files.

  • CNVpytor workflow 1.1 for CNV/CNA detection and analysis based on read depth and allele imbalance in WGS.

Release notes

Data Cruncher and Interactive Analysis become Data Studio and Interactive Browsers

Data Studio, previously Data Cruncher, is an interactive analysis tool which allows you to explore and visualize data using environments like JupyterLab and RStudio. Previously located under the Interactive Analysis tab, it has now been given a more prominent location in the project navigation by having its own tab located next to Tasks. With the removal of Data Studio from the Interactive Analysis tab, the tab's name has been changed to Interactive Browsers in order to better reflect its contents.

Recently published apps

We have just published updated versions (4.2.5.0) of the Mutect2 workflows, along with apps from the MetaXcan toolkit:

  • GATK Somatic SNVs and INDELs (Mutect2) 4.2.5.0, a workflow used for somatic short variant calling. It runs on a single tumor-normal pair or on a single tumor sample, and performs additional filtering and functional annotation tasks.

  • GATK Create Mutect2 Panel of Normals 4.2.5.0 that creates a panel of normals for use in other GATK workflows. The workflow takes multiple normal sample callsets and passes them to GATK Somatic SNVs and INDELs (Mutect2) 4.2.5.0 with tumor-only mode (although it is called tumor-only, normal samples are given as the input) and additionally collates sites present in two or more samples into a sites-only VCF.

  • Three apps from the MetaXcan toolkit:

    • S-PrediXcan for computing associations between omic features and a complex trait starting from GWAS summary statistics.

    • S-MultiXcan for computing association from predicted gene expression to a trait, using multiple studies for each gene.

    • MetaMany for serially performing multiple MetaXcan runs on a GWAS study from summary statistics using multiple tissues.

  • The MetaXcan Workflow for computing associations between omic features and complex traits across multiple tissues. The workflow includes two tools from the MetaXcan framework, MetaMany and S-MultiXcan. It uses summary statistics from a GWAS study and multiple models that predict expression or splicing quantification.

Release notes

Recently published apps

We have just published the V-pipe 2.99.2 for SARS-CoV-2 workflow for analyzing high-throughput SARS-CoV-2 sequencing data. V-pipe integrates several tools for the analysis of viral high-throughput sequencing data. It allows for assessing viral diversity at the level of SNVs, short variant sequences (or local haplotypes), and long-range haplotypes (or global haplotypes).
