Divya Sain

Release notes

Recently published apps

We have recently published the following apps:

  • SBG Single-Cell RNA Deep Learning - Training, a single-cell classifier pipeline for human data. It uses a transfer learning approach, taking pre-trained gene embeddings as the starting point for building a model adapted to the given single-cell datasets.

  • SBG Single-Cell RNA Deep Learning - Predict, a single-cell classifier pipeline for human data. This app uses the deep learning model generated by the SBG Single-Cell RNA Deep Learning - Training workflow to classify the input dataset.

Divya Sain

Release notes

Recently published apps

We have published the CNVkit 0.9.9 toolkit for inferring and visualizing copy number from high-throughput DNA sequencing data. The toolkit includes the following tools:

  • CNVkit breaks lists the targeted genes in which a segmentation breakpoint occurs.

  • CNVkit access calculates the sequence-accessible coordinates in chromosomes from the given reference genome.

  • CNVkit diagram draws copy number or segments on chromosomes as an ideogram.

  • CNVkit export bed converts segments to a BED file.

  • CNVkit export vcf converts segments to a VCF file.

  • CNVkit segmetrics calculates summary statistics of individual segments.
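
To illustrate what CNVkit access computes, here is a minimal Python sketch that treats a region as accessible when it is a run of non-N bases, and merges runs separated by N gaps shorter than a minimum size. The function name and the exact gap-merging rule are assumptions for illustration; CNVkit's own implementation has further options, such as excluding user-supplied regions.

```python
import re


def accessible_regions(chrom: str, seq: str, min_gap: int = 1):
    """Return (chrom, start, end) tuples for runs of non-N bases (0-based, half-open).

    Runs separated by fewer than `min_gap` N bases are merged into one region.
    Hypothetical helper for illustration; not CNVkit's own code.
    """
    regions = []
    for match in re.finditer(r"[^Nn]+", seq):
        start, end = match.start(), match.end()
        if regions and start - regions[-1][2] < min_gap:
            # The N gap since the previous region is too short: extend that region.
            regions[-1] = (chrom, regions[-1][1], end)
        else:
            regions.append((chrom, start, end))
    return regions


print(accessible_regions("chrT", "NNACGTNNNNTTGANACG", min_gap=2))
# → [('chrT', 2, 6), ('chrT', 10, 18)]
```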

Divya Sain

Release notes

Recently published apps

We have published the following apps:

  • SBG Pair FASTQs by Metadata CWL1.2 tool, which accepts a list of FASTQ files and groups them into sub-lists based on their metadata. The sbg:draft-2 version of this tool will also remain available in the Public Apps gallery.

  • An upgraded version of the MultiQC (v1.13, CWL1.2) tool, which aggregates results from bioinformatics analyses across many samples into a single report. This wrapper of MultiQC can also accept the archive produced by the Salmon workflow (salmon_quant_archive.tar) as input.
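
Grouping FASTQ files into sub-lists by metadata is essentially a group-by operation. The sketch below assumes each file carries hypothetical `sample_id` and `paired_end` metadata fields and groups by sample; the actual tool's grouping criteria may differ.

```python
from collections import defaultdict


def pair_fastqs(files):
    """Group FASTQ files by sample, ordering each group by paired-end number.

    `files` is a list of dicts with hypothetical keys 'name', 'sample_id'
    and 'paired_end' (1 or 2); the real tool may group on other metadata.
    """
    groups = defaultdict(list)
    for record in files:
        groups[record["sample_id"]].append(record)
    # Sorted sample keys give a deterministic order of sub-lists.
    return [
        [r["name"] for r in sorted(groups[sample], key=lambda rec: rec["paired_end"])]
        for sample in sorted(groups)
    ]


files = [
    {"name": "s2_R2.fastq", "sample_id": "s2", "paired_end": 2},
    {"name": "s1_R1.fastq", "sample_id": "s1", "paired_end": 1},
    {"name": "s2_R1.fastq", "sample_id": "s2", "paired_end": 1},
    {"name": "s1_R2.fastq", "sample_id": "s1", "paired_end": 2},
]
print(pair_fastqs(files))
# → [['s1_R1.fastq', 's1_R2.fastq'], ['s2_R1.fastq', 's2_R2.fastq']]
```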

Divya Sain

Release notes

Prepare data for submission to dbGaP in a breeze

The dbGaP Submission Form Suite R Shiny app streamlines the way CGC users submit their data to the database of Genotypes and Phenotypes (dbGaP). The app allows you to import, map, and prepare your data for submission. You can easily export the metadata from a project into this tool and transform it in the visual interface to match the dbGaP submission guidelines. Once you are done, all you have to do is download the generated Excel file and submit it to dbGaP.

The app is available under Interactive Browsers > Custom interactive apps on the CGC. Learn how to use it from the documentation.

Divya Sain

Release notes

Interactive Web App Gallery is now live

The Interactive Web Apps page is now available on the CGC. The new page contains all R Shiny apps that we publish and makes them more prominent and accessible in the CGC interface.

Interactive Web Apps are available under Public Apps > Interactive Web Apps on the top navigation bar. With this update, the Public Apps item on the CGC has changed from a tab to a dropdown menu, which also contains the Workflows and Tools page, where the content of the previous Public Apps page is now located.

OmicCircos plot generation app now available on the CGC

OmicCircos is now available as a custom interactive app on the CGC. The OmicCircos app is an R Shiny application created around the OmicCircos R package for more effective generation of high-quality circular plots for visualizing omics data. Its integration with the Cancer Genomics Cloud (CGC) makes it easy to launch the app from inside the CGC and visualize data that is already present in any of your CGC projects.

The OmicCircos R package that the interactive CGC app is based on was developed by Ying Hu, Chunhua Yan and Xiapeng Bian as part of Daoud Meerzaman's Computational Genomics and Bioinformatics group at CBIIT/NCI, and it can also be installed via Bioconductor. The input data can be gene- or chromosome-position-based values from mutation, copy number, expression, and methylation analyses.

Find out more about using OmicCircos on the CGC and the OmicCircos R package.

Divya Sain

Release notes

Seven Bridges CLI now available for macOS ARM users

To make sure all of our users have a uniform experience and can make the most of the CGC, the Seven Bridges Command Line Interface (SB CLI) is now also available for macOS ARM (M1/M2) users. The new build allows the growing number of people using Apple computers with M1 and M2 chips to install the SB CLI and interact with the CGC from the command line. The macOS ARM version of the SB CLI is available for download from the Data Tools section and from the related documentation page.

Divya Sain

Release notes

Disabled accounts can now be reactivated in a snap

Accounts that are locked or disabled due to inactivity can now be automatically reactivated by their owners. In the new, streamlined flow, you log in with your last used credentials and start the process by having a reactivation email sent to your email address. After you click the link in the email and set a new password on the CGC, you regain unrestricted access to your account and data.

Recently published apps

We have just updated our Public Apps gallery with the following apps:

  • GATK VariantEval BETA 4.2.5.0, a tool for evaluating variant calls.

  • GATK FilterMutectCalls 4.2.5.0, a tool that filters somatic SNVs and indels called by Mutect2.

  • Picard CreateSequenceDictionary 2.25.7, a tool for creating a sequence dictionary (DICT file) for a reference sequence.

  • WARP ExomeGermlineSingleSample 2.4.4, a pipeline for data pre-processing and variant calling in human WES data.

Divya Sain

Release notes

DRS import from manifest file now available on the CGC

Building on the existing feature that lets you import DRS files by entering DRS URIs, you can now also import DRS files using a manifest file that contains all the information needed for the import, along with the associated metadata. This provides an easy, streamlined way to import a large number of DRS files from different sources, such as Seven Bridges academic platforms or external sources.
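
As an illustration only, a DRS import manifest can be thought of as a table with one DRS URI per row plus metadata columns. The sketch below assumes a CSV file with a hypothetical `drs_uri` column (not the CGC's documented manifest format) and validates hostname-based URIs of the form `drs://<hostname>/<object_id>`, as described in the GA4GH DRS specification; compact identifier URIs are not handled.

```python
import csv
import io
from urllib.parse import urlparse


def parse_manifest(text: str):
    """Parse a CSV manifest into (hostname, object_id, metadata) tuples.

    Assumes a hypothetical 'drs_uri' column; every other column is kept as metadata.
    """
    entries = []
    for row in csv.DictReader(io.StringIO(text)):
        uri = row.pop("drs_uri")
        parsed = urlparse(uri)
        object_id = parsed.path.strip("/")
        if parsed.scheme != "drs" or not parsed.netloc or not object_id:
            raise ValueError(f"not a hostname-based DRS URI: {uri}")
        entries.append((parsed.netloc, object_id, dict(row)))
    return entries


manifest = "drs_uri,sample_id\ndrs://drs.example.org/abc123,S1\n"
print(parse_manifest(manifest))
# → [('drs.example.org', 'abc123', {'sample_id': 'S1'})]
```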

Recently published apps

We have just published and updated our Public Apps gallery with the BCFtools 1.15.1 toolkit - CWL1.2, containing the following tools:

  • BCFtools Annotate - edits VCF files, adds or removes annotations.

  • BCFtools Call - calls SNPs/indels (formerly “view”).

  • BCFtools Cnv - calls copy number variations.

  • BCFtools Concat - concatenates VCF/BCF files from the same set of samples.

  • BCFtools Consensus - creates consensus sequence by applying VCF variants.

  • BCFtools Convert - converts VCF/BCF to other formats and back.
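
The core idea behind BCFtools Consensus, applying VCF variants to a reference sequence, can be sketched as follows. This simplified Python version assumes non-overlapping variants given as (1-based POS, REF, ALT) and ignores genotypes, haplotype selection, and masking, all of which the real tool handles.

```python
def apply_variants(ref: str, variants):
    """Apply (pos, ref_allele, alt_allele) variants to `ref` (positions 1-based, as in VCF).

    Variants are applied from right to left so that length-changing edits
    (indels) do not shift the positions of variants still to be applied.
    """
    seq = ref
    for pos, ref_allele, alt_allele in sorted(variants, reverse=True):
        start = pos - 1
        if seq[start:start + len(ref_allele)] != ref_allele:
            raise ValueError(f"REF allele does not match the sequence at position {pos}")
        seq = seq[:start] + alt_allele + seq[start + len(ref_allele):]
    return seq


# One SNV, one insertion and one deletion.
print(apply_variants("ACGTACGT", [(2, "C", "T"), (5, "A", "AGG"), (7, "GT", "G")]))
# → ATGTAGGCG
```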

Divya Sain

Release notes

Recently published apps

We have just added Regenie 3.1.3, a tool for whole genome regression analysis, to our Public Apps gallery.

Divya Sain

Release notes

Recently published apps

We have published the following apps in our Public Apps gallery:

  • Mosdepth 0.3.3 toolkit: Mosdepth, a tool for fast depth calculation on WGS, WES, or targeted BAM and CRAM files, and Mosdepth plot_dist, which plots Mosdepth results.

  • Personal Cancer Genome Reporter 1.0.3, which is used for functional annotation and classification of somatic variants.

  • Cancer Predisposition Sequencing Reporter 1.0.0, which analyzes cancer-predisposing germline variants.
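
Much of Mosdepth's speed comes from a simple array-based approach: increment a per-chromosome array at each read start, decrement it at each read end, and take a cumulative sum to obtain per-base depth. A minimal Python sketch of that idea on (start, end) read intervals, 0-based and half-open, without the read filtering and mate-overlap handling the real tool offers:

```python
from itertools import accumulate


def per_base_depth(chrom_length: int, reads):
    """Per-base depth from (start, end) read intervals, 0-based half-open."""
    delta = [0] * (chrom_length + 1)
    for start, end in reads:
        delta[start] += 1  # coverage starts at the read start
        delta[end] -= 1    # and stops at the read end
    # The running sum of the changes gives the depth at each position.
    return list(accumulate(delta[:chrom_length]))


print(per_base_depth(10, [(0, 4), (2, 6), (3, 5)]))
# → [1, 1, 2, 3, 2, 1, 0, 0, 0, 0]
```

Each read costs two array updates regardless of its length, which is why the approach scales well to deep WGS data.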

We have also published updated versions of tools from two toolkits: SRA (v3.0.0, CWL1.2) and Salmon (v1.5.2, CWL1.2). The updated tools include:

  • SRA sam-dump, which converts SRA data into SAM format. For aligned data, NCBI uses Compression by Reference, which stores only the base-pair differences between the sequence data and the reference segment it aligns to. Restoring the original data, for example as FASTQ, therefore requires fast access to the reference sequences that the data was originally aligned to.

  • SRA fasterq-dump, which converts SRA data into FASTQ format, using temporary files and multi-threading to speed up extraction.
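
Compression by Reference can be pictured with a toy model: keep only the positions where an aligned read differs from the reference, and rebuild the read from the reference when it is needed. This Python sketch assumes gap-free alignments and is not NCBI's actual storage format, but it shows why restoring reads requires access to the reference.

```python
def compress(read: str, reference: str, offset: int):
    """Keep only (position_in_read, base) pairs where the read differs from the reference."""
    return [(i, base) for i, base in enumerate(read) if base != reference[offset + i]]


def restore(length: int, diffs, reference: str, offset: int) -> str:
    """Rebuild a read from its reference segment plus the stored differences."""
    bases = list(reference[offset:offset + length])
    for i, base in diffs:
        bases[i] = base
    return "".join(bases)


reference = "ACGTTGCAACGT"
read = "TTGAAT"  # aligned at offset 3 with two mismatches
diffs = compress(read, reference, 3)
print(diffs)  # → [(3, 'A'), (5, 'T')]
print(restore(len(read), diffs, reference, 3))  # → TTGAAT
```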

Divya Sain

Release notes

Recently published apps

We have published three VarDict (v1.8.3, CWL1.2) tools and one workflow:

  • VarDictJava, the Java port of the VarDict variant caller. It can be used to call SNPs, MNVs, small indels, or complex variants from DNA or RNA alignments. VarDictJava can be used for amplicon-based variant calling and supports both single sample and paired sample analysis.

  • VarDict var2vcf_valid, a CWL tool that converts a VarDict tabular variants file from single sample analysis to VCF format.

  • VarDict var2vcf_paired, a CWL tool that converts VarDict tabular output from paired sample analysis to VCF.

  • VarDict Variant Calling workflow (also VarDict v1.8.3, CWL1.2), which can be used for single sample and paired sample variant calling using VarDictJava starting from WES, WGS or amplicon data.

We have also published the following workflows and a toolkit:

  • CNVnator Analysis workflow 0.4.1 for calling CNVs through read-depth (RD) analysis of input BAM files.

  • CNVpytor workflow 1.1 for CNV/CNA detection and analysis based on read depth and allele imbalance in WGS.
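
Both workflows build on read depth. As a rough Python sketch of the idea (not the CNVnator or CNVpytor implementation, which add steps such as GC correction, segmentation, and statistical testing), read start positions can be counted in fixed-size bins and each bin scaled by the mean depth to give a naive copy-number estimate, assuming a diploid genome.

```python
from statistics import mean


def bin_counts(read_starts, chrom_length: int, bin_size: int):
    """Count read start positions (0-based) in consecutive bins of `bin_size` bases."""
    n_bins = (chrom_length + bin_size - 1) // bin_size
    counts = [0] * n_bins
    for pos in read_starts:
        counts[pos // bin_size] += 1
    return counts


def copy_number_estimates(counts, ploidy: int = 2):
    """Naive copy number per bin: ploidy times the bin's depth relative to the mean."""
    avg = mean(counts)
    return [ploidy * c / avg for c in counts]


counts = bin_counts([0, 1, 2, 3, 10, 11, 20, 21, 22, 23, 24, 25], chrom_length=30, bin_size=10)
print(counts)  # → [4, 2, 6]
print(copy_number_estimates(counts))  # → [2.0, 1.0, 3.0]
```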

Divya Sain

Release notes

Data Cruncher and Interactive Analysis become Data Studio and Interactive Browsers

Data Studio, previously Data Cruncher, is an interactive analysis tool which allows you to explore and visualize data using environments like JupyterLab and RStudio. Previously located under the Interactive Analysis tab, it has now been given a more prominent location in the project navigation by having its own tab located next to Tasks. With the removal of Data Studio from the Interactive Analysis tab, the tab's name has been changed to Interactive Browsers in order to better reflect its contents.

Recently published apps

We have just published updated versions (4.2.5.0) of the GATK Mutect2 workflows, along with several MetaXcan apps:

  • GATK Somatic SNVs and INDELs (Mutect2) 4.2.5.0, a workflow used for somatic short variant calling. It runs on a single tumor-normal pair or on a single tumor sample, and performs additional filtering and functional annotation tasks.

  • GATK Create Mutect2 Panel of Normals 4.2.5.0, which creates a panel of normals for use in other GATK workflows. The workflow takes multiple normal sample callsets and passes them to GATK Somatic SNVs and INDELs (Mutect2) 4.2.5.0 in tumor-only mode (despite the name, normal samples are given as the input), and additionally collates sites present in two or more samples into a sites-only VCF.

  • Three apps from the MetaXcan toolkit:

    • S-PrediXcan for computing associations between omic features and a complex trait starting from GWAS summary statistics.

    • S-MultiXcan for computing associations from predicted gene expression to a trait, using multiple studies for each gene.

    • MetaMany for serially performing multiple MetaXcan runs on a GWAS study from summary statistics using multiple tissues.

  • The MetaXcan Workflow for computing associations between omic features and complex traits across multiple tissues. The workflow includes two tools from the MetaXcan framework, MetaMany and S-MultiXcan, and uses summary statistics from a GWAS study together with multiple models that predict expression or splicing quantification.
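
The final collation step of the panel-of-normals workflow, keeping sites found in two or more normal samples, amounts to counting how many callsets contain each site. A minimal Python sketch, assuming each callset has already been reduced to (chrom, pos, ref, alt) tuples rather than the VCFs the GATK workflow actually processes:

```python
from collections import Counter


def pon_sites(callsets, min_samples: int = 2):
    """Return the sorted sites present in at least `min_samples` callsets."""
    counts = Counter()
    for calls in callsets:
        counts.update(set(calls))  # count each site at most once per sample
    return sorted(site for site, n in counts.items() if n >= min_samples)


normals = [
    [("chr1", 100, "A", "T"), ("chr1", 200, "G", "C")],
    [("chr1", 100, "A", "T"), ("chr2", 50, "C", "G")],
    [("chr2", 50, "C", "G"), ("chr1", 100, "A", "T")],
]
print(pon_sites(normals))
# → [('chr1', 100, 'A', 'T'), ('chr2', 50, 'C', 'G')]
```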

Divya Sain

Release notes

Recently published apps

We have just published the V-pipe 2.99.2 for SARS-CoV-2 workflow for analyzing high-throughput SARS-CoV-2 sequencing data. V-pipe integrates several tools for the analysis of viral high-throughput sequencing data. It allows you to assess viral diversity at the level of SNVs, short variant sequences (local haplotypes), and long-range haplotypes (global haplotypes).
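
One simple way to summarize diversity at individual positions is the Shannon entropy of the base frequencies observed in the aligned reads: zero where all reads agree, higher where several variants co-occur. The Python sketch below is an illustrative measure assuming per-position base counts are already available; it does not reproduce V-pipe's SNV calling or haplotype reconstruction.

```python
from math import log


def shannon_entropy(base_counts: dict) -> float:
    """Shannon entropy (natural log) of the base frequencies at one alignment column."""
    total = sum(base_counts.values())
    entropy = 0.0
    for count in base_counts.values():
        if count:
            p = count / total
            entropy -= p * log(p)
    return entropy


print(shannon_entropy({"A": 100}))  # → 0.0
print(shannon_entropy({"A": 50, "G": 50}))  # ln 2, about 0.693
```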

Divya Sain

Release notes

Recently published apps

We have just published the updated 0.7.17 version of BWA MEM Bundle, a well-known tool for aligning sequence reads to a large reference genome, and BWA INDEX, which indexes the reference sequence, a step required before running BWA MEM Bundle. Both tools are published in CWL1.2.
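
BWA's index is built on the Burrows-Wheeler transform (BWT) of the reference, which supports fast substring counting through backward search. The naive Python sketch below builds the BWT by sorting all rotations and counts pattern occurrences; it only shows what the index enables and is far too slow and memory-hungry for a real genome, which BWA INDEX handles with dedicated construction algorithms.

```python
def bwt(text: str, sentinel: str = "$") -> str:
    """Burrows-Wheeler transform via sorted rotations (illustrative only)."""
    s = text + sentinel  # '$' sorts before A, C, G and T
    rotations = sorted(s[i:] + s[:i] for i in range(len(s)))
    return "".join(rot[-1] for rot in rotations)


def count_occurrences(bwt_str: str, pattern: str) -> int:
    """Count occurrences of `pattern` in the original text using backward search."""
    # first_index[c]: number of characters in the text that sort before c.
    first_index = {}
    for i, ch in enumerate(sorted(bwt_str)):
        first_index.setdefault(ch, i)

    def occ(ch: str, i: int) -> int:
        # Occurrences of ch in bwt_str[:i]; real FM-indexes precompute this.
        return bwt_str[:i].count(ch)

    lo, hi = 0, len(bwt_str)
    for ch in reversed(pattern):
        if ch not in first_index:
            return 0
        lo = first_index[ch] + occ(ch, lo)
        hi = first_index[ch] + occ(ch, hi)
        if lo >= hi:
            return 0
    return hi - lo


b = bwt("ACAACG")
print(b)  # → GC$AAAC
print(count_occurrences(b, "AC"))  # → 2
```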

Divya Sain

Release notes

Recently published apps

We have published the following apps in our Public Apps gallery:

  • Cyrius (v1.1.1, CWL1.2), a tool that genotypes CYP2D6 in WGS data. It takes WGS BAM or CRAM files and creates a TSV report with CYP2D6 alleles.

  • Two PharmCAT (v1.6.0, CWL1.2) tools:

    • PharmCAT VCF Preprocess, a tool that takes a VCF file and prepares it for downstream processing with PharmCAT, and

    • PharmCAT, a tool for interpreting guideline variants in VCF files.

  • Two Biobambam2 (v2.0.183, CWL1.2) tools:

    • Biobambam2 Bamtofastq that converts BAM/CRAM/SAM files to FASTQ format, and

    • Biobambam2 Bamseqchksum, a tool for calculating checksums (hashes) of the contents of the provided alignment file.
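
A checksum over alignment contents is most useful when it does not depend on record order, so that, for example, coordinate-sorted and name-sorted copies of the same reads give the same value. The Python sketch below shows one such scheme, assuming reads are reduced to (name, sequence, qualities) tuples and combining per-read SHA-256 digests by addition modulo 2^256; it is an illustration, not the algorithm Bamseqchksum implements.

```python
import hashlib

MODULUS = 2 ** 256


def read_digest(name: str, seq: str, qual: str) -> int:
    """SHA-256 digest of one read's fields, as an integer."""
    data = "\t".join((name, seq, qual)).encode()
    return int.from_bytes(hashlib.sha256(data).digest(), "big")


def content_checksum(reads) -> str:
    """Order-independent checksum: sum of per-read digests modulo 2**256, as hex."""
    total = 0
    for name, seq, qual in reads:
        total = (total + read_digest(name, seq, qual)) % MODULUS
    return f"{total:064x}"


reads = [("r1", "ACGT", "IIII"), ("r2", "GGCC", "####")]
print(content_checksum(reads) == content_checksum(list(reversed(reads))))  # → True
```

Addition is used because it is commutative, which is what makes the result independent of record order.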

Divya Sain

Release notes

Recently published apps

We have just published the following apps:

  • An updated version of the SRA Download and Set Metadata workflow (SRA Toolkit 3.0.0), which downloads the metadata associated with an SRA accession via the SRA Run Info CGI, downloads the FASTQ files (on an on-demand instance), and sets the corresponding metadata.

  • OptiType (v1.3.5, CWL1.2), a tool designed for precision HLA typing from next-generation sequencing data. It is based on the assumption that the correct HLA genotype explains the highest number of mapped reads.

  • fastENLOC (v1.0, CWL1.2), a tool that enables integrative genetic association analysis of molecular QTL data and GWAS data.
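
OptiType's assumption that the correct HLA genotype explains the highest number of mapped reads can be illustrated with a brute-force search over allele pairs. OptiType itself solves this as an integer linear program; the exhaustive Python search and the placeholder allele names below are a simplified stand-in.

```python
from itertools import combinations_with_replacement


def best_genotype(read_hits: dict):
    """Pick the allele pair (homozygous allowed) that explains the most reads.

    read_hits maps a read ID to the set of alleles the read maps to.
    """
    alleles = sorted(set().union(*read_hits.values()))
    best, best_score = None, -1
    for pair in combinations_with_replacement(alleles, 2):
        chosen = set(pair)
        explained = sum(1 for hits in read_hits.values() if hits & chosen)
        if explained > best_score:
            best, best_score = pair, explained
    return best, best_score


# Placeholder alleles and read mappings, for illustration only.
hits = {
    "read1": {"allele_A"},
    "read2": {"allele_A", "allele_B"},
    "read3": {"allele_C"},
    "read4": {"allele_C"},
    "read5": {"allele_B"},
    "read6": {"allele_A"},
}
print(best_genotype(hits))  # → (('allele_A', 'allele_C'), 5)
```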

Divya Sain

Release notes

Recently published apps

We have just published the following apps in our Public Apps gallery:

  • TwoSampleMR, a tool that performs Mendelian randomization testing for a given exposure-outcome pair. It is a wrapper around the TwoSampleMR R package and uses summary statistics data to make causal inferences.

  • CCS, a tool that combines multiple subreads of the same SMRTbell molecule and outputs one highly accurate consensus sequence.

  • lima, a tool for identifying barcode and primer sequences in PacBio single-molecule sequencing data.

  • PacBio Flowcell Data Processing, a workflow that can be used to process PacBio CCS or CLR data in preparation for variant calling.

  • PacBio CCS or CLR WGS Variant Calling workflow that can be used to call structural variants in PacBio CCS or CLR data. The workflow can also call small variants in CCS data using Clair3.
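
A standard estimator in Mendelian randomization, and one of the methods available in the TwoSampleMR package, is the inverse-variance weighted (IVW) estimate, which combines per-variant Wald ratios (effect on outcome divided by effect on exposure). The Python sketch below computes the fixed-effect IVW estimate and its standard error from summary statistics, assuming independent instruments; the package's own implementation may compute the standard error differently.

```python
from math import sqrt


def ivw_estimate(beta_exposure, beta_outcome, se_outcome):
    """Fixed-effect inverse-variance weighted causal estimate and its standard error."""
    # Each variant is weighted by bx^2 / se_y^2, the inverse variance of its Wald ratio.
    total_weight = sum(bx ** 2 / se ** 2 for bx, se in zip(beta_exposure, se_outcome))
    numerator = sum(
        bx * by / se ** 2 for bx, by, se in zip(beta_exposure, beta_outcome, se_outcome)
    )
    return numerator / total_weight, sqrt(1 / total_weight)


# Two instruments whose Wald ratios are both 0.5.
beta, se = ivw_estimate([0.1, 0.2], [0.05, 0.1], [0.01, 0.01])
print(round(beta, 3), round(se, 4))  # → 0.5 0.0447
```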

Divya Sain

Release notes

Recently published apps

We’ve just published AnnotationDbi select and mapIds, a tool that maps one type of ID to another. It is based on Bioconductor annotation data packages.
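
In AnnotationDbi, mapIds returns one result per input key, and its multiVals argument decides what happens when a key maps to several values: "first" (the default) keeps the first match, while "list" keeps all of them. The Python sketch below mimics those two behaviours on a hypothetical in-memory table with placeholder IDs; it is not the Bioconductor API.

```python
def map_ids(keys, table, multi_vals="first"):
    """Map each key to target IDs; `table` maps key -> list of target IDs.

    Hypothetical helper mirroring two mapIds multiVals options:
    'first' keeps the first match, 'list' keeps all matches.
    Unmapped keys give None (NA in R).
    """
    result = {}
    for key in keys:
        values = table.get(key, [])
        if multi_vals == "first":
            result[key] = values[0] if values else None
        elif multi_vals == "list":
            result[key] = list(values) if values else None
        else:
            raise ValueError(f"unsupported multi_vals: {multi_vals}")
    return result


table = {"id1": ["SYMBOL_A"], "id2": ["SYMBOL_B", "SYMBOL_C"]}
print(map_ids(["id1", "id2", "id3"], table))
# → {'id1': 'SYMBOL_A', 'id2': 'SYMBOL_B', 'id3': None}
```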

Divya Sain

Release notes

Recently published apps

New apps have been added to the CGC:

  • Two tools from the Samplot toolkit:

    • Samplot Plot takes alignment files and coordinates for a region containing the SV call of interest (Chromosome, Start position, and End position) and creates a plot of the SV region.

    • Samplot Vcf can be used to create visualizations of structural variant calls from a VCF file.

  • Seven tools from the Smoove toolkit:

    • Smoove Annotate annotates SV calls with SV quality and gene information from GFF3 files.

    • Smoove Call calls structural variants with Lumpy and optionally calls svtyper.

    • Smoove Duphold annotates SV calls in the file based on information from the provided alignment files.

    • Smoove Genotype runs svtyper in parallel on provided SV inputs.

    • Smoove Merge merges SV calls from individual files and sorts them using svtools.

    • Smoove Paste squares matching SV calls from individual files into a single joint file with the final calls.

    • Smoove Plot-counts takes a VCF file created by other Smoove tools and plots counts of split and discordant reads before and after filtering.

  • Upgraded four Sambamba tools to 0.8.1 (and CWL 1.2) and added three new tools:

    • Sambamba Flagstat generates statistics from read flags in a BAM file.

    • Sambamba Index creates a BAI or FAI index for the provided input.

    • Sambamba Markdup can be used to mark or remove duplicate reads from an input BAM file.

    • Sambamba Merge merges alignments in BAM format.

    • Sambamba Slice can be used to copy a slice (region) of a coordinate-sorted and indexed input file in BAM or FASTA format.

    • Sambamba Sort sorts alignments in BAM format.

    • Sambamba View accepts alignments in BAM or SAM format and outputs data in a user-specified format.
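
Duplicate marking, as done by Sambamba Markdup, generally groups reads that share the same reference, 5' position, and strand, keeps the best read in each group, and flags the rest. The Python sketch below illustrates that idea for single-end reads with a hypothetical record type, scoring reads by the sum of their base qualities; it does not reproduce Sambamba's handling of read pairs, clipping, or tie-breaking.

```python
from dataclasses import dataclass


@dataclass
class Read:
    """Hypothetical minimal read record for illustration."""
    name: str
    chrom: str
    five_prime: int  # unclipped 5' position on the reference
    reverse: bool    # True for reverse-strand reads
    qualities: list  # per-base Phred qualities
    duplicate: bool = False


def mark_duplicates(reads):
    """Flag all but the highest-scoring read at each (chrom, 5' position, strand)."""
    best = {}
    for read in reads:
        key = (read.chrom, read.five_prime, read.reverse)
        if key not in best or sum(read.qualities) > sum(best[key].qualities):
            best[key] = read
    for read in reads:
        key = (read.chrom, read.five_prime, read.reverse)
        read.duplicate = read is not best[key]
    return reads


reads = [
    Read("r1", "chr1", 100, False, [30, 30, 30]),
    Read("r2", "chr1", 100, False, [20, 20, 20]),
    Read("r3", "chr1", 100, True, [10, 10, 10]),
]
print([r.name for r in mark_duplicates(reads) if r.duplicate])  # → ['r2']
```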

Divya Sain

Release notes

GDC Datasets version update

As of March 11, 2022, GDC datasets available through the Data Browser and the API correspond to GDC Data Release 31.

Recently published apps

We have added four apps to our public apps gallery:

  • Single cell RNA-seq velocity analysis with scVelo 0.2.4 workflow that performs preprocessing, marker gene analysis, and velocity analysis of single-cell expression data. It is based on SingleCellExperiment, Seurat, scran, scater, AnnotationHub, scuttle, and scVelo.

  • Velocyto.py - Velocyto 0.17.17 is a package for the analysis of expression dynamics in single cell RNAseq data. In particular, it enables estimation of the RNA velocity of single cells by distinguishing between unspliced and spliced mRNAs in standard single-cell RNA sequencing protocols. Velocyto.py is a command line tool (distributed with the package) that is used to generate spliced/unspliced count matrices.

  • SBG single cell object convertor, a tool that converts single cell data objects between commonly used formats: Seurat, AnnotatedData, and SingleCellExperiment.

  • Single cell RNA-seq trajectory analysis with slingshot and tradeSeq, a tool that performs single cell trajectory analysis with slingshot 2.0.0, and differential expression testing on inferred trajectories with tradeSeq 1.6.0. Slingshot takes advantage of single cell data principal components analysis (PCA) and clustering to infer probable paths of cell development.
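
RNA velocity, as estimated by velocyto, relates unspliced (u) and spliced (s) counts. Under the steady-state model, a degradation rate gamma is fitted per gene as the slope of u against s, and each cell's velocity is v = u - gamma * s: positive when the gene appears to be induced, negative when it appears to be repressed. The Python sketch below assumes a least-squares fit through the origin on all cells; velocyto and scVelo refine this, for example by fitting on extreme cells and pooling neighbouring cells.

```python
def steady_state_velocity(unspliced, spliced):
    """Fit gamma as the least-squares slope of u on s through the origin,
    then return gamma and per-cell velocities v = u - gamma * s for one gene."""
    ss = sum(s * s for s in spliced)
    if ss == 0:
        raise ValueError("spliced counts are all zero")
    gamma = sum(u * s for u, s in zip(unspliced, spliced)) / ss
    velocities = [u - gamma * s for u, s in zip(unspliced, spliced)]
    return gamma, velocities


gamma, v = steady_state_velocity([3, 1], [2, 2])
print(gamma, v)  # → 1.0 [1.0, -1.0]
```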
