Bioinformatic Tools

Tools for the analysis of large-scale omics data (Z01)

Watchdog is a workflow management system for the automated and distributed analysis of large-scale omics data, such as the data generated in DEEP-DV.

The Watchdog software itself is available at: https://github.com/klugem/watchdog

Workflows for omics data analysis are available at: https://github.com/watchdog-wms/watchdog-wms-workflows

Kluge M, Friedel CC. Watchdog – a workflow management system for the distributed analysis of large-scale experimental data. BMC Bioinformatics. Mar 2018. 19(1):97. (DOI: 10.1186/s12859-018-2107-4)

Kluge M, Friedl MS, Menzel AL, Friedel CC. Watchdog 2.0: New developments for reusability, reproducibility, and workflow execution. GigaScience. Jun 2020. 9(6):giaa068. (DOI: 10.1093/gigascience/giaa068)

We developed a pipeline that automatically identifies genomic variations in viral genomes from functional genomics sequencing data, such as RNA-seq, ATAC-seq or similar assays. It detects both SNPs and insertions and deletions (indels), which may either occur naturally or have been deliberately introduced to knock out specific viral genes. The pipeline also provides an option to identify the viral strain when given a set of reference SNPs for different strains.
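How the pipeline scores strains is not described here. As a purely illustrative sketch, assume that variants are represented as (position, alternative base) pairs and that each strain is scored by the fraction of its reference SNPs found among the called variants. The function name `assign_strain` and this scoring rule are hypothetical and are not taken from VariantCallerPipeline.

```python
from typing import Dict, Set, Tuple

# Hypothetical representation: (1-based position on the viral reference, alternative base).
Variant = Tuple[int, str]


def assign_strain(called: Set[Variant],
                  strain_snps: Dict[str, Set[Variant]]) -> Tuple[str, float]:
    """Return (strain, score) for the strain whose reference SNPs are best supported.

    Illustrative scoring rule (an assumption, not the pipeline's documented method):
    score = fraction of the strain's reference SNPs that were also called in the sample.
    Strains are visited in sorted order, so ties go to the alphabetically first strain.
    """
    best_name, best_score = "", -1.0
    for name in sorted(strain_snps):
        snps = strain_snps[name]
        if not snps:
            continue  # a strain without reference SNPs cannot be scored
        score = len(called & snps) / len(snps)
        if score > best_score:
            best_name, best_score = name, score
    return best_name, best_score


called = {(100, "A"), (250, "T"), (900, "G")}
strains = {
    "strain_1": {(100, "A"), (250, "T")},
    "strain_2": {(100, "A"), (400, "C"), (700, "G")},
}
print(assign_strain(called, strains))  # ('strain_1', 1.0)
```

A simple overlap score like this ignores positions with insufficient coverage, which a real strain assignment would have to account for.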
 
The pipeline is implemented as a workflow (VariantCallerPipeline) for Watchdog and is available at: https://github.com/watchdog-wms/watchdog-wms-workflows

RegCFinder automatically identifies subregions of genomic input windows (e.g. promoters, genes, enhancers) with differences in read density between two conditions. It can be applied to any type of omics data and thus lends itself to a wide range of applications.
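To illustrate the underlying idea only (RegCFinder's own statistics are described in the paper below and are not reproduced here), the sketch normalizes binned read counts within one input window for each condition and then uses a maximum-sum subarray search (Kadane's algorithm) to find the contiguous run of bins in which condition A gains the most relative read density over condition B. The bin counts and function names are hypothetical.

```python
from typing import List, Tuple


def normalized_profile(counts: List[int]) -> List[float]:
    """Scale per-bin read counts so that the whole window sums to 1."""
    total = sum(counts)
    return [c / total if total else 0.0 for c in counts]


def max_diff_subregion(cond_a: List[int], cond_b: List[int]) -> Tuple[int, int, float]:
    """Return (start_bin, end_bin_exclusive, score) of the contiguous subregion with
    the largest summed density difference A - B (Kadane's maximum-sum subarray)."""
    if not cond_a or len(cond_a) != len(cond_b):
        raise ValueError("profiles must be non-empty and of equal length")
    diff = [a - b for a, b in zip(normalized_profile(cond_a), normalized_profile(cond_b))]
    best = (0, 1, diff[0])
    run_start, run_sum = 0, 0.0
    for i, d in enumerate(diff):
        if run_sum <= 0:
            # A non-positive running sum can only lower the score: restart at bin i.
            run_start, run_sum = i, d
        else:
            run_sum += d
        if run_sum > best[2]:
            best = (run_start, i + 1, run_sum)
    return best


# Same total coverage, but condition A is shifted towards the second half of the window.
start, end, score = max_diff_subregion([10, 10, 40, 40], [25, 25, 25, 25])
print(start, end, round(score, 3))  # 2 4 0.3
```

Normalizing within the window means that the search looks for a shift in where reads fall, not for an overall change in expression. Swapping the two arguments finds the subregion enriched in condition B instead.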

RegCFinder is implemented as a workflow for Watchdog and available at: https://github.com/watchdog-wms/watchdog-wms-workflows

Weiß E, Friedel CC. RegCFinder: targeted discovery of genomic subregions with differential read density. Bioinform Adv. Jul 2023. 3(1):vbad085. (DOI: 10.1093/bioadv/vbad085)

ContextMap2 is a splicing-aware read aligner that can map reads in parallel against multiple genome sequences, including host and viral genomes. It also allows the detection of poly(A) sites, in particular viral poly(A) sites, from sequencing reads.
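ContextMap2's poly(A) read mapping procedure is described in the 2017 PLoS One paper listed below and is not reproduced here. As a generic, hypothetical illustration of one step such an approach needs, the sketch trims an A-rich 3' tail from a read; the remaining read body could then be aligned, and the end of its alignment would mark a candidate cleavage (poly(A)) site. The thresholds `min_len` and `min_frac` are arbitrary assumptions.

```python
from typing import Optional, Tuple


def split_polya_tail(read: str, min_len: int = 8,
                     min_frac: float = 0.9) -> Optional[Tuple[str, int]]:
    """Split a read into (body, tail_length) if it ends in an A-rich tail.

    Considers every suffix that starts with 'A' and keeps the longest one that is at
    least `min_len` bases long with an adenine fraction of at least `min_frac`.
    Returns None if no suffix qualifies. Whether the tail is encoded in the genome
    (and therefore not a genuine poly(A) tail) is not checked in this sketch.
    """
    read = read.upper()
    best_start = None
    a_count = 0
    for i in range(len(read) - 1, -1, -1):  # walk from the 3' end towards the 5' end
        if read[i] == "A":
            a_count += 1
            length = len(read) - i
            if length >= min_len and a_count / length >= min_frac:
                best_start = i  # a smaller i means a longer qualifying tail
    if best_start is None:
        return None
    return read[:best_start], len(read) - best_start


print(split_polya_tail("ACGTTGCT" + "A" * 10))  # ('ACGTTGCT', 10)
print(split_polya_tail("ACGTACGTACGT"))         # None
```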

ContextMap2 is available at: https://github.com/friedel-lab/ContextMap2

Bonfert T, Csaba G, Zimmer R, Friedel CC. A context-based approach to identify the most likely mapping for RNA-seq experiments. BMC Bioinformatics. Apr 2012. 13(Suppl 6):S9. (DOI: 10.1186/1471-2105-13-S6-S9)

Bonfert T, Csaba G, Zimmer R, Friedel CC. Mining RNA-seq data for infections and contaminations. PLoS One. Sep 2013. 8(9):e73071. (DOI: 10.1371/journal.pone.0073071)

Bonfert T, Kirner E, Csaba G, Zimmer R, Friedel CC. ContextMap 2: fast and accurate context-based RNA-seq mapping. BMC Bioinformatics. Apr 2015. 16:122. (DOI: 10.1186/s12859-015-0557-5)

Bonfert T, Friedel CC. Prediction of Poly(A) Sites by Poly(A) Read Mapping. PLoS One. Jan 2017. 12(1):e0170914. (DOI: 10.1371/journal.pone.0170914)

Tools to analyse SLAM-seq data and other nucleotide conversion RNA-seq approaches (P02)

GRAND-SLAM is a tool to estimate new-to-total RNA (NTR) ratios from SLAM-seq data and other nucleotide conversion RNA-seq approaches. Given the mapped reads of all samples (replicates, conditions, etc.) or cells (for single-cell sequencing) of an experiment, it generates a table containing read counts, NTRs and their posterior distributions (reflecting the uncertainty in estimating NTRs) for all samples and all genes.
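As a simplified illustration of how an NTR and its uncertainty can be derived from conversion counts, the sketch uses a two-component binomial mixture: each read with n covered uridines (Ts) and k T-to-C conversions comes either from new RNA (per-T conversion probability `p_new`) or from old RNA (background probability `p_old`). GRAND-SLAM's full model is described in the papers below; this sketch instead assumes both rates are known and evaluates the NTR posterior on a grid under a uniform prior. All names and numbers are hypothetical.

```python
import math
from typing import List, Tuple


def binom_pmf(k: int, n: int, p: float) -> float:
    """Binomial probability of k successes in n trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def ntr_posterior(reads: List[Tuple[int, int]], p_new: float, p_old: float,
                  grid_size: int = 1001) -> Tuple[float, float]:
    """Return (MAP estimate, posterior mean) of the NTR for one gene.

    reads: (k, n) per read = (number of T-to-C conversions, number of covered Ts).
    Likelihood per read: NTR * Binom(k; n, p_new) + (1 - NTR) * Binom(k; n, p_old).
    Prior: uniform on [0, 1], evaluated on an evenly spaced grid.
    """
    if not (0 < p_old < 1 and 0 < p_new < 1):
        raise ValueError("conversion probabilities must lie strictly between 0 and 1")
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    per_read = [(binom_pmf(k, n, p_new), binom_pmf(k, n, p_old)) for k, n in reads]
    log_post = []
    for ntr in grid:
        total = 0.0
        for like_new, like_old in per_read:
            mix = ntr * like_new + (1 - ntr) * like_old
            total += math.log(mix) if mix > 0 else -math.inf
        log_post.append(total)
    peak = max(log_post)
    weights = [math.exp(lp - peak) for lp in log_post]  # unnormalized posterior
    map_ntr = grid[weights.index(max(weights))]
    mean_ntr = sum(g * w for g, w in zip(grid, weights)) / sum(weights)
    return map_ntr, mean_ntr


# 50 reads without conversions and 50 reads with 3 conversions, 50 covered Ts each.
reads = [(0, 50)] * 50 + [(3, 50)] * 50
map_ntr, mean_ntr = ntr_posterior(reads, p_new=0.05, p_old=0.001)
print(round(map_ntr, 3), round(mean_ntr, 3))
```

Note that the estimate exceeds the naive fraction of converted reads: with 50 Ts and `p_new = 0.05`, some new transcripts carry no conversion at all, and the mixture model accounts for them.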

GRAND-SLAM is part of the GEDI toolkit and available at: https://github.com/erhard-lab/gedi

Jürges C, Dölken L, Erhard F. Dissecting newly transcribed and old RNA using GRAND-SLAM. Bioinformatics. Jul 2018. 34(13):i218-i226. (DOI: 10.1093/bioinformatics/bty256)

Erhard F, Baptista MAP, Krammer T, Hennig T, Lange M, Arampatzi P, Jürges CS, Theis FJ, Saliba AE, Dölken L. scSLAM-seq reveals core features of transcription dynamics in single cells. Nature. Jul 2019. 571(7765):419-423. (DOI: 10.1038/s41586-019-1369-y)

Erhard F. Two-Step Parameter Estimation for Read Feature Models. Künstl Intell. Jan 2024. (DOI: 10.1007/s13218-023-00821-w)

grandRescue is a software tool that circumvents mappability problems and corrects for 4sU-induced quantification bias in SLAM-seq data and other nucleotide conversion RNA-seq approaches. To achieve this, grandRescue aligns previously unmappable reads in a manner that is independent of T-to-C mismatches.
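The general idea of conversion-tolerant alignment can be illustrated with a "three-letter" comparison: if T and C are collapsed into a single letter in both the read and the reference, T-to-C conversions no longer count as mismatches. The sketch applies this to exact matching on a toy reference. It is a hypothetical illustration of the concept, not grandRescue's implementation, and it ignores the reverse strand and sequencing errors.

```python
from typing import List


def to_three_letter(seq: str) -> str:
    """Collapse T and C into one letter so that T-to-C conversions are invisible."""
    return seq.upper().replace("T", "C")


def conversion_tolerant_hits(read: str, reference: str) -> List[int]:
    """Return all 0-based reference offsets where the read matches exactly in the
    three-letter (T == C) alphabet. Only the forward strand is searched."""
    query, target = to_three_letter(read), to_three_letter(reference)
    hits = []
    pos = target.find(query)
    while pos != -1:
        hits.append(pos)
        pos = target.find(query, pos + 1)
    return hits


def count_t_to_c(read: str, reference: str, offset: int) -> int:
    """Count positions where the reference has T and the aligned read base is C."""
    window = reference[offset:offset + len(read)].upper()
    return sum(1 for ref_base, read_base in zip(window, read.upper())
               if ref_base == "T" and read_base == "C")


reference = "GGATTTACGTTAGC"
read = "ACCTACGC"  # reference[2:10] = "ATTTACGT" carrying three T-to-C conversions
print(reference.find(read))                       # -1 (no exact match)
print(conversion_tolerant_hits(read, reference))  # [2]
print(count_t_to_c(read, reference, 2))           # 3
```

Once a position is found in the reduced alphabet, comparing against the original reference recovers the conversions, which can then be counted for NTR estimation.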

grandRescue is available at: https://github.com/erhard-lab/grandRescue

Berg K, Lodha M, Delazer I, Bartosik K, Garcia YC, Hennig T, Wolf E, Dölken L, Lusser A, Prusty BK, Erhard F. Correcting 4sU induced quantification bias in nucleotide conversion RNA-seq data. Nucleic Acids Res. Apr 2024. 52(7):e35. (DOI: 10.1093/nar/gkae120)

After primary processing by GRAND-SLAM, the R package grandR provides specialized tools for downstream analyses of SLAM-seq data. grandR offers a comprehensive toolbox for quality control, kinetic modeling, differential gene expression analysis and visualization of such data. It provides an interface to Seurat for single-cell analyses and web-based visualization via Shiny.
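One common kinetic model for such data, shown here only as an illustration and not as grandR's API (grandR is an R package with its own modeling functions), assumes first-order degradation at steady state. After a labeling time t, the NTR then satisfies NTR = 1 - exp(-δt), so the degradation rate is δ = -ln(1 - NTR)/t and the RNA half-life is ln 2/δ. The Python sketch below evaluates these closed-form expressions; the function names are hypothetical.

```python
import math


def degradation_rate(ntr: float, labeling_time: float) -> float:
    """Degradation rate delta (per time unit of labeling_time), assuming steady state
    and first-order kinetics, where NTR = 1 - exp(-delta * t)."""
    if not 0.0 <= ntr < 1.0:
        raise ValueError("NTR must lie in [0, 1) to give a finite rate")
    if labeling_time <= 0:
        raise ValueError("labeling time must be positive")
    return -math.log(1.0 - ntr) / labeling_time


def half_life(ntr: float, labeling_time: float) -> float:
    """RNA half-life ln(2) / delta; infinite if no new RNA was observed (NTR = 0)."""
    delta = degradation_rate(ntr, labeling_time)
    return math.inf if delta == 0 else math.log(2) / delta


# With 2 h of labeling, half of the RNA being new implies a half-life of 2 h.
print(round(half_life(0.5, 2.0), 6))   # 2.0
print(round(half_life(0.75, 2.0), 6))  # 1.0
```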

grandR is available on CRAN and at: https://grandr.erhard-lab.de/

Rummel T, Sakellaridi L, Erhard F. grandR: a comprehensive package for nucleotide conversion RNA-seq data analysis. Nat Commun. Jun 2023. 14(1):3559. (DOI: 10.1038/s41467-023-39163-4)

iTiSS (integrated transcriptional start site caller) is a method to identify transcriptional start sites (TiSS) from various TiSS-profiling experiments. An additional integrative module combines the calls from several data sets and removes artefactual TiSS that are called in only a single data set.
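How the integrative module combines calls is not detailed here. A minimal sketch of one plausible rule, assumed purely for illustration: keep a TiSS only if it is called, on the same chromosome and strand and within a small tolerance window, in at least `min_support` data sets, so that sites seen in a single data set are discarded. The parameter values, data set names and function name are hypothetical.

```python
from typing import Dict, List, Tuple

Site = Tuple[str, str, int]  # (chromosome, strand, position); hypothetical representation


def integrate_tiss(calls: Dict[str, List[Site]], min_support: int = 2,
                   tolerance: int = 5) -> List[Site]:
    """Keep TiSS supported by at least `min_support` data sets (the data set that called
    the site counts as one). Support means a call on the same chromosome and strand
    within +/- `tolerance` nt. Nearby kept sites are not merged in this sketch."""
    kept = set()
    for sites in calls.values():
        for chrom, strand, pos in sites:
            support = sum(
                1 for other in calls.values()
                if any(c == chrom and s == strand and abs(p - pos) <= tolerance
                       for c, s, p in other)
            )
            if support >= min_support:
                kept.add((chrom, strand, pos))
    return sorted(kept)


calls = {
    "dataset_1": [("chr1", "+", 1000), ("chr1", "+", 5000)],
    "dataset_2": [("chr1", "+", 1003)],
    "dataset_3": [("chr1", "-", 1000)],
}
print(integrate_tiss(calls))  # [('chr1', '+', 1000), ('chr1', '+', 1003)]
```

The pairwise comparison is quadratic in the number of calls; a real implementation would sort calls by position or use an interval index.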

iTiSS is available at: https://github.com/erhard-lab/iTiSS

Jürges CS, Dölken L, Erhard F. Integrative transcription start site identification with iTiSS. Bioinformatics. Sep 2021. 37(18):3056-3057. (DOI: 10.1093/bioinformatics/btab170)

LFC is a tool for estimating log fold changes and pseudocounts for RNA-seq experiments. It not only provides point estimates but also computes posterior probabilities of log fold changes.
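The connection between pseudocounts and a posterior over fold changes can be sketched with a Beta model: if the share p of a gene's reads that come from condition A follows Beta(a + α, b + β), then LFC = log2(p/(1 - p)). How lfc chooses the pseudocounts is described in the papers below and is not reproduced; this Python sketch fixes α = β = 1, assumes equal library sizes and approximates the posterior by Monte Carlo sampling. The function name is hypothetical and this is not the package's API.

```python
import math
import random
from typing import Tuple


def lfc_posterior(a: int, b: int, prior_a: float = 1.0, prior_b: float = 1.0,
                  n_samples: int = 20000,
                  seed: int = 0) -> Tuple[float, Tuple[float, float], float]:
    """Summarize the posterior of log2(A / B) for read counts a and b of one gene.

    Model sketch (equal library sizes assumed): the share of the gene's reads that
    come from condition A is p ~ Beta(a + prior_a, b + prior_b), and LFC = log2(p / (1 - p)).
    Returns (point estimate, 95% credible interval, posterior probability that LFC > 0).
    """
    # The pseudocount estimate equals the LFC evaluated at the posterior mean of p.
    point = math.log2((a + prior_a) / (b + prior_b))
    rng = random.Random(seed)
    draws = []
    for _ in range(n_samples):
        p = rng.betavariate(a + prior_a, b + prior_b)
        p = min(max(p, 1e-12), 1 - 1e-12)  # guard against taking the log of 0
        draws.append(math.log2(p / (1 - p)))
    draws.sort()
    interval = (draws[int(0.025 * n_samples)], draws[int(0.975 * n_samples) - 1])
    prob_up = sum(d > 0 for d in draws) / n_samples
    return point, interval, prob_up


point, (low, high), prob_up = lfc_posterior(15, 3)
print(point)  # 2.0
```

Because log2(p/(1 - p)) is monotonic in p, quantiles of the LFC could also be obtained exactly from Beta quantiles; sampling keeps the sketch free of external dependencies.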

LFC is available on CRAN and at: https://github.com/erhard-lab/lfc

Erhard F, Zimmer R. Count ratio model reveals bias affecting NGS fold changes. Nucleic Acids Res. Jul 2015. 43(20):e136. (DOI: 10.1093/nar/gkv696)

Erhard F. Estimating pseudocounts and fold changes for digital expression measurements. Bioinformatics. Dec 2018. 34(23):4054-4063. (DOI: 10.1093/bioinformatics/bty471)

Tools for multi-omics integration using spatial image registration (P03)

VoltRon is a spatial omics analysis toolbox for multi-omics integration using spatial image registration. VoltRon can also analyze multiple types of spatially aware data modalities.

  • VoltRon's unique data structure allows users to seamlessly define tissue blocks, layers and multiple assay types in one R object.
  • End-to-end downstream data analysis is supported for distinct spatial biology technologies. VoltRon visualizes and analyzes regions of interest (ROIs), spots, cells, molecules and tiles (under development).
  • Automated Image Registration incorporates OpenCV (fully embedded into the package using Rcpp) to detect common features across images and register the images. Users may interact with built-in mini Shiny apps to change alignment parameters and validate alignment accuracy.
  • Manual Image Registration helps users to select common features across spatial datasets using reference images stored in VoltRon objects. If automated image registration does not work, users can still align images by manually picking landmark points.
  • Niche Clustering allows integration with single-cell RNA analysis datasets using Seurat/SingleCellExperiment and spacexr for spot deconvolution. Estimated cell type abundances are then used to cluster spots into cell type niches, which are defined as spots with a distinct composition of cell types.
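The niche clustering idea can be illustrated outside of R: given per-spot cell type abundances (for example, as estimated by a deconvolution method such as spacexr), spots with similar compositions are grouped into niches. The sketch uses plain k-means (Lloyd's algorithm) in Python on made-up composition vectors. It is a conceptual illustration, not VoltRon's implementation or API, and the choice of k-means and all names here are assumptions.

```python
import random
from typing import List, Sequence


def squared_distance(x: Sequence[float], y: Sequence[float]) -> float:
    """Squared Euclidean distance between two composition vectors."""
    return sum((a - b) ** 2 for a, b in zip(x, y))


def kmeans_niches(compositions: List[Sequence[float]], k: int,
                  max_iter: int = 100, seed: int = 0) -> List[int]:
    """Assign each spot to one of k niches by k-means on its cell type composition."""
    rng = random.Random(seed)
    centers = [list(c) for c in rng.sample(compositions, k)]  # random initial centers
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each spot joins its nearest center.
        new_labels = [min(range(k), key=lambda j: squared_distance(spot, centers[j]))
                      for spot in compositions]
        if new_labels == labels:
            break  # converged: assignments no longer change
        labels = new_labels
        # Update step: move each center to the mean composition of its spots.
        for j in range(k):
            members = [spot for spot, lab in zip(compositions, labels) if lab == j]
            if members:  # keep the old center if a niche became empty
                centers[j] = [sum(col) / len(members) for col in zip(*members)]
    return labels


# Made-up fractions of (T cells, B cells, tumor cells) per spot.
spots = [[0.8, 0.1, 0.1], [0.7, 0.2, 0.1], [0.1, 0.1, 0.8], [0.2, 0.1, 0.7]]
print(kmeans_niches(spots, k=2))
```

The niche numbers themselves are arbitrary and depend on the random initialization; only the grouping of spots is meaningful.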

VoltRon is available at: https://bioinformatics.mdc-berlin.de/VoltRon/index.html