Bioinformatic Tools
Tools for the analysis of large-scale omics data (Z01)
Watchdog is a workflow management system for the automated and distributed analysis of large-scale omics data as generated in DEEP-DV.
The Watchdog software itself is available at: https://github.com/klugem/watchdog
Workflows for omics data analysis are available at: https://github.com/watchdog-wms/watchdog-wms-workflows
Kluge M, Friedel CC. Watchdog – a workflow management system for the distributed analysis of large-scale experimental data. BMC Bioinformatics. Mar 2018. 19(1):97. (DOI: 10.1186/s12859-018-2107-4)
Kluge M, Friedl MS, Menzel AL, Friedel CC. Watchdog 2.0: New developments for reusability, reproducibility, and workflow execution. Gigascience. Jun 2020. 9(6):giaa068. (DOI: 10.1093/gigascience/giaa068)
RegCFinder automatically identifies subregions of genomic input windows (e.g. promoters, genes, enhancers) with differences in read density between two conditions. It can be applied to any type of omics data and thus lends itself to a wide range of applications.
RegCFinder is implemented as a workflow for Watchdog and available at: https://github.com/watchdog-wms/watchdog-wms-workflows
Weiß E, Friedel CC. RegCFinder: targeted discovery of genomic subregions with differential read density. Bioinform Adv. Jul 2023. 3(1):vbad085. (DOI: 10.1093/bioadv/vbad085)
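As an illustration of what "subregions with differences in read density" can mean, here is a minimal Python sketch. It is not RegCFinder's implementation. It assumes per-position read coverage for one input window in two conditions, normalizes each profile to sum to 1, and uses a standard maximum-subarray scan (Kadane's algorithm) as a stand-in to find the contiguous stretch where condition A is most enriched relative to condition B. All function names and the toy coverage values are hypothetical.

```python
def normalize(coverage):
    """Scale a coverage profile so that it sums to 1 (all zeros stay zeros)."""
    total = sum(coverage)
    if total == 0:
        return [0.0] * len(coverage)
    return [c / total for c in coverage]


def max_difference_subregion(cov_a, cov_b):
    """Return (start, end, score) of the contiguous subregion in which the
    normalized density of condition A most exceeds that of condition B.

    `end` is exclusive. Illustrative stand-in only: a plain Kadane scan over
    the per-position density difference, not RegCFinder's actual method.
    """
    if len(cov_a) != len(cov_b):
        raise ValueError("coverage profiles must have the same length")
    diff = [a - b for a, b in zip(normalize(cov_a), normalize(cov_b))]

    best = (0, 0, 0.0)          # empty region if A never exceeds B
    cur_start, cur_sum = 0, 0.0
    for i, d in enumerate(diff):
        if cur_sum <= 0:
            # A non-positive running sum cannot help; restart here.
            cur_start, cur_sum = i, d
        else:
            cur_sum += d
        if cur_sum > best[2]:
            best = (cur_start, i + 1, cur_sum)
    return best


# Toy window of six positions (hypothetical values).
condition_a = [1, 1, 5, 5, 1, 1]
condition_b = [2, 2, 2, 2, 2, 2]
start, end, score = max_difference_subregion(condition_a, condition_b)
```

Swapping the two arguments finds the stretch enriched in condition B instead.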
ContextMap2 is a splicing-aware read aligner that can map reads in parallel against multiple genome sequences, including host and viral genomes. It also allows detection of poly(A) sites from sequencing reads, in particular viral poly(A) sites.
ContextMap2 is available at: https://github.com/friedel-lab/ContextMap2
Bonfert T, Csaba G, Zimmer R, Friedel CC. A context-based approach to identify the most likely mapping for RNA-seq experiments. BMC Bioinformatics. Apr 2012. 13 Suppl 6(Suppl 6):S9. (DOI: 10.1186/1471-2105-13-S6-S9)
Bonfert T, Csaba G, Zimmer R, Friedel CC. Mining RNA-seq data for infections and contaminations. PLoS One. Sep 2013. 8(9):e73071. (DOI: 10.1371/journal.pone.0073071)
Bonfert T, Kirner E, Csaba G, Zimmer R, Friedel CC. ContextMap 2: fast and accurate context-based RNA-seq mapping. BMC Bioinformatics. Apr 2015. 16:122. (DOI: 10.1186/s12859-015-0557-5)
Bonfert T, Friedel CC. Prediction of Poly(A) Sites by Poly(A) Read Mapping. PLoS One. Jan 2017. 12(1):e0170914. (DOI: 10.1371/journal.pone.0170914)
Tools to analyse SLAM-seq data and other nucleotide conversion RNA-seq approaches (P02)
GRAND-SLAM is a tool to estimate new-to-total RNA ratios (NTRs) from SLAM-seq data and other nucleotide conversion RNA-seq approaches. Given the mapped reads of all samples (replicates, conditions, etc.) or cells (in the case of single-cell sequencing) from an experiment, it generates a table containing read counts, NTRs and their posterior distributions (reflecting the uncertainty in estimating NTRs) for all samples and all genes.
GRAND-SLAM is part of the GEDI toolkit and available at: https://github.com/erhard-lab/gedi
Jürges C, Dölken L, Erhard F. Dissecting newly transcribed and old RNA using GRAND-SLAM. Bioinformatics. Jul 2018. 34(13):i218-i226. (DOI: 10.1093/bioinformatics/bty256)
Erhard F, Baptista MAP, Krammer T, Hennig T, Lange M, Arampatzi P, Jürges CS, Theis FJ, Saliba AE, Dölken L. scSLAM-seq reveals core features of transcription dynamics in single cells. Nature. Jul 2019. 571(7765):419-423. (DOI: 10.1038/s41586-019-1369-y)
Erhard F. Two-Step Parameter Estimation for Read Feature Models. Künstl Intell. Jan 2024. (DOI: 10.1007/s13218-023-00821-w)
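The idea of an NTR with a posterior distribution can be sketched with a two-component binomial mixture: each read comes from new RNA with probability NTR, and its number of T-to-C mismatches is then binomial with either a conversion rate (new RNA) or a sequencing-error rate (old RNA). The Python sketch below is a simplified illustration, not GRAND-SLAM's code or interface. The function names are hypothetical, and the two rates are fixed as assumptions rather than estimated from data.

```python
import math


def binom_pmf(k, n, p):
    """Probability of k successes in n trials with success probability p."""
    return math.comb(n, k) * p ** k * (1 - p) ** (n - k)


def ntr_posterior(reads, p_err=0.001, p_conv=0.05, grid_size=201):
    """Grid posterior over the NTR of one gene under a uniform prior.

    reads  : list of (n_t, k) pairs, where n_t is the number of T positions
             covered by a read and k its number of T-to-C mismatches.
    p_err  : assumed mismatch rate in old RNA (sequencing errors).
    p_conv : assumed mismatch rate in new RNA (4sU-induced conversions).
    Returns (grid, posterior); the posterior values sum to 1.
    """
    grid = [i / (grid_size - 1) for i in range(grid_size)]
    log_post = []
    for ntr in grid:
        log_lik = 0.0
        for n_t, k in reads:
            lik = (ntr * binom_pmf(k, n_t, p_conv)
                   + (1 - ntr) * binom_pmf(k, n_t, p_err))
            log_lik += math.log(lik)
        log_post.append(log_lik)
    # Subtract the maximum before exponentiating for numerical stability.
    peak = max(log_post)
    weights = [math.exp(v - peak) for v in log_post]
    total = sum(weights)
    return grid, [w / total for w in weights]


def posterior_mean(grid, posterior):
    """Posterior mean of the NTR on the grid."""
    return sum(g * p for g, p in zip(grid, posterior))


# Hypothetical genes: one whose reads carry no conversions, one whose
# reads carry two conversions each.
old_gene = [(30, 0)] * 50
new_gene = [(30, 2)] * 50
mean_old = posterior_mean(*ntr_posterior(old_gene))
mean_new = posterior_mean(*ntr_posterior(new_gene))
```

The width of the posterior around its mean is what reflects the uncertainty of the estimate: fewer reads give a flatter curve.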
grandRescue is software that circumvents mappability problems and corrects for 4sU-induced quantification bias in SLAM-seq data and other nucleotide conversion RNA-seq approaches. To achieve this, grandRescue aligns previously unmappable reads in a T-to-C mismatch-independent manner.
grandRescue is available at: https://github.com/erhard-lab/grandRescue
Berg K, Lodha M, Delazer I, Bartosik K, Garcia YC, Hennig T, Wolf E, Dölken L, Lusser A, Prusty BK, Erhard F. Correcting 4sU induced quantification bias in nucleotide conversion RNA-seq data. Nucleic Acids Res. Apr 2024. 52(7):e35. (DOI: 10.1093/nar/gkae120)
After primary processing by GRAND-SLAM, the R package grandR provides specialized tools for downstream analyses of SLAM-seq data. It offers a comprehensive toolbox for quality control, kinetic modeling, differential gene expression analysis and visualization of such data, as well as an interface to Seurat for single-cell analyses and web-based visualization via shiny.
grandR is available on CRAN and at: https://grandr.erhard-lab.de/
Rummel T, Sakellaridi L, Erhard F. grandR: a comprehensive package for nucleotide conversion RNA-seq data analysis. Nat Commun. Jun 2023. 14(1):3559. (DOI: 10.1038/s41467-023-39163-4)
iTiSS (integrated Transcriptional start site caller) is a method to identify transcriptional start sites (TiSS) from various TiSS-profiling experiments. An additional integrative module combines TiSS across data sets and removes artefactual TiSS called in individual data sets.
iTiSS is available at https://github.com/erhard-lab/iTiSS
Jürges CS, Dölken L, Erhard F. Integrative transcription start site identification with iTiSS. Bioinformatics. Sep 2021. 37(18):3056-3057. (DOI: 10.1093/bioinformatics/btab170)
LFC is a tool for estimating log fold changes and pseudocounts for RNA-seq experiments. It provides not only point estimates but also posterior probabilities of log fold changes.
LFC is available on CRAN and at: https://github.com/erhard-lab/lfc
Erhard F, Zimmer R. Count ratio model reveals bias affecting NGS fold changes. Nucleic Acids Res. Jul 2015. 43(20):e136. (DOI: 10.1093/nar/gkv696)
Erhard F. Estimating pseudocounts and fold changes for digital expression measurements. Bioinformatics. Dec 2018. 34(23):4054-4063. (DOI: 10.1093/bioinformatics/bty471)
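The difference between a point estimate and a posterior for a log fold change can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the interface of the lfc package. It assumes that, for counts a and b of one gene in two conditions, the proportion a/(a+b) has a Beta(a + pseudocount, b + pseudocount) posterior, and it derives a credible interval for the log2 fold change by sampling. The function names and default values are hypothetical.

```python
import math
import random


def log2_fold_change(a, b, pseudocount=1.0):
    """Point estimate of the log2 fold change with a pseudocount
    added to both counts."""
    return math.log2((a + pseudocount) / (b + pseudocount))


def lfc_credible_interval(a, b, pseudocount=1.0, level=0.9,
                          draws=20000, seed=0):
    """Monte Carlo credible interval for the log2 fold change.

    Assumes the proportion p = a / (a + b) has a
    Beta(a + pseudocount, b + pseudocount) posterior, so the
    log2 fold change is log2(p / (1 - p)).
    """
    rng = random.Random(seed)
    eps = 1e-12  # keep p away from exactly 0 or 1 before taking logs
    samples = []
    for _ in range(draws):
        p = rng.betavariate(a + pseudocount, b + pseudocount)
        p = min(max(p, eps), 1.0 - eps)
        samples.append(math.log2(p / (1.0 - p)))
    samples.sort()
    lower = samples[int((1.0 - level) / 2.0 * draws)]
    upper = samples[int((1.0 + level) / 2.0 * draws) - 1]
    return lower, upper


# Same count ratio, very different amounts of evidence (hypothetical counts).
few_reads = lfc_credible_interval(3, 1)
many_reads = lfc_credible_interval(300, 100)
```

Both genes have nearly the same point estimate, but the interval for the low-count gene is much wider, which is the information a point estimate alone hides.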
Tools for multi-omics integration using spatial image registration (P03)
VoltRon is a spatial omics analysis toolbox for multi-omics integration using spatial image registration. VoltRon is also capable of analyzing multiple types of spatially-aware data modalities.
- VoltRon's unique data structure allows users to seamlessly define tissue blocks, layers and multiple assay types in one R object.
- End-to-end downstream data analysis is supported for distinct spatial biology technologies. VoltRon visualizes and analyzes regions of interest (ROIs), spots, cells, molecules and tiles (under development).
- Automated Image Registration incorporates OpenCV (fully embedded into the package using Rcpp) to detect common features across images and register them. Users may interact with built-in mini shiny apps to change alignment parameters and validate alignment accuracy.
- Manual Image Registration helps users select common features across spatial datasets using reference images stored in VoltRon objects. If automated image registration does not work, images can still be aligned by manually picking landmark points.
- Niche Clustering allows integration with single-cell RNA analysis datasets using Seurat, SingleCellExperiment and spacexr for spot deconvolution. Estimated cell type abundances are then used to cluster spots into groups of cell type niches, which are defined as spots with a distinct composition of cell types.
VoltRon is available at: https://bioinformatics.mdc-berlin.de/VoltRon/index.html
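To make the landmark-based alignment mentioned above concrete, the following Python sketch fits a transform from manually picked landmark pairs. It is not VoltRon's implementation, which according to the description relies on OpenCV. It assumes a simple similarity transform (uniform scale, rotation, translation) and solves the least-squares fit in closed form by treating 2D points as complex numbers. All names and coordinates are hypothetical.

```python
def fit_similarity(src, dst):
    """Least-squares similarity transform mapping src landmarks onto dst.

    Points are (x, y) tuples. Returns complex numbers (a, b) such that a
    point z = x + iy is mapped to a * z + b, where abs(a) is the scale and
    the argument of a is the rotation angle.
    """
    if len(src) != len(dst) or len(src) < 2:
        raise ValueError("need at least two matching landmark pairs")
    z = [complex(x, y) for x, y in src]
    w = [complex(x, y) for x, y in dst]
    z_mean = sum(z) / len(z)
    w_mean = sum(w) / len(w)
    # Closed-form solution on centered coordinates.
    numerator = sum((wi - w_mean) * (zi - z_mean).conjugate()
                    for zi, wi in zip(z, w))
    denominator = sum(abs(zi - z_mean) ** 2 for zi in z)
    if denominator == 0:
        raise ValueError("source landmarks must not all coincide")
    a = numerator / denominator
    b = w_mean - a * z_mean
    return a, b


def apply_similarity(a, b, point):
    """Map one (x, y) point with the fitted transform."""
    mapped = a * complex(*point) + b
    return (mapped.real, mapped.imag)


# Hypothetical landmarks: the second image is the first one rotated by
# 90 degrees, scaled by 2 and shifted by (3, 4).
landmarks_image_1 = [(0, 0), (1, 0), (0, 1)]
landmarks_image_2 = [(3, 4), (3, 6), (1, 4)]
a, b = fit_similarity(landmarks_image_1, landmarks_image_2)
mapped_point = apply_similarity(a, b, (1, 1))
```

With noisy landmarks, the residual distance between mapped and target points gives a quick check of alignment accuracy. Real tissue sections may need more flexible transforms than this sketch covers.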