Description

Analyze and Compare Nucleotide Recoding RNA Sequencing Datasets.

Several implementations of a novel Bayesian hierarchical statistical model of nucleotide recoding RNA-seq experiments (NR-seq; TimeLapse-seq, SLAM-seq, TUC-seq, etc.) for analyzing and comparing NR-seq datasets (see 'Vock and Simon' (2023) <doi:10.1261/rna.079451.122>). NR-seq is a powerful extension of RNA-seq that provides information about the kinetics of RNA metabolism (e.g., RNA degradation rate constants), which is notably lacking in standard RNA-seq data. The statistical model makes maximal use of these high-throughput datasets by sharing information across transcripts to significantly improve uncertainty quantification and increase statistical power. 'bakR' includes a maximally efficient implementation of this model for conservative initial investigations of datasets. 'bakR' also provides more highly powered implementations using the probabilistic programming language 'Stan' to sample from the full posterior distribution. 'bakR' performs multiple-test adjusted statistical inference with the output of these model implementations to help biologists separate signal from background. Methods to automatically visualize key results and detect batch effects are also provided.

Brief Description of bakR

bakR (Bayesian analysis of the kinetics of RNA) is an R package for performing differential kinetics analysis with nucleotide recoding high-throughput RNA sequencing (NR-seq) data. Kinetic parameter estimation and statistical testing are compatible with mutational data from any enrichment-free NR-seq method (e.g., TimeLapse-seq, SLAM-seq, or TUC-seq).

Version 1.0.0 is out now! (06/27/2023)

A lot of functionality has been added, and I strongly encourage all bakR users to update to this version. Many new vignettes discuss these new features. bakR v1.0.0 is now available for installation from CRAN, and it can also be installed from GitHub, as described below. Two major new additions are:

  1. Ability to use GRAND-SLAM output (or fraction new estimates more generally) as bakR input
  2. Strategy for correcting metabolic label related biases in kinetic parameter estimates and read counts

Why use bakR?

Differential expression analysis of RNA sequencing (RNA-seq) data can identify changes in cellular RNA levels, but cannot determine the kinetic mechanism underlying such changes. Previously, our lab and others addressed this shortcoming by developing nucleotide-recoding RNA-seq methods (NR-seq; e.g., TimeLapse-seq) to quantify changes in RNA synthesis and degradation kinetics. While advanced statistical models implemented in user-friendly software (e.g., DESeq2) have ensured the statistical rigor of differential expression analyses, no comparable tools exist for differential kinetic analysis with NR-seq. To address this need, we developed bakR, an R package that analyzes and compares NR-seq datasets. Differential kinetics analysis with bakR relies on a Bayesian hierarchical model of NR-seq data to increase statistical power by sharing information across transcripts. bakR outperforms attempts to use single-sample analysis tools (e.g., pulseR and GRAND-SLAM) for differential kinetics analysis. Check out our manuscript in RNA to learn more about the model and its extensive validation!

Installation

bakR is now available on CRAN! If you are using a Mac or Windows OS, you don't need to configure a C++ compiler to install and use bakR. Those not on a Mac or Windows OS will first need to configure a C++ compiler; see the next paragraph for details on how to do that. In either case, once you (and your compiler, if necessary) are ready, bakR can be installed as follows:

install.packages("bakR") 

To install the newest version of bakR from GitHub, you need a C++ compiler configured to the liking of rstan, the R interface to the probabilistic programming language Stan that bakR uses on the backend. The best way to do this is to follow the Stan team's documentation on installing rstan for your operating system. Once that is complete, you can install bakR as follows:

install.packages("devtools") # if you haven't installed devtools already
devtools::install_github("simonlabcode/bakR")

Documentation

There are currently seven vignettes to help get you up to speed with using bakR:

  1. An introductory vignette (title: Differential Kinetic Analysis with bakR) that walks you through the basic bakR workflow with simulated data.
  2. A more concise version of the introductory vignette that will get you up and running with bakR quickly (title: bakR for people in a hurry). Particularly appropriate for those who are very comfortable with adopting new bioinformatic tools.
  3. Combining bakR with differential expression analysis to perform differential synthesis rate analysis (title: Differential synthesis analysis with bakR and DESeq2).
  4. How to use fraction new estimates (e.g., from a tool like GRAND-SLAM) as input to bakR, a new feature introduced in version 1.0.0 (title: GRAND-SLAM output/fn estimates as bakR input).
  5. Correcting for disproportionate loss of s4U-containing RNA (title: Correcting for dropout). This phenomenon, termed dropout, is discussed in two recent preprints, one from our lab and one from the Erhard lab.
  6. How to identify and deal with problems that can crop up when analyzing NR-seq data (title: Troubleshooting analyses of NR-seq data with bakR).
  7. Distinguishing transcriptional and post-transcriptional regulation, even when the steady-state assumption is partially violated (title: Steady-state quasi-independent mechanistic investigations). Describes a new and somewhat experimental function in bakR, DissectMechanism.

All vignettes are available on the bakR website under the Articles section. The source code is hosted in the bakR GitHub repository (simonlabcode/bakR).

Obtaining the Necessary Input

As discussed in the introductory vignette, bakR requires data in the form of a so-called "cB", or counts binomial, data frame. Each row of the cB data frame corresponds to a group of reads with identical mutational data. The columns denote the sample from which the reads came, the feature the reads aligned to, the number of mutations of interest in the reads (e.g., T-to-C mutations), the number of mutable positions (e.g., Ts), and the number of such reads. It is reasonable to wonder "where am I supposed to get this information?" While there are a couple of possibilities, perhaps the easiest and most widely applicable is bam2bakR, a Snakemake implementation of the TimeLapse pipeline developed by the Simon lab. bam2bakR takes aligned bam files as input and produces, among other things, the cB file required by bakR. Extensive documentation describing how to get bam2bakR up and running is available on its GitHub repo. Snakemake greatly facilitates running this pipeline on almost any computational infrastructure, and bam2bakR uses the conda/mamba package manager to make setting up the necessary dependencies a breeze.
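
To make the cB layout concrete, here is a minimal sketch of a toy cB data frame built with base R only. The column names (sample, XF, TC, nT, n), the sample names, and all of the counts are assumptions made up for illustration; check the introductory vignette for the exact names bakR expects.

```r
# Toy cB (counts binomial) data frame. Each row is a group of reads with
# identical mutational data. Column names and values are illustrative
# assumptions, not output from a real experiment.
cB <- data.frame(
  sample = c("WT_1", "WT_1", "WT_1", "WT_ctl", "WT_ctl"),  # sample of origin
  XF     = c("geneA", "geneA", "geneB", "geneA", "geneB"), # feature aligned to
  TC     = c(0L, 2L, 1L, 0L, 0L),        # T-to-C mutations per read
  nT     = c(25L, 25L, 30L, 25L, 30L),   # mutable positions (Ts) per read
  n      = c(120L, 15L, 40L, 200L, 90L), # number of reads in this group
  stringsAsFactors = FALSE
)

# Basic sanity checks: a read cannot have more mutations than mutable
# positions, and every group must contain at least one read.
stopifnot(all(cB$TC <= cB$nT), all(cB$n > 0))

# Total read count for each sample/feature combination.
reads_per_feature <- aggregate(n ~ sample + XF, data = cB, FUN = sum)

# Raw per-sample mutation rate: mutations observed / mutable positions seen.
mut_rate <- tapply(cB$TC * cB$n, cB$sample, sum) /
  tapply(cB$nT * cB$n, cB$sample, sum)

print(reads_per_feature)
print(mut_rate)
```

In this toy example the unlabeled control sample (WT_ctl) has a raw mutation rate of zero, while the labeled sample (WT_1) does not. This is the kind of quick check that can be run on a real cB file before handing it to bakR.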

As of version 1.0.0, bakR can also take as input fraction new (sometimes referred to as new-to-total ratio, or NTR) estimates. These are obtainable via tools like GRAND-SLAM, or perhaps a custom analysis pipeline that you developed while working with NR-seq datasets!
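
As a rough illustration of why fraction new estimates carry kinetic information, the sketch below converts a fraction new into a degradation rate constant. It assumes steady-state RNA levels and a known label time, in which case fn = 1 - exp(-kdeg * tl). The helper fn_to_kdeg, the gene names, and the values are hypothetical and are not part of bakR's API; bakR's own model does considerably more than this.

```r
# Hypothetical helper (not a bakR function): convert fraction new estimates
# into degradation rate constants, assuming steady state and label time tl.
# Under that assumption fn = 1 - exp(-kdeg * tl), so kdeg = -log(1 - fn) / tl.
fn_to_kdeg <- function(fn, tl) {
  stopifnot(all(fn >= 0 & fn < 1), tl > 0)
  -log(1 - fn) / tl
}

# Made-up fraction new estimates for three features.
fn <- c(geneA = 0.10, geneB = 0.50, geneC = 0.90)

kdeg <- fn_to_kdeg(fn, tl = 2)   # assumed label time of 2 hours
half_life <- log(2) / kdeg       # half-life in hours

print(round(kdeg, 3))
print(round(half_life, 2))
```

A feature that is 50% new after a 2 hour label has, under these assumptions, a half-life equal to the label time.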

Bug Catching and Further Questions

Post descriptions of bugs and a simple reproducible example (if possible) in the Issues section of this repo. In fact, you should go to the Issues section with any question you have about bakR, and there are even helpful labels that you can append to your posts to make the nature of your request clear. If you email me (Isaac Vock) with a question/concern/suggestion, I will direct you to the Issues section. If you have basic use questions, I would suggest going through the vignettes linked above. If these do not answer your question, then post your question to Issues.

Metadata

Version

1.0.1

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows