Description

Bayesian Hierarchical Modeling for Label-Free Proteomics.

Statistical decision in proteomics data using a hierarchical Bayesian model. Two regression models are available for describing the mean-variance trend: a gamma regression and a latent gamma mixture regression. The chosen regression model then serves as an empirical Bayes estimator for the prior on the variance of each peptide. The model further assumes that each measurement carries its own uncertainty (increased variance), which is also inferred. Finally, it estimates the posterior distribution of the differences in means for each peptide by Hamiltonian Monte Carlo, then integrates the tails of that posterior to estimate the probability of error, from which a statistical decision can be made. See Berg and Popescu for details (<doi:10.1016/j.mcpro.2023.100658>).


Baldur

Baldur is a hierarchical Bayesian model for the analysis of proteomics data. By leveraging empirical Bayes methods, Baldur estimates hyperparameters for variance and measurement-specific uncertainty. It then computes the posterior difference in means between conditions for each peptide, protein, or PTM, and integrates the posterior to estimate error probabilities.

Features

  • Hierarchical Bayesian modeling of proteomics data
  • Empirical Bayes estimation of variance and uncertainty
  • Posterior probability calculations for differential analysis
  • Supports peptide, protein, and PTM data

Installation

Install the stable release from CRAN:

install.packages('baldur')

Or install the development version from GitHub. This requires rstan, so first follow the rstan installation instructions: https://github.com/stan-dev/rstan/wiki/RStan-Getting-Started

Then:

devtools::install_github('PhilipBerg/baldur', build_vignettes = TRUE)

Note:

  • On Ubuntu, pandoc may be needed to build vignettes.
  • On Windows, sometimes the development version of rstan is required.

Usage

For detailed examples, see the package vignettes:

vignette('baldur_yeast_tutorial')
vignette('baldur_ups_tutorial')

Main Modeling Work and Equations

Baldur implements a hierarchical Bayesian framework for label-free proteomics quantification, designed to robustly estimate differential abundance while accounting for the mean-variance relationship in mass spectrometry data. For exact details please see the original paper.

1. Observation Model

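For each feature (peptide, protein, or PTM) $i$ in sample $j$, the observed intensity $y_{ij}$ is modeled as:

$$y_{ij}\sim\text{Normal}(\mu_{j},\sigma u_{ij})$$

$$\mu_{j}\sim\text{Normal}(\mu_{0j}+\eta_j\sigma,\sigma)$$

where $\sigma$ is the standard deviation of the feature, $u_{ij}$ is the measurement-specific uncertainty that inflates it, $\mu_{0j}$ is the prior mean described in section 3, and $\eta_j$ is an auxiliary variable with a standard normal prior.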
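To make the two sampling statements above concrete, the following base R sketch draws data for a single feature from them. It does not use the baldur package, and every numeric value (the number of conditions and replicates, `sigma`, `mu_0`, and the range of `u`) is an illustrative assumption rather than a Baldur default.

```r
# Minimal simulation of the observation model for one feature.
# All numeric values are illustrative assumptions, not Baldur defaults.
set.seed(1)

n_cond <- 2             # number of conditions (groups) j
n_rep  <- 3             # replicates per condition
sigma  <- 0.3           # feature-level standard deviation
mu_0   <- c(20, 21)     # prior means mu_0j, one per condition
eta    <- rnorm(n_cond) # eta_j ~ Normal(0, 1)

# mu_j ~ Normal(mu_0j + eta_j * sigma, sigma)
mu <- rnorm(n_cond, mean = mu_0 + eta * sigma, sd = sigma)

# Measurement-specific uncertainty u_ij; values above 1 inflate sigma
# (the range [1, 2] is an assumption made for this sketch)
u <- matrix(runif(n_rep * n_cond, min = 1, max = 2), nrow = n_rep)

# y_ij ~ Normal(mu_j, sigma * u_ij); column j holds condition j
y <- sapply(seq_len(n_cond), function(j) {
  rnorm(n_rep, mean = mu[j], sd = sigma * u[, j])
})
dim(y)
```

Each column of `y` holds the replicate measurements of one condition, and measurements with a larger `u` are drawn with proportionally larger spread.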

2. Mean-Variance Modeling

Gamma Regression

The measurement standard deviation $s_{j}$ is not constant but depends on the mean intensity. This relationship is modeled with gamma regression:

$$s_{j} \sim \Gamma\left(\alpha, \frac{\alpha}{\beta(\bar{y}_j)}\right)$$

where:

  • $\alpha$: shape parameter (estimated empirically)
  • $\beta(\bar{y}_j)$: expected standard deviation as a function of the peptide/protein mean intensity $\bar{y}_j$; the rate of the gamma distribution is $\alpha/\beta(\bar{y}_j)$
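A gamma regression of this kind can be sketched in base R with `glm()`. The example below simulates standard deviations around a known trend and recovers it. It is an illustration under stated assumptions, not the package's own fitting code: the log link (matching the $\exp(I-S\bar{y})$ form used in the mixture model below), the simulated data, and all variable names are choices made for this sketch.

```r
# Sketch: gamma regression of feature standard deviation on mean intensity.
# Simulated data and parameter values are assumptions for illustration.
set.seed(42)

n_feat   <- 500
mean_int <- runif(n_feat, 15, 30)        # feature mean intensities
true_I   <- 2                            # intercept of the trend
true_S   <- 0.2                          # slope of the trend
alpha    <- 5                            # gamma shape

beta   <- exp(true_I - true_S * mean_int)             # expected sd
sd_obs <- rgamma(n_feat, shape = alpha, rate = alpha / beta)

# Gamma GLM with log link: log E[sd] = intercept + slope * mean
fit <- glm(sd_obs ~ mean_int, family = Gamma(link = "log"))
coef(fit)

# The GLM dispersion estimates 1 / shape
alpha_hat <- 1 / summary(fit)$dispersion
alpha_hat
```

The fitted slope should be close to `-true_S`, because the trend is written as $I - S\bar{y}$ while `glm()` reports the coefficient of `mean_int` directly.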

Latent Gamma Mixture Regression

Regression Function

For each observation, the expected mean-variance relationship is modeled as:

$$\beta_i=\kappa\cdot\exp(\theta_i\cdot(I_L-S_L\bar{y}_i))+\exp(I-S\bar{y}_i)$$

where:

  • $S, S_L$: slope parameters (common and latent)
  • $I, I_L$: intercepts (common and latent)
  • $\bar{y}_i$: mean intensity of feature $i$
  • $\theta_i$: feature-specific mixture parameter
  • $\kappa$: scaling constant for the latent trend
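The regression function is simple to evaluate directly. The sketch below writes it as a plain R function, assuming the same mean intensity enters both exponential terms. The function name `lgmr_trend` and all parameter values, including `kappa`, are hypothetical and chosen only for illustration.

```r
# Sketch of the latent gamma mixture trend:
#   beta = kappa * exp(theta * (I_L - S_L * y_bar)) + exp(I - S * y_bar)
# Function name and all values are hypothetical, for illustration only.
lgmr_trend <- function(y_bar, theta, I, S, I_L, S_L, kappa) {
  kappa * exp(theta * (I_L - S_L * y_bar)) + exp(I - S * y_bar)
}

y_bar <- c(18, 22, 26)   # feature mean intensities
theta <- c(0, 0.5, 1)    # mixture parameters in [0, 1]

beta <- lgmr_trend(y_bar, theta,
                   I = 2, S = 0.2, I_L = 5, S_L = 0.3, kappa = 0.1)
beta
```

With $\theta_i = 0$ the latent term reduces to the constant $\kappa$, so the feature follows the common trend plus a fixed offset. As $\theta_i$ moves toward 1, the latent trend contributes with its full intercept and slope.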

Likelihood

Given the expected trend value $\beta_i$, the observed standard deviation $\sigma_i$ is modeled as:

$$\sigma_i\sim\Gamma\left(\alpha,\frac{\alpha}{\beta_i}\right)$$

where:

  • $\alpha$: gamma shape parameter
  • $\beta_i$: expected standard deviation for feature $i$, given by the regression function
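A short derivation shows why this parameterization is convenient. Assuming the gamma distribution is written in shape-rate form (as in Stan), with shape $\alpha$ and rate $\alpha/\beta_i$, its moments are:

$$\mathbb{E}[\sigma_i]=\frac{\alpha}{\alpha/\beta_i}=\beta_i,\qquad \mathrm{Var}[\sigma_i]=\frac{\alpha}{(\alpha/\beta_i)^2}=\frac{\beta_i^2}{\alpha},\qquad \frac{\sqrt{\mathrm{Var}[\sigma_i]}}{\mathbb{E}[\sigma_i]}=\frac{1}{\sqrt{\alpha}}$$

Under that assumption, $\beta_i$ is the expected standard deviation at the feature's mean intensity, and $\alpha$ alone controls the relative spread around the trend.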

Priors

  • $\alpha\sim\text{Cauchy}(0,25)$
  • $\eta\sim\text{Normal}(0,1)$
  • $I_L\sim\text{SkewNormal}(2,15,35)$
  • $\theta_i\sim\text{Uniform}(0,1)$

NRMSE (Model Fit Metric)

The normalized root-mean-square error (NRMSE) is calculated for model diagnostics.
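The text does not say which normalization is used, so the following is only one common variant: the RMSE divided by the mean of the observed values. The function name `nrmse` and that choice of normalizer are assumptions of this sketch.

```r
# Sketch: NRMSE with mean normalization (an assumption; other variants
# divide by the range or the standard deviation of the observations).
nrmse <- function(observed, predicted) {
  rmse <- sqrt(mean((observed - predicted)^2))
  rmse / mean(observed)
}

observed  <- c(2, 2)
predicted <- c(1, 3)
nrmse(observed, predicted)  # RMSE is 1 and the mean is 2, giving 0.5
```

Lower values indicate that the fitted trend tracks the observed standard deviations more closely.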

3. Hierarchical Modeling of Condition Means

Empirical Bayes Prior

  • What is it?
    A prior that is estimated from your actual data, rather than set by hand.
  • How does it work?
    • Baldur looks at the spread and center of observed means across features.
    • It sets the mean hyper-prior for each group to match the average observed mean, and its uncertainty to match the variability in your data.
  • Why use it?
    • Strength: Provides "shrinkage" toward realistic values, reducing noise and false positives, especially with small sample sizes.
  • Mathematical form:
    $$\mu_{0j}\sim\text{Normal}(\bar{y}_j,\sigma n_R)$$
    • Here, $\bar{y}_j$ is the estimated mean for group $j$, and $\sigma n_R$ is the estimated standard deviation for the prior.

Weakly Informative Prior

  • What is it?
    A broad, generic prior that does not make strong assumptions. It is like saying "I have no idea what the mean should be, but it's probably not infinite."
  • How does it work?
    • The prior mean is set to zero for all groups.
    • The uncertainty (standard deviation) is set to a large value (e.g., $10$), meaning the model treats almost any value as plausible.
  • Why use it?
    • Strength: Maximizes flexibility and lets the data speak for itself, at the cost of potentially adding more noise or less stability if data is limited.
  • Mathematical form:
    $$\mu_{0j}\sim\text{Normal}(0,10)$$
    • For group $j$, the mean is $0$ and the standard deviation is $10$.

4. Differential Abundance

For differential analysis, Baldur estimates the posterior distribution of the difference in means between conditions:

$$\boldsymbol{D}\sim\mathcal{N}(\boldsymbol{\mu}^\text{T}\boldsymbol{K},\sigma\boldsymbol{\xi}),\quad \xi_{m}=\sqrt{\sum_{i=1}^{C}\frac{|k_{im}|}{n_i}}$$

where:

  • $\boldsymbol{K}$: contrast matrix
  • $k_{im}$: contrast coefficient for condition $i$ in contrast $m$
  • $n_i$: number of samples in condition $i$
  • $C$: number of conditions
  • $\boldsymbol{\xi}$: vector of scaling factors, one $\xi_m$ per contrast
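The scaling factor $\xi_m$ can be computed directly from a contrast matrix and the group sizes. The sketch below does this in base R. The function name `contrast_scale`, the example contrasts, and the group sizes are hypothetical.

```r
# Sketch: xi_m = sqrt(sum_i |k_im| / n_i) for each contrast (column) m.
# Function name, contrasts, and group sizes are illustrative assumptions.
contrast_scale <- function(K, n) {
  # K has one row per condition; abs(K) / n divides row i by n[i]
  stopifnot(nrow(K) == length(n))
  sqrt(colSums(abs(K) / n))
}

# Three conditions, two contrasts: (1 vs 2) and (1 vs 3)
K <- matrix(c(1, -1,  0,
              1,  0, -1), nrow = 3)
n <- c(3, 3, 4)

xi <- contrast_scale(K, n)
xi
```

Here the first contrast gives $\sqrt{1/3+1/3}\approx 0.816$ and the second gives $\sqrt{1/3+1/4}\approx 0.764$, so contrasts involving larger groups receive a smaller scaling factor.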

The probability of error for contrast $c$ is then:

$$P(\mathrm{error}) = 2\Phi(-|\mu_{D_c} - \mu_{h_0}| \odot \tau_{D_c})$$

where:

  • $\Phi$: cumulative distribution function (CDF) of the standard normal
  • $\mu_{h_0}$: null hypothesis mean (often zero)
  • $\boldsymbol{\tau}_{\boldsymbol{D}}$: precision (inverse standard deviation) for each contrast
  • $\odot$: element-wise multiplication
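The error probability formula above translates into a few lines of base R. In this sketch the posterior mean and standard deviation of each contrast are supplied directly. In practice they would be summaries of the sampled posterior, which is an assumption about the workflow rather than a description of the package internals. The function name `error_probability` is hypothetical.

```r
# Sketch: P(error) = 2 * Phi(-|mu_D - mu_h0| * tau_D), element-wise.
# Function name and input values are illustrative assumptions.
error_probability <- function(mu_D, sd_D, mu_h0 = 0) {
  tau_D <- 1 / sd_D                      # precision as inverse sd
  2 * pnorm(-abs(mu_D - mu_h0) * tau_D)  # two-sided normal tail area
}

mu_D <- c(1.2, 0.1)   # posterior means of two contrasts
sd_D <- c(0.4, 0.5)   # posterior standard deviations

p_err <- error_probability(mu_D, sd_D)
p_err
```

The first contrast lies three posterior standard deviations from the null value and gets an error probability of about 0.0027. The second lies only 0.2 standard deviations away and gets a probability of about 0.84.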

Summary:
Baldur combines hierarchical modeling, mean-variance trend estimation via gamma regression, and empirical Bayes to quantify differential abundance robustly. It propagates uncertainty from individual measurements to the protein/PTM level and outputs an interpretable error probability for each feature.

For full details, see the reference publication.

Reference

Berg, Philip, and George Popescu.
“Baldur: Bayesian Hierarchical Modeling for Label-Free Proteomics with Gamma Regressing Mean-Variance Trends.”
Molecular & Cellular Proteomics (2023): 2023-12.
https://doi.org/10.1016/j.mcpro.2023.100658

Metadata

Version

0.0.4

License

Unknown

Platforms (80)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    uefi
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-uefi
  • aarch64-windows
  • aarch64_be-none
  • arc-linux
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-linux
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • sh4-linux
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-uefi
  • x86_64-windows