MyNixOS website logo
Description

Estimating the Optimal Number of Migration Edges from 'Treemix'.

The popular population genetic software 'Treemix' by 'Pickrell and Pritchard' (2012) <DOI:10.1371/journal.pgen.1002967> estimates the number of migration edges on a population tree. However, it can be difficult to determine the number of migration edges to include. Previously, it was customary to stop adding migration edges when 99.8% of variation in the data was explained, but 'OptM' automates this process using an ad hoc statistic based on the second-order rate of change in the log likelihood. 'OptM' also has added functionality for various threshold modeling to compare with the ad hoc statistic.

OptM

Estimating the optimal number of migration edges from Treemix

Description

This package uses results from the population software 'Treemix' by Pickrell and Pritchard (2012) DOI:10.1371/journal.pgen.1002967 to estimate the optimal number of migrations edges to add to the tree. Furthermore, it has also been updated to work with the output of OrientAGraph (see Molloy et al. 2021), a more advanced admixture graph representation software built on top of the 'Treemix' engine. In Treemix, it was customary to stop adding migration edges when 99.8% of variation in the data was explained, but optM automates this process using an ad hoc statistic based on the second order rate of change in the log likelihood. OptM has added functionality for various threshold modeling to compare with the ad hoc statistic. The various methods are:

  • "Evanno" - calculates an ad hoc statistic we call deltaM based on the Evanno method, or second-order rate of change in likelihood weighted by the standard deviation.
  • "linear" - estimates of the optimal M based on a piecewise linear (change point), bent cable (alpha), simple exponential (threshold, default 5%), or non-linear least squares (threshold, default 5%) models
  • "SiZer" - a method to map and analyze derivatives for change point estimation for ecological modeling.

Install OptM (from an R console)

  • To install from CRAN
    • First install the R package 'SiZer' from CRAN using the command install.packages("SiZer")
    • Then install the OptM package using install.packages("OptM")
    • Load the package into your working R environment using library(OptM)

Preparing the input files

To run OptM, you will need a folder of output files produced by Treemix v1.13 or OrientAGraph. The function optM will automatically search the folder for the stem.llik, stem.modelcov.gz, and stem.cov.gz files; where "stem" is that provided to the -o parameter of treemix. It is recommended, but not required, to use stem in the format stem.i.M; where

  • stem is any name you prefer
  • i is the iteration number for that value of M
  • M is the number of migration edges used for the treemix run (-m parameter)

In order for optM to function properly, you must run:

  • At least two iterations at each value of M (the number of migration edges)
  • M>2.
  • The range for M must be sequential integers (e.g., 1, 2, 3, etc)
  • You do not need to run M=0 because treemix automatically includes this as the null model in each run.

NOTE: There will be an error check to see if there is variation across iterations for each M. In other words, if the data are very robust, you may get the same likelihood across all runs, thus the standard deviation across runs is zero and the ad hoc statistic is undefined. In this case, try making larger variations in the dataset (subsetting the SNPs, varying -k in treemix, or other method of permutation/bootstrap).

Below is an example run of treemix from a UNIX terminal for M={1-10} and 5 iterations per M:

for m in {1..10}
   do
   for i in {1..5}
      do
      treemix \
         -i test.treemix.gz \
         -o test.${i}.${m} \
         -global \
         -m ${m} \
         -k 1000
      done 
done

To run OptM in R:

  • First load the provided example data for a simulated dataset with 3 migration edges; and 10 iterations for M={1-10}

    • folder <- system.file("extdata", package = "OptM")
  • Next, run optM using the default "Evanno"-like method:

    • test.optM = optM(folder)
  • Finally, plot the results:

    • plot_optM(test.optM, method = "Evanno")
  • Alternatively, run using various linear modeling estimates rather than the ad hoc statistic:

    • folder <- system.file("extdata", package = "OptM")
    • test.linear = optM(folder, method = "linear")
    • plot_optM(test.linear, method = "linear")
  • OR using SiZer:

    • folder <- system.file("extdata", package = "OptM")
    • test.sizer = optM(folder, method = "SiZer")
    • plot_optM(test.sizer, method = "SiZer")

Version History

  • Version 0.1.8, 2024/6/16
    • In order to submit to CRAN, had to make a few updates.
    • Converted the old CITATION functions from citEntry to bibentry.
    • Removed calls to "Imports" boots and splines in the DESCRIPTION since these packages were not explicitly referenced.
    • Changed if(class(input) != "SiZer") stop("Input object is not of class SiZer.\n") to if(!"SiZer" %in% class(input)) stop("Input object is not of class SiZer.\n") in case an object inherited multiple classes.
    • Updated tidy on my Mac OSX using brew install tidy-html5 to the tidy v5.8.0 library. This fixed some HTML warnings that were specific to Macs.
  • Version 0.1.7, 2023/9/14
    • Updated read.table to read.delim, and added the option strip.white = TRUE to read.delim. This should fix some errors when reading files from orientagraph, since for some reason OrientaGraph added an empty space to lines in the .llik files when m=0, but not to other lines. This generated an error: "Error in scan(file = file, what = what, sep = sep, quote = quote, dec = dec, : line 2 did not have 8 elements"
  • Version 0.1.6, 2021/9/30
    • Updated citation for OptM
  • Version 0.1.5, 2021/7/9
    • Added capability to work with OrientAGraph output. Thanks Cui Wang!
  • Version 0.1.4, 2019/7/1
    • Fixed typos
    • Squashed a plotting bug (changed Y axis labels to horizontal)
    • Added 'ignore' parameter for when running Treemix with preset or fixed migration edges or input tree.
  • Version 0.1.3, 2019/4/23
    • The read.treemix function now searches for all treemix input files, and the specially formatted stem is no longer required. Thanks Jie Zhong!!!
  • Version 0.1.2, 2019/3/1
    • Fixed typo in DESCRIPTION - Pickrell and Pritchard 2012, not 2002
    • In the plot_optM, changed the plotting color to have an alpha (semi-transparent) fill and Y-axis labels for Δm
  • Version 0.1.1, 2019/1/2
    • Released the first version

Citation

Fitak, R. R. (2021) OptM: estimating the optimal number of migration edges on population trees using Treemix. Biology Methods and Protocols. 6(1):bpab017.

  • Or enter the command citation("OptM") into your R console

Contact

Robert Fitak
Department of Biology
University of Central Florida
USA
[email protected].

Metadata

Version

0.1.8

License

Unknown

Platforms (75)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
Show all
  • aarch64-darwin
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows