Simulate and Model Family Pedigrees with Structured Founders.
simfam
The goal of simfam is to simulate and model families with founders drawn from a structured population. The main function simulates a random pedigree for many generations with realistic features. Additional functions calculate kinship matrices, admixture matrices, and draw random genotypes across arbitrary pedigree structures starting from the corresponding founder values.
Installation
You can install the released version of simfam from CRAN with:
install.packages("simfam")
The current development version can be installed from the GitHub repository using devtools
:
install.packages("devtools") # if needed
library(devtools)
install_github('OchoaLab/simfam', build_vignettes = TRUE)
You can see the package vignette, which has more detailed documentation and examples, by typing this into your R session:
vignette('simfam')
Examples
These are some basic ways of calling the main functions.
# load package!
library(simfam)
Simulate a random pedigree with a desired number of individuals per generation n
and a number of generations G
:
data <- sim_pedigree( n, G )
# creates a plink-formatted FAM table
# (describes pedigree, most important!)
fam <- data$fam
# lists of IDs split by generation
ids <- data$ids
# and local kinship of last generation
kinship_local_G <- data$kinship_local
The basics of encoding a pedigree in a fam
table (a data.frame) is that every individual in the pedigree is a row, column id
identifies the individual with a unique number or string, columns pat
and mat
identify the parents of the individual (who are themselves earlier rows), and sex
encodes the sex of the individual numerically (1=male, 2=female). The following functions work with arbitrary pedigrees/fam
data.frames:
Prune a given fam
, to speed up simulations/etc, by removing individuals without descendants among set of individuals ids
(in this example, the last generation from the output of sim_pedigree
):
fam <- prune_fam( fam, ids[[G]] )
Draw genotypes X
through pedigree, starting from genotypes of founders (X_1
):
X <- geno_fam( X_1, fam )
# Version for last generation only, which uses less memory.
# (`ids` must be as from `sim_pedigree`,
# a list partitioning non-overlapping generations)
X_G <- geno_last_gen( X_1, fam, ids )
Calculate kinship through pedigree, starting from kinship of founders (kinship_1
):
kinship <- kinship_fam( kinship_1, fam )
# Version for last generation only, which uses less memory.
kinship_G <- kinship_last_gen( kinship_1, fam, ids )
Calculate expected admixture proportions through pedigree, starting from admixture of founders (admix_proportions_1
):
admix_proportions <- admix_fam( admix_proportions_1, fam )
# Version for last generation only, which uses less memory.
admix_proportions_G <- admix_last_gen( admix_proportions_1, fam, ids )