Description
Gene and Region Counting of Mutations ("GARCOM").
Description
Gene and Region Counting of Mutations (GARCOM) package computes mutation (or alleles) counts per gene per individuals based on gene annotation or genomic base pair boundaries. It comes with features to accept data formats in plink(.raw) and VCF. It provides users flexibility to extract and filter individuals, mutations and genes of interest.
README.md
GARCOM
The goal of GARCOM is to provide mutation counts per individual within genetic boundaries (genes). It accepts different data formats with input file from plink (.raw), gene boundaries, SNP location. It also accepts VCF file format. The vcf file is assumed to be pVCF, that is where all samples are merged/combined.
Installation
You can install the released version of GARCOM from CRAN with:
install.packages("GARCOM")
Example
This is a small example which shows you how to use GARCOM:
library(GARCOM)
## basic example code
## sample data provided with library: genecoord, snpgene, snppos and genecoord
## Input data requires output from PLINK --recode flag. plink --bfile input --recode A --out sample_output
#input data: .raw formatted and SNP-gene (two columns)
gene_annot_counts(recodedgen,snpgene)
#input data: .raw formatted, SNP location (two columns) and Gene boundaries (three columns)
gene_pos_counts(recodedgen, snppos, genecoord)
#read VCF file vcf_data <- vcfR::read.vcfR("CHRXX.vcf.gz", verbose=TRUE)
vcf_counts_annot(vcf_data,df_snpgene) # pass vcf data read and data frame with SNP-gene annotation
#read VCF file vcf_data <- vcfR::read.vcfR("CHRXX.vcf.gz", verbose=TRUE)
vcf_counts_SNP_genecoords(vcf_data,df_snppos,df_genecoords) # pass vcf data read and data frame SNP position and third with gene coordinates
#subset individuals
vcf_counts_SNP_genecoords(vcf_data,df_snptestpos, df_genecoordstestpos,keep_indiv=c("IID1","IID2"))
ind_select<-c("IID1","IID2") ## store in a vector
vcf_counts_SNP_genecoords(vcf_data,df_snptestpos, df_genecoordstestpos,keep_indiv=ind_select)
## Filter individuals and filter genes for a VCF data
vcf_counts_SNP_genecoords(vcf_data,df_snptestpos,df_genecoordstestpos,keep_indiv=c("IID1","IID2"),filter_gene="GENE_1") #returns a matrix of data.table class
##For more examples refer manual
Citation
citation("GARCOM")
Dependencies (Imports)
data.table(>=v1.12.8)
stats
vcfR(>=v1.12.0)
suggests
testthat
Issues and suggestions
GARCOM welcomes suggestions and improvements. Please open issues on the github for bugs/suggestions.
Origin
GARCOM's derived from French word garçom (/ɡaʁ.sɔ̃/)
GARCOM is ready to serve and obtain desired results for the genetics data. We hope you enjoy it!
To-Dos
- Add support for Bgen format
- Add support for functional annotation (synonymous, non synonymous, etc.) in the VCF data
Version
Currently v1.2.0