Assessing Predisposition Between Phenotypes using Polygenic Scores.
comorbidPGS
comorbidPGS is a tool for analysing an already computed Polygenic Score (PGS, also named PRS/GRS for binary outcomes) distribution to investigate shared genetic aetiology in multiple conditions.
comorbidPGS is under GPL-3 license, and is freely available for download.
Prerequisite
- R version 3.5 or higher with the following packages:
- stats
- utils
- ggplot2
Installation
You can install the development version of comorbidPGS from GitHub with:
# install.packages("devtools")
devtools::install_github("VP-biostat/comorbidPGS")
Example
Building an Association Table
This is a basic example which shows you how to do basic association with the example dataset:
library(comorbidPGS)
#>
#> Attachement du package : 'comorbidPGS'
#> L'objet suivant est masqué depuis 'package:graphics':
#>
#> assocplot
# use the demo dataset
dataset <- comorbidData
# NOTE: The dataset must have at least 3 different columns:
# - an ID column (the first one)
# - a PGS column (must be numeric, by default it is the column named "SCORESUM" or the second column if "SCORESUM" is not present)
# - a Phenotype column, can be factors, numbers or characters
# do an association of one PGS with one Phenotype
result_1 <- assoc(dataset, prs_col = "t2d_PGS", phenotype_col = "t2d")
PGS | Phenotype | Phenotype_type | Statistical_method | Covar | N_cases | N_controls | N | Effect | SE | lower_CI | upper_CI | P_value |
---|---|---|---|---|---|---|---|---|---|---|---|---|
t2d_PGS | t2d | Cases/Controls | Binary logistic regression | NA | 730 | 9270 | 10000 | 1.688258 | NA | 1.561821 | 1.824931 | 0 |
# do multiple associations
assoc <- expand.grid(c("t2d_PGS", "ldl_PGS"), c("ethnicity","brc","t2d","log_ldl","sbp_cat"))
result_2 <- multiassoc(df = dataset, assoc_table = assoc, covar = c("age", "sex", "gen_array"))
#> Warning in phenotype_type(df = df, phenotype_col = phenotype_col): Phenotype
#> column log_ldl is continuous and not normal, please normalise prior association
#> Warning in phenotype_type(df = df, phenotype_col = phenotype_col): Phenotype
#> column log_ldl is continuous and not normal, please normalise prior association
PGS | Phenotype | Phenotype_type | Statistical_method | Covar | N_cases | N_controls | N | Effect | SE | lower_CI | upper_CI | P_value | |
---|---|---|---|---|---|---|---|---|---|---|---|---|---|
2 | t2d_PGS | ethnicity 1 ~ 2 | Categorical | Multinomial logistic regression | age+sex+gen_array | 2142 | 6381 | 8523 | 0.9814174 | NA | 0.9345150 | 1.0306739 | 0.4528020 |
3 | t2d_PGS | ethnicity 1 ~ 3 | Categorical | Multinomial logistic regression | age+sex+gen_array | 1205 | 6381 | 7586 | 1.0178971 | NA | 0.9570931 | 1.0825640 | 0.5724292 |
4 | t2d_PGS | ethnicity 1 ~ 4 | Categorical | Multinomial logistic regression | age+sex+gen_array | 272 | 6381 | 6653 | 0.9434640 | NA | 0.8355980 | 1.0652542 | 0.3474694 |
21 | ldl_PGS | ethnicity 1 ~ 2 | Categorical | Multinomial logistic regression | age+sex+gen_array | 2142 | 6381 | 8523 | 0.9925623 | NA | 0.9451678 | 1.0423334 | 0.7648927 |
31 | ldl_PGS | ethnicity 1 ~ 3 | Categorical | Multinomial logistic regression | age+sex+gen_array | 1205 | 6381 | 7586 | 1.0083869 | NA | 0.9481215 | 1.0724830 | 0.7905175 |
41 | ldl_PGS | ethnicity 1 ~ 4 | Categorical | Multinomial logistic regression | age+sex+gen_array | 272 | 6381 | 6653 | 0.9760204 | NA | 0.8647226 | 1.1016433 | 0.6943783 |
1 | t2d_PGS | brc | Cases/Controls | Binary logistic regression | age+sex+gen_array | 402 | 5041 | 5443 | 1.0061678 | NA | 0.9087543 | 1.1140235 | 0.9057882 |
11 | ldl_PGS | brc | Cases/Controls | Binary logistic regression | age+sex+gen_array | 402 | 5041 | 5443 | 1.1037106 | NA | 0.9956370 | 1.2235153 | 0.0605407 |
12 | t2d_PGS | t2d | Cases/Controls | Binary logistic regression | age+sex+gen_array | 730 | 9270 | 10000 | 1.7359738 | NA | 1.6029867 | 1.8799938 | 0.0000000 |
13 | ldl_PGS | t2d | Cases/Controls | Binary logistic regression | age+sex+gen_array | 730 | 9270 | 10000 | 0.9823272 | NA | 0.9102411 | 1.0601223 | 0.6465580 |
14 | t2d_PGS | log_ldl | Continuous | Linear regression | age+sex+gen_array | NA | NA | 10000 | 0.0059961 | 0.0022747 | 0.0015378 | 0.0104544 | 0.0084010 |
15 | ldl_PGS | log_ldl | Continuous | Linear regression | age+sex+gen_array | NA | NA | 10000 | 0.0828545 | 0.0021183 | 0.0787027 | 0.0870064 | 0.0000000 |
16 | t2d_PGS | sbp_cat | Ordered Categorical | Ordinal logistic regression | age+sex+gen_array | NA | NA | 10000 | 1.0628744 | NA | 1.0236044 | 1.1036509 | 0.0015002 |
17 | ldl_PGS | sbp_cat | Ordered Categorical | Ordinal logistic regression | age+sex+gen_array | NA | NA | 10000 | 1.0078855 | NA | 0.9707330 | 1.0464598 | 0.6818849 |
Examples of plot
densityplot(dataset, prs_col = "ldl_PGS", phenotype_col = "sbp_cat")
# show multiple associations in a plot
assoplot <- assocplot(score_table = result_2)
assoplot$continuous_phenotype
assoplot$discrete_phenotype
NOTE: The score_table should have the assoc() output format
centileplot(dataset, prs_col = "brc_PGS", phenotype_col = "brc")
#> Warning in centileplot(dataset, prs_col = "brc_PGS", phenotype_col = "brc"):
#> The dataset has less than 10,000 individuals, centiles plot may not look good!
#> Use the argument decile = T to adapt to small datasets
As those graphical functions use ggplot2, you can fully customize your plot:
library(ggplot2)
centileplot(dataset, prs_col = "t2d_PGS", phenotype_col = "t2d") +
scale_color_gradient(low = "green", high = "red")
decileboxplot(dataset, prs_col = "ldl_PGS", phenotype_col = "ldl")
Citation
If you use comorbidPGS in any published work, please cite the following manuscript:
Pascat V (????). comorbidPGS: Assessing the shared predisposition between Phenotypes using Polygenic Scores (PGS, or PRS/GRS for binary outcomes). R package version 0.3.9000.