Description

Get Gene Sets for Gene Enrichment Analysis.

Gene sets are fundamental for gene enrichment analysis. The package 'geneset' enables querying gene sets from public databases including 'GO' (Gene Ontology Consortium. (2004) <doi:10.1093/nar/gkh036>), 'KEGG' (Minoru et al. (2000) <doi:10.1093/nar/28.1.27>), 'WikiPathway' (Marvin et al. (2020) <doi:10.1093/nar/gkaa1024>), 'MsigDb' (Arthur et al. (2015) <doi:10.1016/j.cels.2015.12.004>), 'Reactome' (David et al. (2011) <doi:10.1093/nar/gkq1018>), 'MeSH' (Ish et al. (2014) <doi:10.4103/0019-5413.139827>), 'DisGeNET' (Janet et al. (2017) <doi:10.1093/nar/gkw943>), 'Disease Ontology' (Lynn et al. (2011) <doi:10.1093/nar/gkr972>), 'Network of Cancer Genes' (Dimitra et al. (2019) <doi:10.1186/s13059-018-1612-0>) and 'COVID-19' (Maxim et al. (2020) <doi:10.21203/rs.3.rs-28582/v1>). Gene sets are returned as a list object that contains two data frames, 'geneset' and 'geneset_name'. The 'geneset' data frame has two columns: term ID and gene ID. The 'geneset_name' data frame has two columns: term ID and term description.

Geneset

Overview

The omics era produces huge amounts of gene data, which raises the problem of how to uncover their potential biological effects. One effective approach is gene enrichment analysis.

Access to gene sets is the central and fundamental part of gene enrichment analysis, whether for the traditional over-representation analysis (ORA) method or for the more advanced functional class scoring (FCS) methods (e.g. Gene Set Enrichment Analysis (GSEA)).

Currently, many enrichment analysis tools provide built-in data sets for only a few model species or ask users to download data online. As a result, users working on non-model species need to download gene sets from various public databases themselves. For example, enrichGO() and gseGO() of clusterProfiler rely on organism-level annotation packages that cover about 20 species. If the research target is not among these organisms, the user needs to build an annotation via AnnotationHub or download one from biomaRt or Blast2GO, which is time-consuming and hard for biologists without programming skills.

Here, we develop an R package named "geneset", which aims to give access to up-to-date gene sets in less time.

It includes GO (BP, CC and MF), KEGG (pathway, module, enzyme, network, drug and disease), WikiPathway, MsigDb, EnrichrDb, Reactome, MeSH, DisGeNET, Disease Ontology (DO), Network of Cancer Genes (NCG) (v6 and v7) and COVID-19. It supports both model and non-model species.

Supported organisms

For more details, please refer to this site.

  • GO supports 143 species
  • KEGG supports 8213 species
  • MeSH supports 71 species
  • MsigDb supports 20 species
  • WikiPathway supports 16 species
  • Reactome supports 11 species
  • EnrichrDB supports 5 species
  • Disease-related gene sets (DO, NCG, DisGeNET and COVID-19) support human only

About the data

All gene sets are stored on our website and can be accessed easily with simple functions.

We update the data monthly to provide a better user experience.

🛠 Installation

Install stable version from CRAN:

install.packages("geneset")

Install development version from GitHub:

remotes::install_github("GangLiLab/geneset")

Install development version from Gitee (for users in mainland China):

remotes::install_git("https://gitee.com/genekitr/pacakge_geneset")

📚 Usage

For more details, please refer to genekitr book.

The package now includes eight functions: getGO(), getKEGG(), getMesh(), getMsigdb(), getWiki(), getReactome(), getEnrichrdb() and getHgDisease().

All functions take org (organism) as input. Several functions have additional arguments of their own, such as ont (ontology) for getGO().

Take the human GO MF gene sets as an example:

library(geneset)
x <- getGO(org = "human", ont = "mf")

str(x)
# List of 4
#  $ geneset     :'data.frame':	280115 obs. of  2 variables:
#   ..$ mf  : chr [1:280115] "GO:0000009" "GO:0000009" "GO:0000010" "GO:0000010" ...
#   ..$ gene: chr [1:280115] "PIGV" "ALG12" "PDSS1" "PDSS2" ...
#  $ geneset_name:'data.frame':	4878 obs. of  2 variables:
#   ..$ go_id: chr [1:4878] "GO:0000009" "GO:0000010" "GO:0000014" "GO:0000016" ...
#   ..$ Term : chr [1:4878] "alpha-1,6-mannosyltransferase activity" "trans-hexaprenyltranstransferase activity" "single-stranded DNA endodeoxyribonuclease activity" "lactase activity" ...
#  $ organism    : chr "hsapiens"
#  $ type        : chr "mf"

head(x$geneset)
# mf  gene
# GO:0000009  PIGV
# GO:0000009 ALG12
# GO:0000010 PDSS1
# GO:0000010 PDSS2
# GO:0000014 ENDOG
# GO:0000014 ERCC1

head(x$geneset_name)
# go_id                                               Term
# GO:0000009             alpha-1,6-mannosyltransferase activity
# GO:0000010          trans-hexaprenyltranstransferase activity
# GO:0000014 single-stranded DNA endodeoxyribonuclease activity
# GO:0000016                                   lactase activity
# GO:0000026             alpha-1,2-mannosyltransferase activity
# GO:0000030                       mannosyltransferase activity
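
The two data frames share the term ID, so term descriptions can be attached to the gene-level table with a join. The sketch below rebuilds a tiny copy of the output shown above by hand, so it runs without downloading anything; the toy objects geneset and geneset_name are stand-ins for x$geneset and x$geneset_name, and base R merge() is just one possible way to do the join.

```r
# Toy stand-ins for x$geneset and x$geneset_name (values copied from the output above)
geneset <- data.frame(
  mf   = c("GO:0000009", "GO:0000009", "GO:0000010", "GO:0000010"),
  gene = c("PIGV", "ALG12", "PDSS1", "PDSS2"),
  stringsAsFactors = FALSE
)
geneset_name <- data.frame(
  go_id = c("GO:0000009", "GO:0000010"),
  Term  = c("alpha-1,6-mannosyltransferase activity",
            "trans-hexaprenyltranstransferase activity"),
  stringsAsFactors = FALSE
)

# Join on the term ID; the key column has a different name in each table
annotated <- merge(geneset, geneset_name, by.x = "mf", by.y = "go_id")
print(annotated)
```

With a real result, the same call would read merge(x$geneset, x$geneset_name, by.x = "mf", by.y = "go_id"), assuming the column names match those in the str() output above.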

How many terms/pathways are in a specific gene set?

Take human KEGG Pathway as an example:

gs <- geneset::getKEGG('hsa','pathway')
gs_df <- gs$geneset
# count distinct term IDs (base R, so no pipe package is needed)
length(unique(gs_df$id))
# 347
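
Besides the number of terms, the number of genes per term is often useful, for example to drop very small sets before an enrichment run. The following is a minimal sketch on a hypothetical toy table; the column names id and gene follow the KEGG example above, and the pathway IDs, gene symbols and min_size threshold are made up for illustration.

```r
# Hypothetical toy table with the same two-column layout (term ID, gene ID)
gs_df <- data.frame(
  id   = c("hsa00010", "hsa00010", "hsa00010", "hsa00020", "hsa00020", "hsa00030"),
  gene = c("HK1", "HK2", "GPI", "CS", "ACO1", "G6PD"),
  stringsAsFactors = FALSE
)

n_terms   <- length(unique(gs_df$id))  # number of distinct terms
term_size <- table(gs_df$id)           # number of genes per term

# Keep only terms with at least `min_size` genes (arbitrary threshold)
min_size <- 2
kept     <- names(term_size)[term_size >= min_size]
gs_small <- gs_df[gs_df$id %in% kept, ]
```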
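
Pass gene set to GSVA/ssGSEA

library(GSVA)
# First: turn gs into a named list (one character vector of genes per term)
gs_list <- split(gs_df$gene, gs_df$id)

# Second: pass your expression dataset "express_data" to gsva()
# Note: this is the classic argument interface; GSVA >= 1.50 expects a
# parameter object instead, e.g. gsva(ssgseaParam(express_data, gs_list))
ssgsea_mat <- gsva(expr = express_data,
                   method = "ssgsea", # "gsva" (default), "zscore", "plage"
                   gset.idx.list = gs_list,
                   verbose = FALSE,
                   parallel.sz = 4)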
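
Scoring methods of this kind can only use genes that appear in both the gene sets and the row names of the expression matrix, so it can help to check the overlap first. The sketch below uses a hypothetical random matrix and toy gene sets; it assumes that both use the same ID type (gene symbols here).

```r
set.seed(1)
# Hypothetical expression matrix: genes in rows, samples in columns
express_data <- matrix(rnorm(12), nrow = 4,
                       dimnames = list(c("HK1", "HK2", "CS", "TP53"),
                                       c("s1", "s2", "s3")))

# Hypothetical gene sets in the two-column layout, converted to a named list
gs_df <- data.frame(
  id   = c("hsa00010", "hsa00010", "hsa00010", "hsa00020", "hsa00020"),
  gene = c("HK1", "HK2", "GPI", "CS", "ACO1"),
  stringsAsFactors = FALSE
)
gs_list <- split(gs_df$gene, gs_df$id)

# Restrict every set to genes present in the matrix, then drop empty sets
gs_list  <- lapply(gs_list, intersect, rownames(express_data))
gs_list  <- gs_list[lengths(gs_list) > 0]
coverage <- lengths(gs_list)  # usable genes per set
print(coverage)
```

If most sets lose nearly all their genes, the likely cause is a mismatch between ID types rather than missing data.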

Pass gene set to ORA/GSEA

hg_gs <- geneset::getGO(org = "human", ont = "mf")
# ORA (input_id is your vector of gene IDs of interest)
go_ent <- genekitr::genORA(input_id, geneset = hg_gs)
# GSEA (input is a pre-ranked gene list with logFC values)
gse <- genekitr::genGSEA(genelist = geneList, geneset = hg_gs)
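
To illustrate the idea behind ORA, the sketch below runs a plain hypergeometric test per term in base R. It is not how genekitr::genORA() is implemented; the toy gene sets, the universe, the input_id vector and the helper ora_one() are all hypothetical and chosen only to keep the example self-contained.

```r
# Hypothetical gene sets in the two-column layout (term ID, gene ID)
gs_df <- data.frame(
  id   = c(rep("T1", 4), rep("T2", 3)),
  gene = c("A", "B", "C", "D", "E", "F", "G"),
  stringsAsFactors = FALSE
)
universe <- LETTERS[1:20]           # hypothetical background genes
input_id <- c("A", "B", "C", "H")   # hypothetical genes of interest

# One-sided hypergeometric test for a single term
ora_one <- function(term_genes, input, universe) {
  term_genes <- intersect(term_genes, universe)
  input      <- intersect(input, universe)
  k <- length(intersect(term_genes, input))  # input genes that hit the term
  # P(X >= k): probability of seeing at least k hits by chance
  phyper(k - 1, length(term_genes),
         length(universe) - length(term_genes),
         length(input), lower.tail = FALSE)
}

pvals <- sapply(split(gs_df$gene, gs_df$id), ora_one,
                input = input_id, universe = universe)
padj  <- p.adjust(pvals, method = "BH")  # multiple-testing correction
print(data.frame(pvalue = pvals, padj = padj))
```

The choice of universe strongly affects the p-values, so in practice it should reflect the genes that could have been detected in the experiment.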

✍️ Author

Yunze Liu.

Metadata

Version

0.2.7

License

Unknown

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows