Aligning Ontology Annotation Across Single Cell Datasets with 'scOntoMatch'.
scOntoMatch
scOntoMatch is an R package that unifies ontology annotation of scRNA-seq datasets to make them comparable across studies.
Author & Maintainer: Yuyao Song
Large-scale single-cell atlases often carry curated annotations drawn from a standard ontology system. This package matches the ontology labels of different datasets so that their cell types can be compared directly across studies.
Installation
## from source
install.packages("devtools")
devtools::install_github("Papatheodorou-Group/scOntoMatch")
library(scOntoMatch)
library(ontologyIndex)
Usage
First, download the ontology .obo file from the OBO Foundry. Common ontologies include:
- Cell ontology
- Uberon multi-species anatomy ontology
- Drosophila gross anatomy (FBbt)
- Zebrafish anatomy and development ontology (ZFA)
Refer to the vignette for detailed usage.
Get input ready
adatas = getAdatas(metadata = metadata, sep = "\t")
ont = ontologyIndex::get_OBO(oboFile, propagate_relationships = c('is_a', 'part_of'))
Trim ontology tree to remove redundant terms
adatas_minimal = ontoMultiMinimal(adatas, ont = ont, anno_col = "cell_ontology_type", onto_id_col = "cell_ontology_id")
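To make "redundant terms" concrete, the sketch below builds a small toy ontology with ontologyIndex and flags labels in one dataset that have an ancestor among the other labels of the same dataset. The term names, the `labels` vector and the helper variables are hypothetical and exist only for illustration. This is not the package's implementation and does not show how ontoMultiMinimal resolves such pairs; it only shows how they can be detected.

```r
library(ontologyIndex)

# Toy ontology: each entry lists the parents of a term.
# These names are made up and are not real Cell Ontology IDs.
parents <- list(
  cell = character(0),
  immune_cell = "cell",
  t_cell = "immune_cell",
  cd4_t_cell = "t_cell",
  b_cell = "immune_cell"
)
toy_ont <- ontology_index(parents = parents)

# Hypothetical cell type labels found in a single dataset
labels <- c("t_cell", "cd4_t_cell", "b_cell")

# For every label, collect the other labels in the dataset that are its ancestors
ancestor_in_dataset <- lapply(setNames(nm = labels), function(term) {
  setdiff(intersect(get_ancestors(toy_ont, term), labels), term)
})

# Labels that co-occur with one of their own ancestors
has_ancestor <- names(ancestor_in_dataset)[lengths(ancestor_in_dataset) > 0]
print(has_ancestor)
```

Here `cd4_t_cell` is flagged because `t_cell` is also present in the same dataset.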
Match ontologies across datasets by direct mapping and by mapping descendant terms to ancestor terms.
adatas_matched = ontoMultiMatch(adatas_minimal, ont = ont, anno_col = "cell_ontology_base")
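The idea of direct mapping plus descendant-to-ancestor mapping can be illustrated on a toy ontology. The following is a minimal sketch under stated assumptions, not the algorithm used by ontoMultiMatch: the term names, the two label vectors and the helper `map_to_shared` are all hypothetical, and the rule "pick the closest shared ancestor" is an assumption made for this example.

```r
library(ontologyIndex)

# Toy ontology with made-up term names (not real Cell Ontology IDs)
parents <- list(
  cell = character(0),
  immune_cell = "cell",
  t_cell = "immune_cell",
  cd4_t_cell = "t_cell",
  b_cell = "immune_cell",
  epithelial_cell = "cell"
)
toy_ont <- ontology_index(parents = parents)

# Hypothetical labels from two datasets
labels_a <- c("cd4_t_cell", "b_cell", "epithelial_cell")
labels_b <- c("t_cell", "b_cell")

# Hypothetical helper: map each label to a term that also occurs in the other dataset
map_to_shared <- function(ont, labels, other_labels) {
  vapply(labels, function(term) {
    # Direct mapping: the same term exists in the other dataset
    if (term %in% other_labels) return(term)
    # Otherwise look for ancestors of the term that exist in the other dataset
    hits <- intersect(get_ancestors(ont, term), other_labels)
    # No shared ancestor: leave the label unchanged
    if (length(hits) == 0) return(term)
    # Choose the closest ancestor, i.e. the hit with the most ancestors of its own
    depth <- vapply(hits, function(h) length(get_ancestors(ont, h)), integer(1))
    hits[[which.max(depth)]]
  }, character(1))
}

mapped_a <- map_to_shared(toy_ont, labels_a, labels_b)
print(mapped_a)
```

In this toy case `cd4_t_cell` is mapped up to `t_cell`, `b_cell` matches directly, and `epithelial_cell` stays as it is because dataset B has no related term.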
This package also provides convenient plotting functions to help comprehend the hierarchy of cell types in any single cell dataset.
Plot an ontology tree per dataset
plotOntoTree(ont, onts, plot_ancestors=TRUE, ont_query, fontsize = 20)
Plot a matched ontology tree for all datasets
# use 'animal cell' (CL:0000548) as root
plts = plotMatchedOntoTree(ont = ont, adatas = adatas, anno_col = "cell_ontology_mapped", roots = 'CL:0000548', fontsize=25)
The Single Cell Expression Atlas hosted at EBI provides uniformly analysed and annotated scRNA-seq data across multiple species. Any dataset with curated ontology labels is suitable input for this package. scRNA-seq data stored as h5ad files can be downloaded via the FTP site. Files with the extension .project.h5ad can be passed to ontoMatch with anno_col = 'authors_cell_type_-_ontology_labels'.
This package imports functions from ontologyIndex, ontologyPlot and anndata.