Description

Cell Type Identification and Discovery from Single Cell Gene Expression Data.

An implementation of neural networks trained with flow-sorted gene expression data to classify cellular phenotypes in single cell RNA-sequencing data. See Chamberlain M et al. (2021) <doi:10.1101/2021.02.01.429207> for more details.


SignacX 2.2.3

Get the most out of your single cell data.

What is SignacX?

SignacX is software developed by the Savova lab at Sanofi with a focus on single cell genomics for clinical applications. SignacX classifies the cellular phenotype of each individual cell in single cell RNA-sequencing data using neural networks trained with sorted bulk gene expression data from the Human Primary Cell Atlas (HPCA). In this R implementation, we provide functions and vignettes that demonstrate how to integrate single cell data (mapping cells from one data set to another), classify non-human data, identify novel cell types, and classify single cell data across many tissues, diseases and technologies. To learn more, see the pre-print (Chamberlain M et al. 2021).

Data portal

Here, we provide interactive access to data from the pre-print with SPRING Viewer. Click an "Explore" link below and search for your favorite gene:

| Link | Tissue | Disease | Number of cells | Number of samples | Source | Signac version |
|---|---|---|---|---|---|---|
| Explore | Kidney | Cancer | 48,037 | 47 | Stewart et al. 2019 | v2.0.7 |
| Explore | Kidney and urine | Lupus nephritis and healthy | 5,886 | 39 | Arazi et al. 2019 | v2.0.7 |
| Explore | Lung | Cancer | 42,844 | 18 | Zilionis et al. 2020 | v2.0.7 |
| Explore | Lung | Fibrosis | 96,461 | 31 | Habermann et al. 2020 | v2.0.7 |
| Explore | Lung | Fibrosis | 109,421 | 16 | Reyfman et al. 2019 | v2.0.7 |
| Explore | Monkey PBMCs | Healthy | 5,491 | 1 | Chamberlain et al. 2021 | v2.0.7 |
| Explore | Monkey PBMCs | Healthy | 5,220 | 1 | Chamberlain et al. 2021 | v2.0.7 |
| Explore | Monkey T cells | Healthy | 5,496 | 1 | Chamberlain et al. 2021 | v2.0.7 |
| Explore | PBMCs | Cancer | 14,048 | 8 | Zilionis et al. 2020 | v2.0.7 |
| Explore | PBMCs | Healthy | 7,902 | 1 | 10X Genomics | v2.0.7 |
| Explore | PBMCs | Healthy | 4,784 | 1 | 10X Genomics | v2.0.7 |
| Explore | Skin | Atopic dermatitis | 36,690 | 17 | He et al. 2020 | v2.0.7 |
| Explore | Synovium | Rheumatoid arthritis and osteoarthritis | 8,920 | 26 | Zhang et al. 2019 | v2.0.7 |

Note:

  • Cell type annotations are provided at four levels (immune, celltypes, cellstates and novel celltypes).
  • When available, we also provide information about sample covariates (e.g., disease, age, gender, FACS, etc.).
  • Cell type annotations for all 13 data sets were generated with the Signac function using the default settings, without changing any parameters.

Special thanks to Allon Klein's lab (particularly Caleb Weinreb and Sam Wolock) for hosting the data.

Getting Started

Installation

To install SignacX from CRAN in R, simply run:

install.packages("SignacX")

Quick start

The main functions in Signac are:

# load the library
library(SignacX)

# Generate initial labels
labels = Signac(E = your_data_here)

# Get cell type labels
celltypes = GenerateLabels(labels, E = your_data_here)

Sometimes we don't have time to run Signac and need a quick solution. Although Signac scales well to large data sets (>300,000 cells), we developed SignacFast to classify single cell data quickly:

# load the library
library(SignacX)

# generate labels with pre-trained model
labels_fast <- SignacFast(E = your_data_here, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = your_data_here)

Usage

To make life easier, SignacX integrates with Seurat (versions 3 and 4) and with SPRING. We provide a few vignettes:

SPRING

In the pre-print, we often used Signac integrated with SPRING. To reproduce our findings and generate new results with SPRING, please visit the SPRING repository, which has example notebooks and installation instructions, particularly for processing CITE-seq and scRNA-seq data from 10X Genomics. Briefly, Signac works directly with SPRING's output files in R and requires only a few functions:

# load the Signac library
library(SignacX)

# dir points to the "FullDataset_v1" directory generated by the SPRING Jupyter notebook
dir = "./FullDataset_v1" 

# load the expression data
E = CID.LoadData(dir)

# generate cellular phenotype labels
labels = Signac(E, spring.dir = dir)
celltypes = GenerateLabels(labels, E = E, spring.dir = dir)

# write cell types and Louvain clusters to SPRING
dat <- CID.writeJSON(celltypes, spring.dir = dir)

After running the above functions, cellular phenotypes and Louvain clusters are ready to be visualized with SPRING Viewer, which can be set up locally as described here.

Seurat

Another way to use Signac is with Seurat. In this vignette, we performed multi-modal analysis of CITE-seq PBMCs from 10X Genomics using Signac integrated with Seurat.

Note:

  • This data set was also processed with SPRING in this notebook and subsequently classified with Signac. The resulting SPRING layouts were used in the pre-print (Figures 2-4) and are available for interactive exploration here.

MASC

Sometimes, we have single cell genomics data with disease information, and we want to know which cellular phenotypes are enriched for disease. In this vignette, we applied Signac to classify cellular phenotypes in healthy and lupus nephritis kidney cells, and then we used MASC to identify which cellular phenotypes were disease-enriched.

Note:

  • MASC typically requires equal numbers of cells and samples between cases and controls: an imbalance might skew the clustering of cells toward one sample (i.e., a "batch effect"), which could cause spurious disease enrichment in the mixed-effects model. Because Signac classifies each cell independently (without using clusters), Signac annotations can be used with MASC without balancing samples or cells a priori, unlike cluster-based annotation methods.
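To make the point above concrete, the sketch below builds the kind of cell-level table MASC works on: one row per cell, holding that cell's cell type, its sample and the sample's disease status. Roughly, MASC then fits, for each cell type, a mixed-effects logistic model of cell-type membership with disease as a fixed effect and sample as a random effect; that fit is not shown here. All values below are hypothetical toy data, not Signac output.

```r
# Hypothetical toy data: in practice the cell types would come from
# GenerateLabels() and sample/disease from your own sample metadata.
cells <- data.frame(
  celltype = c("B", "T", "B", "Mono", "B", "T"),
  sample   = c("s1", "s1", "s2", "s2", "s3", "s3"),
  disease  = c("LN", "LN", "LN", "LN", "Healthy", "Healthy"),
  stringsAsFactors = FALSE
)

# Per-cell 0/1 indicator for one cell type: the response a per-cell
# logistic model of cell-type membership would use.
cells$is_B <- as.integer(cells$celltype == "B")

# Per-sample cell type composition (each row sums to 1).
composition <- prop.table(table(cells$sample, cells$celltype), margin = 1)
print(composition)
```

Each row carries its own label, so no clustering step is involved; this is why per-cell annotations can feed a per-cell model directly.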

Non-human data

In Supplemental Figure 8 of the pre-print, we classified single cell data from a model organism (cynomolgus monkey), for which flow-sorted data sets are generally lacking, without any additional species-specific training. Instead, we mapped homologous genes from the Macaca fascicularis genome to the human genome in the single cell data and then classified cell types with Signac. We demonstrate how we mapped the gene symbols here.

Note:

  • This code can be used to identify homologous genes between any two species.
  • Monkey data used in Supplemental Figure 8 are available for interactive exploration in the table listed above.
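The gene-mapping step can be sketched in base R as follows. This is a minimal illustration, not the code from the vignette: it assumes a dense genes x cells matrix with gene symbols as row names, and a hypothetical two-column homolog table (`monkey_symbol`, `human_symbol`) obtained from a resource of your choice. Genes without a human homolog are dropped, and genes that map to the same human symbol are summed. The gene symbols and the helper `map_to_human` are made up for illustration.

```r
# Hypothetical helper (not a SignacX function): rename the rows of a
# monkey expression matrix to human gene symbols via a homolog table.
map_to_human <- function(E, homologs) {
  # keep only genes that have a human homolog
  keep <- rownames(E) %in% homologs$monkey_symbol
  E <- E[keep, , drop = FALSE]
  # look up the human symbol for each remaining row
  human <- homologs$human_symbol[match(rownames(E), homologs$monkey_symbol)]
  # collapse rows that share a human symbol by summing their counts
  rowsum(E, group = human)
}

# Toy example: 4 genes x 2 cells (symbols are illustrative only)
E <- matrix(1:8, nrow = 4,
            dimnames = list(c("CD3E", "LOC101", "LOC102", "MS4A1"),
                            c("cell1", "cell2")))
homologs <- data.frame(monkey_symbol = c("CD3E", "LOC101", "MS4A1"),
                       human_symbol  = c("CD3E", "CD3E", "MS4A1"),
                       stringsAsFactors = FALSE)

E_human <- map_to_human(E, homologs)
print(E_human)
```

Here LOC102 has no homolog and is dropped, while LOC101 is merged into CD3E. The sketch assumes a dense base R matrix; sparse matrices would need a sparse-aware aggregation instead.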

Genes of interest

In Figure 6 of the pre-print, we compiled data from three sources (CellPhoneDB, the GWAS catalog and Fang et al. 2020) to find genes of immunological or pharmacological interest. These genes and their annotations can be accessed from within Signac:

# load the library
library(SignacX)

# See ?Genes_Of_Interest
data("Genes_Of_Interest")

Learning from single cell data

In Figure 4 of the pre-print, we demonstrated that Signac can map cell type labels from one single cell data set to another by learning CD56bright NK cells from CITE-seq data. Here, we provide a vignette for reproducing this analysis, which can be used to map cell populations (or clusters of cells) from one data set to another. We also provide interactive access to the single cell data that were annotated with the CD56bright NK cell model (note: CD56bright NK cells appear as red cells in the "CellStates" annotation layer).

| Link | Tissue | Disease | Number of cells | Number of samples | Source | Signac version |
|---|---|---|---|---|---|---|
| Explore | Kidney | Cancer | 48,037 | 47 | Stewart et al. 2019 | v2.0.7 + CD56bright NK |
| Explore | Kidney and urine | Lupus nephritis and healthy | 5,886 | 39 | Arazi et al. 2019 | v2.0.7 + CD56bright NK |
| Explore | Lung | Cancer | 42,844 | 18 | Zilionis et al. 2020 | v2.0.7 + CD56bright NK |
| Explore | Lung | Fibrosis | 96,461 | 31 | Habermann et al. 2020 | v2.0.7 + CD56bright NK |
| Explore | Lung | Fibrosis | 109,421 | 16 | Reyfman et al. 2019 | v2.0.7 + CD56bright NK |
| Explore | Monkey PBMCs | Healthy | 5,491 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK |
| Explore | Monkey PBMCs | Healthy | 5,220 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK |
| Explore | Monkey T cells | Healthy | 5,496 | 1 | Chamberlain et al. 2021 | v2.0.7 + CD56bright NK |
| Explore | PBMCs | Cancer | 14,048 | 8 | Zilionis et al. 2020 | v2.0.7 + CD56bright NK |
| Explore | PBMCs | Healthy | 4,784 | 1 | 10X Genomics | v2.0.7 + CD56bright NK |
| Explore | Skin | Atopic dermatitis | 36,690 | 17 | He et al. 2020 | v2.0.7 + CD56bright NK |
| Explore | Synovium | Rheumatoid arthritis and osteoarthritis | 8,920 | 26 | Zhang et al. 2019 | v2.0.7 + CD56bright NK |

Fast Signac

Sometimes we don't have time to run Signac and need a faster solution. Although Signac scales well to large data sets (>300,000 cells) and typically takes less than an hour even for large data, we developed SignacFast to classify single cell data more quickly:

# load the library
library(SignacX)

# generate labels with pre-trained model
labels_fast <- SignacFast(E = your_data_here, num.cores = 4)
celltypes_fast = GenerateLabels(labels_fast, E = your_data_here)

Unlike Signac, SignacFast uses a pre-trained ensemble of neural network models generated from the HPCA reference data, which speeds up classification roughly 5-10-fold. These models were generated from the HPCA training data like so:

# load the library
library(SignacX)

# load the HPCA training (reference) data
training_HPCA = GetTrainingData_HPCA()

# train an ensemble of N = 100 neural network models using 4 cores
Models_HPCA = ModelGenerator(R = training_HPCA, N = 100, num.cores = 4)

The resulting pre-trained ensemble ("Models_HPCA") can be accessed from within the R package:

# load the library
library(SignacX)

# load pre-trained neural network ensemble model
Models = GetModels_HPCA()
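This README does not say how SignacFast combines the predictions of its ensemble members. Purely to illustrate what an ensemble does, the sketch below shows majority voting, one common way to combine classifiers: each model predicts a label for every cell and the most frequent label wins. The labels are toy values and `majority_vote` is a hypothetical helper, not a SignacX function.

```r
# Hypothetical illustration of majority voting; NOT the SignacFast internals.
# Toy predictions: rows are cells, columns are ensemble members.
preds <- matrix(c("TNK", "TNK", "MPh",
                  "B",   "B",   "B",
                  "MPh", "TNK", "MPh"),
                nrow = 3, byrow = TRUE,
                dimnames = list(c("cell1", "cell2", "cell3"), NULL))

# For each cell (row), return the most frequent label. table() sorts labels
# alphabetically and which.max() takes the first maximum, so ties go to the
# alphabetically first label.
majority_vote <- function(preds) {
  apply(preds, 1, function(votes) names(which.max(table(votes))))
}

consensus <- majority_vote(preds)
print(consensus)
```

Averaging per-class probabilities across members is a common alternative to hard voting; which scheme an ensemble uses affects how ties and low-confidence cells are resolved.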

We demonstrate how to use SignacFast in this vignette, which shows that the results are broadly consistent with running Signac.

Note:

  • For proper use: if you only need major cell types (e.g., TNK and MPh cells), SignacFast is a fine alternative to Signac.

Benchmarking

CITE-seq

In Figures 2-3 of the pre-print, we validated Signac with CITE-seq PBMCs. Here, we reproduced that analysis with SPRING (in this vignette, as was performed in the pre-print) and additionally with Seurat (in this vignette), and provide interactive access to the data here.

Flow-sorted synovial cells

In Figure 3 of the pre-print, we validated Signac with flow cytometry and compared Signac to SingleR. We reproduced that analysis using Seurat in this vignette, and provide interactive access to the data here.

PBMCs

In Table 1 of the pre-print, we benchmarked Signac across seven different technologies: CEL-seq, Drop-Seq, inDrop, 10X (v2), 10X (v3), Seq-Well and Smart-Seq2; this analysis was reproduced here.

Roadmap

See the open issues for a list of proposed features (and known issues).

Contributing

Any contributions you make are greatly appreciated.

  1. Fork the Project
  2. Create your Feature Branch (git checkout -b feature/AmazingFeature)
  3. Commit your Changes (git commit -m 'Add some AmazingFeature')
  4. Push to the Branch (git push origin feature/AmazingFeature)
  5. Open a Pull Request

You can also open a pull request to commit to the master branch.

License

Distributed under the GPL v3.0 License. See LICENSE for more information.

Contact

Mathew Chamberlain

Project Link: https://github.com/mathewchamberlain/SignacX

Metadata

Version

2.2.5

License

Unknown

Platforms (77)

    Darwin
    FreeBSD
    Genode
    GHCJS
    Linux
    MMIXware
    NetBSD
    none
    OpenBSD
    Redox
    Solaris
    WASI
    Windows
  • aarch64-darwin
  • aarch64-freebsd
  • aarch64-genode
  • aarch64-linux
  • aarch64-netbsd
  • aarch64-none
  • aarch64-windows
  • aarch64_be-none
  • arm-none
  • armv5tel-linux
  • armv6l-linux
  • armv6l-netbsd
  • armv6l-none
  • armv7a-darwin
  • armv7a-linux
  • armv7a-netbsd
  • armv7l-linux
  • armv7l-netbsd
  • avr-none
  • i686-cygwin
  • i686-darwin
  • i686-freebsd
  • i686-genode
  • i686-linux
  • i686-netbsd
  • i686-none
  • i686-openbsd
  • i686-windows
  • javascript-ghcjs
  • loongarch64-linux
  • m68k-linux
  • m68k-netbsd
  • m68k-none
  • microblaze-linux
  • microblaze-none
  • microblazeel-linux
  • microblazeel-none
  • mips-linux
  • mips-none
  • mips64-linux
  • mips64-none
  • mips64el-linux
  • mipsel-linux
  • mipsel-netbsd
  • mmix-mmixware
  • msp430-none
  • or1k-none
  • powerpc-netbsd
  • powerpc-none
  • powerpc64-linux
  • powerpc64le-linux
  • powerpcle-none
  • riscv32-linux
  • riscv32-netbsd
  • riscv32-none
  • riscv64-linux
  • riscv64-netbsd
  • riscv64-none
  • rx-none
  • s390-linux
  • s390-none
  • s390x-linux
  • s390x-none
  • vc4-none
  • wasm32-wasi
  • wasm64-wasi
  • x86_64-cygwin
  • x86_64-darwin
  • x86_64-freebsd
  • x86_64-genode
  • x86_64-linux
  • x86_64-netbsd
  • x86_64-none
  • x86_64-openbsd
  • x86_64-redox
  • x86_64-solaris
  • x86_64-windows