Calculate the GSEA score of gene sets at the single-cell level using the 'AUCell' package:

Aibar et al. (2017) SCENIC: Single-cell regulatory network inference and clustering. Nature Methods. doi: 10.1038/nmeth.4463

Aibar et al. (2016) AUCell: Analysis of 'gene set' activity in single-cell RNA-seq data. R/Bioconductor package.

This function can utilize either customized gene sets or the pre-built GO or Reactome database.

GeneSetAnalysis(
  seu = NULL,
  genesets,
  title = "genesets",
  ratio = 0.4,
  n.min = 1,
  n.max = Inf,
  slot = "counts",
  assay = "RNA",
  nCores = 1,
  aucMaxRank = NULL,
  export_to_matrix = F,
  verbose = TRUE,
  n.items.part = NULL
)

GeneSetAnalysisGO(
  seu = NULL,
  dataset = NULL,
  parent = NULL,
  root = "BP",
  spe = getOption("spe"),
  ratio = 0.4,
  n.min = 1,
  n.max = Inf,
  only.end.terms = F,
  slot = "counts",
  assay = "RNA",
  nCores = 1,
  aucMaxRank = NULL,
  title = NULL,
  export_to_matrix = F,
  verbose = TRUE,
  n.items.part = NULL
)

GeneSetAnalysisReactome(
  seu = NULL,
  parent = "All",
  spe = getOption("spe"),
  ratio = 0.4,
  n.min = 1,
  n.max = Inf,
  only.end.terms = F,
  slot = "counts",
  assay = "RNA",
  nCores = 1,
  aucMaxRank = NULL,
  title = NULL,
  export_to_matrix = F,
  verbose = TRUE,
  n.items.part = NULL
)

Arguments

seu

Seurat object.

genesets

List of gene sets (or signatures) to test in the cells. The gene sets should be provided as a character list.

title

Name of the slot where the data will be saved.

ratio

Minimum ratio of genes in the gene set detected in the datasets. Default: 0.4.

n.min

Minimum number of genes in the gene set. Default: 1.

n.max

Maximum number of genes in the gene set. Default: Inf.

slot

Slot from which to pull feature data. Default: 'counts'.

assay

Name of the assay to use. Default: 'RNA'.

nCores

Number of cores to use for computation. Default: 1.

aucMaxRank

Threshold for calculating the AUC.

export_to_matrix

If TRUE, the function will return an AUCell matrix instead of a Seurat object.

verbose

Should the function display progress messages? Default: TRUE.

n.items.part

If the datasets/gene sets are too large, the function can split the gene sets into n parts to reduce RAM usage.

dataset

(GO) Alias for 'parent'. Default: NULL.

parent

ID or name of the parent (top-level) gene set in the GO/Reactome database. This restricts the analysis to a subset of gene sets. Default: NULL.

root

(GO) Specifies which root category to use: 'BP' (Biological Process), 'MF' (Molecular Function), or 'CC' (Cellular Component). Default: 'BP'.

spe

Species, either 'human' or 'mouse'. Default: getOption("spe").

only.end.terms

Boolean indicating whether only the end-level pathways or gene sets should be used for calculation. Default: FALSE.

Value

Seurat object or matrix.

Details

If returning a Seurat object, the AUCell matrix is saved in seu@misc[["AUCell"]][[title]].

Examples

library(SeuratExtend)
options(spe = "human")

# Perform GSEA using the Gene Ontology (GO) database. Given the extensive size of the
# entire database, this example only evaluates pathways under the "immune_system_process"
# category. The results will be saved in: seu@misc$AUCell$GO[[title]]
pbmc <- GeneSetAnalysisGO(pbmc, parent = "immune_system_process")
matr <- pbmc@misc$AUCell$GO$immune_system_process
matr <- RenameGO(matr)
head(matr, 4:3)

# For the "parent" argument, you can use any term from the GO database, be it a GO ID or
# pathway name. Using `GeneSetAnalysisGO()` without arguments will show commonly used
# GO categories:
GeneSetAnalysisGO()

# To visualize the data, consider using a heatmap (to compare multiple groups with more
# features, albeit with less detailed representation), a violin plot (to compare multiple
# groups with fewer features, but presenting more details for individual data points), or a
# waterfall plot (to contrast only two groups). Below is an example of a heatmap:
Heatmap(CalcStats(matr, f = pbmc$cluster, order = "p", n = 4), lab_fill = "zscore")

# Violin plot example:
VlnPlot2(matr[1:3,], f = pbmc$cluster)

# Waterfall plot example:
WaterfallPlot(matr, f = pbmc$cluster, ident.1 = "B cell", ident.2 = "CD8 T cell", top.n = 20)

# Conduct GSEA using the Reactome database. This example will only assess pathways under
# the "Immune System" category. Results will be stored in: seu@misc$AUCell$Reactome[[title]]
pbmc <- GeneSetAnalysisReactome(pbmc, parent = "Immune System")
matr <- pbmc@misc$AUCell$Reactome$`Immune System`
matr <- RenameReactome(matr)
Heatmap(CalcStats(matr, f = pbmc$cluster, order = "p", n = 4), lab_fill = "zscore")

# As with GO, you can run `GeneSetAnalysisReactome()` without arguments to view
# commonly used categories in the Reactome database:
GeneSetAnalysisReactome()

# For GSEA using custom gene sets, the output AUCell matrix will be saved under:
# seu@misc$AUCell[[title]]
pbmc <- GeneSetAnalysis(pbmc, genesets = hall50$human)
matr <- pbmc@misc$AUCell$genesets
Heatmap(CalcStats(matr, f = pbmc$cluster), lab_fill = "zscore")