Convert Seurat Object to AnnData and Generate scVelo Plots for Single-Cell RNA Velocity Analysis

These functions convert a Seurat object and its associated Velocyto loom file(s) into an AnnData object and generate RNA velocity plots with scVelo. `scVelo.Plot` can read the AnnData object from an h5ad file or use the one already held in memory in the Python session, and it supports several plot styles. Together, the two functions let you process data in R with Seurat and run RNA velocity analysis with scVelo, a Python package built on Scanpy and AnnData, without leaving R.

scVelo.SeuratToAnndata(
  seu,
  filename,
  velocyto.loompath,
  cell.id.match.table = NULL,
  prefix = NULL,
  postfix = "-1",
  remove_duplicates = FALSE,
  conda_env = "seuratextend"
)

scVelo.Plot(
  load.adata = NULL,
  style = c("stream", "grid", "scatter"),
  basis = "umap_cell_embeddings",
  color = NULL,
  groups = NULL,
  palette = NULL,
  alpha = 0.15,
  arrow_size = 3,
  arrow_length = 2,
  dpi = 300,
  legend_fontsize = 9,
  figsize = c(7, 5),
  xlim = NULL,
  ylim = NULL,
  save = NULL,
  conda_env = "seuratextend"
)

Arguments

seu: The Seurat object containing single-cell RNA sequencing data that needs to be analyzed using scVelo.
filename: Path where the resulting AnnData object will be saved. This should be a path to an h5ad file.
velocyto.loompath: Path(s) to the Velocyto-generated loom file(s) containing the RNA velocity data.
cell.id.match.table: An optional data frame for advanced users that maps cell IDs between the Seurat object and Velocyto loom file across multiple samples. It requires a strict format with three columns: cellid.seurat, cellid.velocyto, and velocyto.loompath, indicating the cell ID in the Seurat object, the corresponding cell ID in the Velocyto loom, and the loom file path for that sample, respectively. Default: NULL
prefix: Prefix on cell IDs in the Seurat object, typically a sample or batch identifier, used to match them to the corresponding IDs in the Velocyto loom file. Default: NULL
postfix: Postfix appended to cell IDs in the Seurat object to match the corresponding IDs in the Velocyto loom file. Default: '-1'
remove_duplicates: Logical flag indicating whether to remove duplicate cells in the AnnData object. If TRUE, duplicate cells are removed based on PCA and sum of gene expression values. Default: FALSE
conda_env: Name of the Conda environment where the Python dependencies for scVelo and Scanpy are installed. This environment is used to run Python code from R. Default: 'seuratextend'
load.adata: Path to a previously saved AnnData object (in h5ad format) which can be directly loaded to avoid re-running preprocessing. If NULL, reticulate will automatically use the existing AnnData object `adata` in the Python environment for plotting. Default: NULL.
style: Style of the velocity plot. One of 'stream', 'grid', or 'scatter'. Default: c("stream", "grid", "scatter").
basis: The embedding to be used for plotting, typically 'umap_cell_embeddings' to represent UMAP reductions. Default: 'umap_cell_embeddings'.
color: The variable by which to color the plot, usually a categorical variable like cluster identifiers or a continuous variable reflecting gene expression levels. Default: NULL.
groups: Groups or clusters to highlight in the plot, useful for focusing on specific cell types or conditions within the dataset. Default: NULL.
palette: Color palette to use for differentiating between groups or clusters within the plot. Allows customization of aesthetic presentation. Default: NULL.
alpha: Opacity of the points in the plot, which can be adjusted to enhance visualization when dealing with densely packed points. Default: 0.15.
arrow_size: Size of the arrows representing RNA velocity vectors in the plot, relevant only when `style` is set to 'scatter'. This can be adjusted to make the arrows more or less prominent based on visualization needs. Default: 3.
arrow_length: Length of the arrows, which affects how far the arrows extend from their origin points. Relevant only when style is 'scatter', helping in interpreting the directionality and magnitude of cellular transitions. Default: 2.
dpi: Resolution of the saved plot, useful when preparing figures for publication or presentations. Default: 300.
legend_fontsize: Size of the font used in the plot legend, allowing for customization based on the figure's intended use or audience. Default: 9.
figsize: Dimensions of the plot in inches, providing control over the size of the output figure to accommodate different analysis contexts. Default: c(7, 5).
xlim: Limits for the x-axis, which can be set to focus on specific areas of the plot or to standardize across multiple plots. Default: NULL.
ylim: Limits for the y-axis, similar in use to `xlim` for focusing or standardizing the y-axis view. Default: NULL.
save: Path where the plot should be saved. If specified, the plot will be saved to the given location. Supports various file formats like PNG, PDF, SVG, etc. Default: NULL.
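
The three-column format required by `cell.id.match.table` can be sketched in base R as follows. The barcodes, sample names, and loom paths are hypothetical, and the `sample:barcodex` pattern used for the Velocyto IDs is an assumption about how the loom files were generated; check the actual cell IDs in your Seurat object and loom files before building the table.

```r
# Hypothetical barcodes, sample names, and loom paths, for illustration only.
barcodes  <- c("AAACCCAAGAAACTCA", "AAACCCAAGGTTCTAC", "AAACGAACAGGATGAC")
sample_of <- c("sample1", "sample1", "sample2")

# One row per cell: Seurat cell ID, Velocyto cell ID, and the loom file for that sample.
match_table <- data.frame(
  cellid.seurat     = paste0(sample_of, "_", barcodes, "-1"),
  cellid.velocyto   = paste0(sample_of, ":", barcodes, "x"),  # assumed Velocyto ID pattern
  velocyto.loompath = file.path("path/to", paste0(sample_of, ".loom")),
  stringsAsFactors  = FALSE
)

# Sanity checks: exact column names and no duplicated Seurat cell IDs.
stopifnot(
  identical(colnames(match_table),
            c("cellid.seurat", "cellid.velocyto", "velocyto.loompath")),
  !anyDuplicated(match_table$cellid.seurat)
)
```

The resulting data frame would then be passed as `cell.id.match.table = match_table` to `scVelo.SeuratToAnndata()`.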

Value

`scVelo.SeuratToAnndata`: if remove_duplicates = TRUE, returns the filtered Seurat object with duplicate cells removed. Otherwise it returns no object to R; instead it prepares an AnnData object `adata` and stores it in the Python environment, where it is accessible via `reticulate`.

`scVelo.Plot`: generates RNA velocity plots, which can be viewed directly or saved to a file.
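
As a rough sketch of the return behavior described above, the following captures the filtered Seurat object when `remove_duplicates = TRUE`. The file paths are placeholders. Reading `adata` through `reticulate::py` assumes the object lives in Python's main module, which the text above suggests but does not state explicitly.

```r
library(SeuratExtend)

# Placeholder inputs; replace with your own object and paths.
seu        <- readRDS("path/to/seurat_object.rds")
loom_path  <- "path/to/sample.loom"
adata_path <- file.path(tempdir(), "sample.h5ad")

# With remove_duplicates = TRUE, keep the returned (filtered) Seurat object
# so that it stays in sync with the cells kept in the AnnData object.
seu_filtered <- scVelo.SeuratToAnndata(
  seu,
  filename = adata_path,
  velocyto.loompath = loom_path,
  remove_duplicates = TRUE
)

# Assumption: `adata` is stored in Python's main module.
if (reticulate::py_has_attr(reticulate::py, "adata")) {
  print(reticulate::py$adata)
}
```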

Details

These two functions are designed to be used together: `scVelo.SeuratToAnndata` builds the AnnData object from the Seurat object's primary metadata and dimension reduction data, and `scVelo.Plot` uses that object to generate plots. `SeuratExtend` exposes scVelo plotting in R with options for plot style, color scheme, group highlighting, and more, so the analysis can be run without writing Python code directly.

## macOS-Specific Considerations

If you are using macOS, be aware of the following:

* **Intel Macs**: Using these functions in R Markdown within RStudio may cause the R session to crash. Use regular .R script files instead.

* **Apple Silicon (M1/M2/M3/M4)**: There are known memory management issues between R and Python when performing operations like PCA on AnnData objects. To avoid crashes, start with a fresh R session and call `activate_python()` before loading any R objects or running any scVelo functions:

```r
# Run at the beginning of your session on macOS
activate_python()

# Then load your data and proceed with analysis
seu <- readRDS("path/to/seurat_object.rds")
scVelo.SeuratToAnndata(...)
```

Examples

library(Seurat)
library(SeuratExtend)

# Download the example Seurat Object
mye_small <- readRDS(url("https://zenodo.org/records/10944066/files/pbmc10k_mye_small_velocyto.rds", "rb"))

# Download the example velocyto loom file to tmp folder
loom_path <- file.path(tempdir(), "pbmc10k_mye_small.loom")
download.file("https://zenodo.org/records/10944066/files/pbmc10k_mye_small.loom", loom_path)

# Set up the path for saving the AnnData object in the HDF5 (h5ad) format
adata_path <- file.path(tempdir(), "mye_small.h5ad")

# Integrate Seurat Object and velocyto loom information into one AnnData object, which will be stored at the specified path.
scVelo.SeuratToAnndata(
  mye_small, # The downloaded example Seurat object
  filename = adata_path, # Path where the AnnData object will be saved
  velocyto.loompath = loom_path, # Path to the loom file
  prefix = "sample1_", # Prefix for cell IDs in the Seurat object
  postfix = "-1" # Postfix for cell IDs in the Seurat object
)

# Generate a default UMAP plot colored by 'cluster' and save it as a PNG file
scVelo.Plot(color = "cluster", save = "umap1.png", figsize = c(5,4))

# Generate a scatter style plot highlighting specific groups, using a custom color palette, with specified axis limits, and save it to a file
scVelo.Plot(
  style = "scatter",
  color = "cluster",
  groups = c("DC", "Mono CD14"),
  palette = color_pro(3, "light"),
  xlim = c(0, 10), ylim = c(0, 10),
  save = "umap2_specified_area.png"
)
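
The `load.adata` argument described above allows plotting from a previously saved h5ad file, for instance in a new R session, without re-running the conversion. A minimal sketch, assuming the file written by `scVelo.SeuratToAnndata()` in the example above still exists at the same path; the output file name is arbitrary:

```r
library(SeuratExtend)

# Same path the AnnData object was saved to in the example above
adata_path <- file.path(tempdir(), "mye_small.h5ad")

# Load the saved AnnData object from disk and draw a grid-style velocity plot
scVelo.Plot(
  load.adata = adata_path,
  style = "grid",
  color = "cluster",
  save = "umap3_grid.png"
)
```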