How to Evaluate the Consistency of the Velocity Embedding

Tutorials

FlowMap embedding · Velocity embedding consistency

This tutorial uses the scVelo pancreas AnnData file. Download it manually from Figshare to keep the same cells and genes across runs:

https://figshare.com/articles/dataset/scvelo-pancreas_h5ad/16547649?file=30595479

Place the file at data/scvelo-pancreas.h5ad, or change DATA_PATH below.

Load Data

from pathlib import Path

import numpy as np
import pandas as pd
import scanpy as sc
import scvelo as scv
from scipy import sparse
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

DATA_PATH = Path("data/scvelo-pancreas.h5ad")
adata = sc.read_h5ad(DATA_PATH)

Recompute PCA

When the input dimension is high, such as thousands of genes, it is recommended to perform the embedding-level FlowMap fit in PC space.

def to_dense(x):
    return x.toarray() if sparse.issparse(x) else np.asarray(x)

hvg_mask = adata.var["highly_variable"].to_numpy(dtype=bool)
X_scaled = StandardScaler().fit_transform(to_dense(adata[:, hvg_mask].X))

pca = PCA(n_components=50, svd_solver="arpack", random_state=0)
adata.obsm["X_pca"] = pca.fit_transform(X_scaled)

pcs = np.zeros((adata.n_vars, 50))
pcs[hvg_mask] = pca.components_.T
adata.varm["PCs"] = pcs
adata.uns["pca"] = {
    "variance": pca.explained_variance_,
    "variance_ratio": pca.explained_variance_ratio_,
}

scv.tl.velocity(adata, vkey="stochastic_velocity", mode="stochastic")
scv.tl.velocity_graph(adata, vkey="stochastic_velocity")
scv.tl.velocity_embedding(adata, basis="pca", vkey="stochastic_velocity")

Fit FlowMap Spline

FlowMap can work with existing embeddings. Passing X_emb keeps the given UMAP coordinates and fits the spline maps directly on that embedding.

from flowmap import VectorFieldEmbedder

X_pca = np.asarray(adata.obsm["X_pca"])
V_pca = np.asarray(adata.obsm["stochastic_velocity_pca"])
X_emb = np.asarray(adata.obsm["X_umap"])

emb = VectorFieldEmbedder(
    X_pca,
    V_pca,
    X_emb=X_emb,
    use_PCA=False,
    method="umap",
    dist_method="euclidean",
)

emb.fit_embedding(seed=0)

Plot Embedded Velocity

from flowmap.plot import plot_velocity_grid

cell_type = adata.obs["clusters"].astype(str).to_numpy()

plot_velocity_grid(
    emb.X_emb,
    spline_vf=emb.spline_vf,
    scatter_color=cell_type,
    grid_size=25,
    scatter_size=8,
    scatter_alpha=0.45,
    arrow_scale=0.1,
    arrow_width=0.003,
    cmap="tab20",
)
FlowMap velocity grid for pancreas UMAP

The fitted embedded velocity field on the existing UMAP coordinates.

Evaluate PCs

from flowmap.evaluation import SplineFitEvaluator

def evaluate_pcs_by_cell_type(evaluator, labels, n_pcs=10):
    pc_labels = [f"PC{i + 1}" for i in range(n_pcs)]
    expr_rows, vel_rows = [], []

    for label in pd.unique(labels):
        idx = np.where(labels == label)[0]
        result = evaluator.evaluate(cell_idx=idx)
        expr_rows.append(pd.Series(result["expr_corr_gene"][:n_pcs],
                                   index=pc_labels, name=label))
        vel_rows.append(pd.Series(result["vel_corr_gene"][:n_pcs],
                                  index=pc_labels, name=label))

    return pd.DataFrame(expr_rows), pd.DataFrame(vel_rows)

pc_expr_scores, pc_vel_scores = evaluate_pcs_by_cell_type(
    SplineFitEvaluator(emb, mode="default"),
    cell_type,
)
Per-cell-type PC reconstruction heatmaps

PC-level expression and velocity consistency by cell type.

Raw PC Velocity

The PC-level evaluation suggests that PC3 and PC8 have higher consistency with the embedding. We can look at the raw velocity values directly in that PC space.

from flowmap.plot import plot_2d_quiver

pc_x, pc_y = 2, 7
X_pc = adata.obsm["X_pca"][:, [pc_x, pc_y]]
V_pc = adata.obsm["stochastic_velocity_pca"][:, [pc_x, pc_y]]

mask = np.isin(cell_type, ["Ductal", "Ngn3 low EP"])

plot_2d_quiver(
    X_pc[mask],
    V_pc[mask],
    color=(cell_type[mask] == "Ngn3 low EP").astype(float),
    scale=0.4,
    size=40,
    alpha=0.5,
    cmap="tab10",
)
Raw PC velocity quiver for PC3 and PC8

Raw RNA velocity projected onto PC3 and PC8.

Fit Gene Splines

By default, this tutorial fits FlowMap in PC space. For gene-level evaluation, we can either use PC-level reconstruction as an approximation or refit splines directly from the embedding to genes. Here we refit the gene-level splines.

gene_idx = np.flatnonzero(hvg_mask)
X_gene = to_dense(adata.layers["spliced"][:, gene_idx]).astype(np.float32)
V_gene = to_dense(adata.layers["stochastic_velocity"][:, gene_idx]).astype(np.float32)
gene_names = adata.var_names[gene_idx].to_numpy()

emb.fit_gene_level_splines(
    X=X_gene,
    V=V_gene,
    dof_gene=50,
    dof_vf_gene=50,
)

Gene-Level Evaluation

evaluator = SplineFitEvaluator(emb, mode="gene")
metrics = evaluator.evaluate()
Gene-level reconstruction scatterplots by cell type

Per-gene expression and velocity reconstruction within each cell type.

The runnable notebook version is available at tutorial/pancreas_tutorial.ipynb.