Example notebook#

%load_ext autoreload
%autoreload 2

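import os

# Move up to the repository root (the directory containing pyproject.toml).
while not os.path.exists("pyproject.toml"):
    # Stop at the filesystem root instead of looping forever.
    if os.path.dirname(os.getcwd()) == os.getcwd():
        raise FileNotFoundError("pyproject.toml not found in any parent directory")
    os.chdir("..")

import scanpy as sc
import nichepca as npc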

Load data#

Your AnnData object is expected to contain raw counts in adata.X.

adata = sc.read_h5ad("path/to/your/data.h5ad")
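Since the workflow expects raw counts, a quick sanity check can catch input that has already been normalized. The following is a minimal sketch, not part of nichepca or scanpy: `looks_like_raw_counts` is a hypothetical helper, and it only tests that all stored values are non-negative integers, which is a heuristic rather than a guarantee.

```python
import numpy as np


def looks_like_raw_counts(X):
    """Heuristic: True if every stored value is a non-negative integer.

    Hypothetical helper, not part of nichepca or scanpy.
    """
    # Sparse matrices (scipy.sparse) expose their stored values via .data
    # after conversion to CSR; dense input is converted to a NumPy array.
    values = X.tocsr().data if hasattr(X, "tocsr") else np.asarray(X)
    values = np.asarray(values, dtype=float).ravel()
    if values.size == 0:
        return True
    return bool(np.all(values >= 0) and np.all(values == np.round(values)))


# Toy matrices standing in for adata.X
raw = np.array([[0, 3, 1], [2, 0, 5]])
normalized = np.log1p(raw / raw.sum(axis=1, keepdims=True) * 1e4)
print(looks_like_raw_counts(raw))
print(looks_like_raw_counts(normalized))
```

On a real dataset, the same check would be called as `looks_like_raw_counts(adata.X)`.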

Standard pipeline#

We found that a higher number of neighbors (e.g., knn=25) leads to better results in brain tissue, while knn=10 works well for kidney data. We recommend tuning these parameters qualitatively on a small subset of your data. The number of PCs (n_comps=30 by default) appears to have a negligible effect on the results.

npc.wf.nichepca(adata, knn=25)
sc.pp.neighbors(adata, use_rep="X_npca")
sc.tl.leiden(adata, resolution=0.5, flavor="igraph", n_iterations=2)
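For tuning on a small subset, a spatially contiguous window is likely a better choice than a random subsample, because it keeps each cell's neighborhood intact. The sketch below shows one way to select such a window. `spatial_window_mask` is a hypothetical helper, and storing the coordinates in adata.obsm["spatial"] is an assumption about your data, not something this notebook specifies.

```python
import numpy as np


def spatial_window_mask(coords, x_range, y_range):
    """Boolean mask of cells whose (x, y) position lies inside a rectangle.

    Hypothetical helper, not part of nichepca.
    """
    coords = np.asarray(coords, dtype=float)
    x, y = coords[:, 0], coords[:, 1]
    return (x >= x_range[0]) & (x <= x_range[1]) & (y >= y_range[0]) & (y <= y_range[1])


# Toy coordinates standing in for adata.obsm["spatial"] (an assumed location).
rng = np.random.default_rng(0)
coords = rng.uniform(0, 1000, size=(5000, 2))
mask = spatial_window_mask(coords, x_range=(0, 250), y_range=(0, 250))
print(mask.sum(), "of", mask.size, "cells selected")

# With a real dataset, the mask could then drive a parameter sweep, e.g.:
#   sub = adata[mask].copy()
#   for knn in (10, 15, 25):
#       tmp = sub.copy()
#       npc.wf.nichepca(tmp, knn=knn)
#       ...cluster and plot tmp for visual comparison...
```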

Multi-sample domain identification#

If your data contain multiple samples labeled in adata.obs["sample"], pass the key "sample" to npc.wf.nichepca via sample_key. This uses Harmony by default:

npc.wf.nichepca(adata, knn=25, sample_key="sample")

If you have cell type labels in adata.obs["cell_type"], you can pass them directly to nichepca as follows (we found that this sometimes works better for multi-sample domain identification). In this case, however, we need to run npc.cl.leiden_unique to handle potential duplicate embeddings:

npc.wf.nichepca(adata, knn=25, obs_key="cell_type", sample_key="sample")
npc.cl.leiden_unique(adata, use_rep="X_npca", resolution=0.5, n_neighbors=15)
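To see why duplicate embeddings call for special handling, consider a generic strategy: collapse identical rows, cluster only the unique ones, and broadcast the labels back. The sketch below illustrates that idea with NumPy on toy data. It is an assumption about the general approach, not a description of how npc.cl.leiden_unique is implemented, and the thresholding step is only a placeholder for a real clustering algorithm.

```python
import numpy as np

# Toy embedding with repeated rows, standing in for adata.obsm["X_npca"].
X = np.array(
    [
        [0.0, 1.0],
        [0.0, 1.0],
        [2.0, 0.5],
        [0.0, 1.0],
        [2.0, 0.5],
        [3.0, 3.0],
    ]
)

# Collapse duplicates: `inverse` maps each original row to its unique row.
unique_rows, inverse = np.unique(X, axis=0, return_inverse=True)
inverse = inverse.ravel()  # guard against NumPy versions returning a 2-D inverse

# Placeholder for a real clustering step run on the unique rows only
# (here: threshold on the first coordinate, purely for illustration).
unique_labels = (unique_rows[:, 0] > 1.0).astype(int)

# Broadcast the labels back so that identical embeddings share a label.
labels = unique_labels[inverse]
print(unique_rows.shape[0], "unique rows;", "labels:", labels)
```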

Run custom pipelines#

The nichepca function also allows customizing the default ("norm", "log1p", "agg", "pca") pipeline, e.g., by skipping median normalization:

npc.wf.nichepca(adata, knn=25, pipeline=["log1p", "agg", "pca"])

or with "pca" before "agg":

npc.wf.nichepca(adata, knn=25, pipeline=["norm", "log1p", "pca", "agg"])
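To make the step names concrete, here is a toy NumPy version of a "norm", "log1p", "agg", "pca" sequence. It is a conceptual sketch only and not nichepca's implementation. The function name `toy_niche_embedding`, the use of a plain mean for aggregation, the inclusion of each cell in its own neighborhood, and the brute-force neighbor search are all assumptions made for illustration.

```python
import numpy as np


def toy_niche_embedding(counts, coords, knn=5, n_comps=2):
    """Toy 'norm' -> 'log1p' -> 'agg' -> 'pca' pipeline (illustrative only)."""
    counts = np.asarray(counts, dtype=float)
    coords = np.asarray(coords, dtype=float)

    # "norm": scale every cell to the median total count.
    totals = counts.sum(axis=1, keepdims=True)
    X = counts / np.maximum(totals, 1.0) * np.median(totals)

    # "log1p": variance-stabilizing log transform.
    X = np.log1p(X)

    # "agg": average each cell with its knn nearest spatial neighbors
    # (brute-force distances; fine for a toy example only).
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    neighbors = np.argsort(d2, axis=1)[:, : knn + 1]  # includes the cell itself
    X = X[neighbors].mean(axis=1)

    # "pca": project the centered data onto its top principal axes via SVD.
    X = X - X.mean(axis=0)
    U, S, _ = np.linalg.svd(X, full_matrices=False)
    return U[:, :n_comps] * S[:n_comps]


rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(200, 30))
coords = rng.uniform(0, 100, size=(200, 2))
emb = toy_niche_embedding(counts, coords, knn=5, n_comps=2)
print(emb.shape)
```

Reordering or dropping steps in the `pipeline` list corresponds to rearranging or removing the matching blocks in this sketch.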