NichePCA: PCA-based spatial domain identification with state-of-the-art performance



NichePCA is a package for PCA-based spatial domain identification in single-cell spatial transcriptomics data. The corresponding manuscript was published in Bioinformatics.

Installation

You need Python 3.11 or newer installed on your system. If you don't have Python installed, we recommend installing uv.

There are two ways to install nichepca:

  1. Install the latest release of nichepca from PyPI:

pip install nichepca

  2. Install the latest development version:

pip install git+https://github.com/imsb-uke/nichepca.git@main

Getting started

Please refer to the documentation, in particular the API documentation.

Given an AnnData object adata, you can run nichepca starting from raw counts as follows:

import scanpy as sc
import nichepca as npc

npc.wf.nichepca(adata, knn=25)
sc.pp.neighbors(adata, use_rep="X_npca")
sc.tl.leiden(adata, resolution=0.5)

Multi-sample support

If you have multiple samples labeled in adata.obs["sample"], you can pass that key to npc.wf.nichepca via sample_key. This uses Harmony by default:

npc.wf.nichepca(adata, knn=25, sample_key="sample")

If you have cell type labels in adata.obs["cell_type"], you can pass them directly to nichepca via obs_key, as shown below. We found that this sometimes works better for multi-sample domain identification. In this case, however, you need to run npc.cl.leiden_unique to handle potential duplicate embeddings:

npc.wf.nichepca(adata, knn=25, obs_key="cell_type", sample_key="sample")
npc.cl.leiden_unique(adata, use_rep="X_npca", resolution=0.5, n_neighbors=15)
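
To illustrate why duplicate embeddings need special handling, here is a minimal NumPy sketch. It does not show how npc.cl.leiden_unique is implemented. It assumes that cells with the same neighborhood composition end up with identical embedding rows, and that one reasonable strategy is to cluster only the unique rows and then copy the labels back to every cell. The helper name unique_rows_with_inverse and the toy labels are hypothetical.

```python
import numpy as np


def unique_rows_with_inverse(embedding):
    """Collapse identical rows.

    Returns the unique rows and, for each original row, the index of its
    unique row (hypothetical helper, not part of nichepca).
    """
    unique, inverse = np.unique(embedding, axis=0, return_inverse=True)
    # Flatten so the index array is 1-D regardless of the NumPy version.
    return unique, inverse.reshape(-1)


# Toy embedding: cells 0 and 2 share one profile, cells 1 and 3 another.
embedding = np.array(
    [[0.5, 0.5], [1.0, 0.0], [0.5, 0.5], [1.0, 0.0], [0.0, 1.0]]
)
unique, inverse = unique_rows_with_inverse(embedding)

# Stand-in for a clustering run on the unique rows only (made-up labels).
unique_labels = np.array(["a", "b", "c"])

# Broadcast the labels back so every original cell gets one.
cell_labels = unique_labels[inverse]
```

Cells with identical embeddings receive the same label by construction, and the clustering step only sees each distinct embedding once.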

Customization

The nichepca function also lets you customize the default ("norm", "log1p", "agg", "pca") pipeline, e.g., by dropping median normalization:

npc.wf.nichepca(adata, knn=25, pipeline=["log1p", "agg", "pca"])

or with "pca" before "agg":

npc.wf.nichepca(adata, knn=25, pipeline=["norm", "log1p", "pca", "agg"])

or without "pca" at all:

npc.wf.nichepca(adata, knn=25, pipeline=["norm", "log1p", "agg"])
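
For intuition about what the four step names stand for, the following is a conceptual NumPy sketch on dense arrays. It is not the package's implementation. It assumes that "norm" scales each cell to the median total count, that "agg" averages each cell's profile with those of its knn nearest spatial neighbors (the cell itself included), and that "pca" is a plain centered PCA. The function name niche_pca_sketch and the toy data are hypothetical.

```python
import numpy as np


def niche_pca_sketch(counts, coords, knn=5, n_comps=2):
    """Conceptual norm -> log1p -> agg -> pca (illustration only)."""
    counts = np.asarray(counts, dtype=float)
    coords = np.asarray(coords, dtype=float)

    # "norm": scale every cell to the median total count (assumed meaning).
    totals = counts.sum(axis=1, keepdims=True)
    target = np.median(totals)
    x = counts / np.where(totals > 0, totals, 1.0) * target

    # "log1p": variance-stabilizing log transform.
    x = np.log1p(x)

    # "agg": mean over each cell and its knn nearest spatial neighbors,
    # using brute-force squared Euclidean distances (fine for toy sizes).
    d2 = ((coords[:, None, :] - coords[None, :, :]) ** 2).sum(axis=-1)
    nbrs = np.argsort(d2, axis=1)[:, : knn + 1]
    x = x[nbrs].mean(axis=1)

    # "pca": center the features and project onto the top principal axes.
    x = x - x.mean(axis=0)
    u, s, _ = np.linalg.svd(x, full_matrices=False)
    return u[:, :n_comps] * s[:n_comps]


rng = np.random.default_rng(0)
counts = rng.poisson(2.0, size=(50, 20))    # 50 cells, 20 genes
coords = rng.uniform(0, 100, size=(50, 2))  # 2-D spatial positions
emb = niche_pca_sketch(counts, coords, knn=5, n_comps=3)
```

Reordering or dropping entries in pipeline corresponds to reordering or skipping the matching blocks in this sketch.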

Hyperparameter choice

We found that a higher number of neighbors, e.g., knn=25, leads to better results in brain tissue, while knn=10 works well for kidney data. We recommend tuning these parameters qualitatively on a small subset of your data. The number of PCs (n_comps=30 by default) seems to have a negligible effect on the results.
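
One way to pick such a subset is to crop a rectangular window from the tissue, sketched below. The helper crop_mask is hypothetical and not part of nichepca. The commented usage assumes the spatial coordinates are stored in adata.obsm["spatial"], which may differ in your data, and the window bounds are placeholders.

```python
import numpy as np


def crop_mask(coords, x_range, y_range):
    """Boolean mask of cells whose (x, y) position lies inside a window."""
    coords = np.asarray(coords, dtype=float)
    x, y = coords[:, 0], coords[:, 1]
    in_x = (x >= x_range[0]) & (x <= x_range[1])
    in_y = (y >= y_range[0]) & (y <= y_range[1])
    return in_x & in_y


# Toy coordinates for five cells.
coords = np.array([[0, 0], [10, 10], [20, 5], [50, 50], [15, 30]])
mask = crop_mask(coords, x_range=(5, 25), y_range=(0, 20))

# With real data (assumption: coordinates live in adata.obsm["spatial"]):
#   mask = crop_mask(adata.obsm["spatial"], (0, 2000), (0, 2000))
#   for k in (10, 25):
#       sub = adata[mask].copy()
#       npc.wf.nichepca(sub, knn=k)
#       # then cluster and plot as in "Getting started" and compare by eye
```

Running the loop on a cropped copy keeps each trial fast, so several knn values can be compared before processing the full dataset.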

Contributing

If you want to contribute, you can follow this guide. In short, fork the repository and set up a development environment with these commands:

git clone https://github.com/{your-username}/nichepca.git
cd nichepca
uv sync --all-extras

Then make your changes, run the tests, and submit a pull request.

Release notes

See the changelog.

Contact

For questions and help requests, you can reach out on the scverse Discourse. If you find a bug, please use the issue tracker.

Citation

If you use NichePCA in your research, please cite:

@article{schaub2025pca,
  title={PCA-based spatial domain identification with state-of-the-art performance},
  author={Schaub, Darius P and Yousefi, Behnam and Kaiser, Nico and Khatri, Robin and Puelles, Victor G and Krebs, Christian F and Panzer, Ulf and Bonn, Stefan},
  journal={Bioinformatics},
  volume={41},
  number={1},
  pages={btaf005},
  year={2025},
  publisher={Oxford University Press}
}