Modules

class scdemon.modules(adata, U=None, s=None, V=None, suffix='', seed=1, k=100, filter_expr=0.05, keep_first_PC=False, process_covariates=False, covariate_cutoff=0.4)

Calculate gene modules from a single-cell AnnData object.

__init__(adata, U=None, s=None, V=None, suffix='', seed=1, k=100, filter_expr=0.05, keep_first_PC=False, process_covariates=False, covariate_cutoff=0.4)

Parameters:

adata (anndata.AnnData) – Single-cell dataset from scanpy
U (np.array) – SVD left singular vectors of size (n_obs, k). If U, s, and V are not supplied, the SVD is recalculated on the given data.
s (np.array) – SVD singular values of size (k).
V (np.array) – SVD right singular vectors of size (k, n_var).
suffix (str) – Unique suffix for saving image and table filenames
seed (int) – Seed, for np.random.seed
k (int) – Truncate SVD to k components when estimating correlation
filter_expr (float) – Remove all genes whose fraction of non-zero values is below the given cutoff (default is 0.05).
keep_first_PC (bool) – Keep the first PC when estimating the correlation.
process_covariates (bool) – Compare the metadata covariates to the PCs.
covariate_cutoff (float) – covariate_cutoff
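
The filter_expr rule above can be illustrated with a small pure-Python sketch. This is not scdemon's implementation; the function name filter_genes_by_expression and the toy count matrix are hypothetical, and the only behaviour taken from the description is that a gene is removed when its fraction of non-zero values falls below the cutoff.

```python
def filter_genes_by_expression(counts, gene_names, cutoff=0.05):
    """Return the genes whose fraction of non-zero values is at least `cutoff`.

    counts: list of rows (one per cell), each a list with one value per gene.
    This helper is hypothetical and only mirrors the documented filter_expr rule.
    """
    n_cells = len(counts)
    if n_cells == 0:
        return []
    kept = []
    for j, gene in enumerate(gene_names):
        # Count the cells in which this gene has a non-zero value.
        nonzero = sum(1 for row in counts if row[j] != 0)
        # Genes strictly below the cutoff are removed; the rest are kept.
        if nonzero / n_cells >= cutoff:
            kept.append(gene)
    return kept


# Toy data: 4 cells x 3 genes (made-up values).
toy_counts = [
    [1, 0, 2],
    [3, 0, 0],
    [0, 5, 1],
    [4, 0, 0],
]
toy_genes = ["GeneA", "GeneB", "GeneC"]
print(filter_genes_by_expression(toy_counts, toy_genes, cutoff=0.5))  # ['GeneA', 'GeneC']
```

With the default cutoff of 0.05, all three toy genes would be kept, since each is non-zero in at least one of the four cells.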

setup(): Set up the dataset. Filter genes, calculate PCA, and process covariates. PCA defaults to adata.obsm['X_pca'] if available. Otherwise uses sc.tl.pca from scanpy

make_graph(graph_id, multigraph=False, power=0, **kwargs)

Creates a graph under .graphs[graph_id]

Parameters:

graph_id (str) – Unique name for storing/accessing the graph
multigraph (bool) – Create from one graph or multiple graphs
power (float | list) – Power parameter, either a single value or a list of powers
method (str) –
Thresholding method for graph:

'bivariate'
default, threshold based on bivariate spline fit to gene-gene sparsity

'cutoff'
single cutoff across full matrix

'sd'
based on estimated sd, expensive
filter_covariate (str) – Filter SVD components correlated with a specific covariate
raw (bool) – Use raw correlation
resolution (float) – Resolution for clustering
adjacency_only (bool) – Only run adjacency (default False)
full_graph_only (bool) – Compute the full, unaltered, graph for multiplexing, but do not cluster or create modules
keep_all_z (bool) – When computing full graph, don’t threshold, keep dense matrix
layout (bool) – Lay out graph (default True)
**kwargs – Any args for adjacency_matrix or gene_graph
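
A minimal sketch of how make_graph might be combined with get_modules, using only the signatures listed on this page. The helper build_modules, the graph name "base", and the resolution value are hypothetical; mod is assumed to be an scdemon.modules object that has already been constructed and set up.

```python
def build_modules(mod, graph_id="base", resolution=2.5):
    """Create a graph and return its modules.

    `mod` is assumed to be an already set-up scdemon.modules object.
    The graph_id and resolution defaults here are illustrative only.
    """
    # Store the graph under mod.graphs[graph_id], clustering at `resolution`.
    mod.make_graph(graph_id, resolution=resolution)
    # Return the dictionary of genes in each module without printing it.
    return mod.get_modules(graph_id, print_modules=False)
```

Wrapping the two calls in a function keeps the graph_id consistent between graph creation and module retrieval.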

get_k_stats(k_list, power=0, resolution=None, raw=False, method='bivariate', filter_covariate=None, **kwargs)

Get statistics on the number of genes and the number of modules for each number of SVD components (k)

Parameters:

k_list (list) – List of SVD component cutoffs
power (float) – Power parameter; only a single value is accepted here
raw (bool) – Use raw correlation
resolution (float) – Resolution for clustering
method (str) –
Thresholding method for graph:

'bivariate'
default, threshold based on bivariate spline fit to gene-gene sparsity

'cutoff'
single cutoff across full matrix

'sd'
based on estimated sd, expensive
filter_covariate (str) – Filter SVD components correlated with a specific covariate
**kwargs – Extended arguments for adjacency_matrix or gene_graph

Return type:

ngenes, nmodules - Lists of number of genes and modules identified at each setting of k
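
Since the two returned lists run parallel to k_list, it can help to line them up in one table. The sketch below assumes the result unpacks as `ngenes, nmodules = mod.get_k_stats(k_list)`, which follows the return description above but is not verified here; tabulate_k_stats is a hypothetical helper and the numbers in the usage lines are made-up placeholders.

```python
def tabulate_k_stats(k_list, ngenes, nmodules):
    """Pair each k with the gene and module counts reported for it."""
    if not (len(k_list) == len(ngenes) == len(nmodules)):
        raise ValueError("k_list, ngenes and nmodules must have the same length")
    return [
        {"k": k, "ngenes": g, "nmodules": m}
        for k, g, m in zip(k_list, ngenes, nmodules)
    ]


# Placeholder values standing in for a real get_k_stats result.
rows = tabulate_k_stats([50, 100, 200], [1200, 1500, 1650], [20, 31, 40])
for row in rows:
    print(row["k"], row["ngenes"], row["nmodules"])
```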

recluster_graph(graph_id, resolution=None)

Re-cluster a graph with a different resolution.

Parameters:

graph_id (str) – Unique name for storing/accessing the graph
resolution (float) – Resolution for clustering
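
One way to use recluster_graph is to sweep several resolutions and record how many modules each one yields. This is a sketch under assumptions: sweep_resolutions is a hypothetical helper, mod is assumed to be a set-up scdemon.modules object holding a graph under graph_id, and get_modules is assumed to reflect the most recent clustering.

```python
def sweep_resolutions(mod, graph_id, resolutions):
    """Re-cluster an existing graph at each resolution and count its modules."""
    n_modules = {}
    for res in resolutions:
        # Re-cluster the stored graph at this resolution.
        mod.recluster_graph(graph_id, resolution=res)
        # get_modules returns a dictionary with one entry per module.
        n_modules[res] = len(mod.get_modules(graph_id))
    return n_modules
```

The graph is left clustered at the last resolution in the list, so re-run recluster_graph with the chosen value afterwards.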

get_modules(graph_id, attr='leiden', print_modules=False)

Get list of modules from graph and clustering.

Parameters:

graph_id (str) – Unique name for storing/accessing the graph
attr (str) – Name of the module assignment within the graph ('leiden' is currently the only supported method)
print_modules (bool) – Whether to print modules at the same time

Return type:

Dictionary of genes in each module in the graph
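
The returned dictionary can be inverted into a gene-to-module lookup. The sketch below assumes the dictionary maps a module identifier to a list of gene names, which matches the description above but is not confirmed in detail; gene_to_module and the toy dictionary are hypothetical.

```python
def gene_to_module(modules):
    """Invert {module_id: [genes]} into {gene: module_id}."""
    assignment = {}
    for module_id, genes in modules.items():
        for gene in genes:
            assignment[gene] = module_id
    return assignment


# Toy dictionary standing in for a real get_modules result.
toy_modules = {0: ["GeneA", "GeneB"], 1: ["GeneC"]}
print(gene_to_module(toy_modules))  # {'GeneA': 0, 'GeneB': 0, 'GeneC': 1}
```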

get_module_assignment(graph_id, attr='leiden')

Get module assignment for each gene as a pandas DataFrame.

Parameters:

graph_id (str) – Unique name for storing/accessing the graph
attr (str) – Name of the module assignment within the graph ('leiden' is currently the only supported method)

Return type:

pandas.DataFrame with gene to module assignments

find_gene(graph_id, gene, return_genes=True, print_genes=True)

Find the module containing a specific gene.

Parameters:

graph_id (str) – Unique name for storing/accessing the graph
gene (str) – Gene to look up in the modules
return_genes (bool) – Whether to return genes in the module
print_genes (bool) – Whether to print genes at the same time

Return type:

If return_genes=True, returns the list of genes in the module that contains the queried gene
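
As a usage sketch, find_gene can answer whether two genes fall in the same module. same_module is a hypothetical helper; mod is assumed to be a set-up scdemon.modules object, and the behaviour when the gene is absent from every module is not documented here, so the sketch treats a None result as "not found".

```python
def same_module(mod, graph_id, gene_a, gene_b):
    """Return True if gene_b is in the module that contains gene_a."""
    # Ask for the module's gene list and suppress printing.
    genes = mod.find_gene(graph_id, gene_a, return_genes=True, print_genes=False)
    # Guard against a missing result (assumed possible when gene_a is not found).
    return genes is not None and gene_b in genes
```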

save_modules(graph_id, attr='leiden', as_df=True, filedir='./', filename=None)

Save module list for a specific graph as txt or tsv.

Parameters:

graph_id (str) – Unique name for storing/accessing the graph
attr (str) – Name of the module assignment within the graph ('leiden' is currently the only supported method)
as_df (bool) – Write out dataframe instead of a raw list of genes per module
filedir (str) – Directory for file, defaults to ./
filename (str) – File name, overriding the default naming scheme.