Modules

class scdemon.modules(adata, U=None, s=None, V=None, suffix='', seed=1, k=100, filter_expr=0.05, keep_first_PC=False, process_covariates=False, covariate_cutoff=0.4)

Calculate gene modules from a single-cell AnnData object.

__init__(adata, U=None, s=None, V=None, suffix='', seed=1, k=100, filter_expr=0.05, keep_first_PC=False, process_covariates=False, covariate_cutoff=0.4)
Parameters:
  • adata (anndata.AnnData) – Single-cell dataset from scanpy

  • U (np.array) – SVD left singular vectors of size (n_obs, k). Unless U, s, and V are all provided, the SVD is recalculated from the given data.

  • s (np.array) – SVD singular values of size (k).

  • V (np.array) – SVD right singular vectors of size (k, n_var).

  • suffix (str) – Unique suffix for saving image and table filenames

  • seed (int) – Seed, for np.random.seed

  • k (int) – Truncate SVD to k components when estimating correlation

  • filter_expr (float) – Remove all genes whose fraction of non-zero values is below the given cutoff (default is 0.05).

  • keep_first_PC (bool) – Keep the first PC when estimating the correlation.

  • process_covariates (bool) – Compare the metadata covariates to the PCs.

  • covariate_cutoff (float) – covariate_cutoff
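
To make the expected shapes of U, s, and V concrete, here is a minimal NumPy sketch on a synthetic matrix. It is only an illustration of the parameter descriptions above: np.linalg.svd is used as a stand-in (this section does not say which SVD routine scdemon uses internally), and the gene filter simply mirrors the wording of filter_expr.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy count matrix: 50 cells (n_obs) x 30 genes (n_var); values are synthetic.
X = rng.poisson(0.3, size=(50, 30)).astype(float)

# Mirror the filter_expr description: drop genes whose fraction of
# non-zero values is below the cutoff.
filter_expr = 0.05
frac_nonzero = (X > 0).mean(axis=0)
X = X[:, frac_nonzero >= filter_expr]

# Truncated SVD with k components (np.linalg.svd is an illustrative stand-in,
# not necessarily what scdemon calls internally).
k = 10
U_full, s_full, Vt_full = np.linalg.svd(X, full_matrices=False)
U, s, V = U_full[:, :k], s_full[:k], Vt_full[:k, :]

# U is (n_obs, k), s is (k,), V is (k, n_var), matching the parameter list.
print(U.shape, s.shape, V.shape)
```

Arrays with these shapes are what the U, s, and V arguments are documented to accept.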

setup()

Set up the dataset: filter genes, calculate PCA, and process covariates. PCA defaults to adata.obsm['X_pca'] if available; otherwise, sc.tl.pca from scanpy is used.

make_graph(graph_id, multigraph=False, power=0, **kwargs)

Creates a graph under .graphs[graph_id]

Parameters:
  • graph_id (str) – Unique name for storing/accessing the graph

  • multigraph (bool) – Create from one graph or multiple graphs

  • power (float | list) – Power parameter, either a single value or a list of powers

  • method (str) – Thresholding method for graph:

    'bivariate' – default, threshold based on bivariate spline fit to gene-gene sparsity

    'cutoff' – single cutoff across full matrix

    'sd' – based on estimated sd, expensive

  • filter_covariate (str) – Filter SVD components correlated with a specific covariate

  • raw (bool) – Use raw correlation

  • resolution (float) – Resolution for clustering

  • adjacency_only (bool) – Only run adjacency (default False)

  • full_graph_only (bool) – Compute the full, unaltered, graph for multiplexing, but do not cluster or create modules

  • keep_all_z (bool) – When computing full graph, don’t threshold, keep dense matrix

  • layout (bool) – Lay out graph (default True)

  • **kwargs – Any args for adjacency_matrix or gene_graph
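
As an illustration of what a single cutoff across the full matrix means for the 'cutoff' method, the sketch below thresholds a toy gene-gene correlation matrix into an adjacency matrix. This is not scdemon's implementation: the data are synthetic, the cutoff value of 0.5 is arbitrary, and thresholding the signed correlation (rather than its absolute value) is an assumption.

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic expression: 200 cells x 6 genes, with genes 0 and 1 made to co-vary.
X = rng.normal(size=(200, 6))
X[:, 1] = X[:, 0] + 0.1 * rng.normal(size=200)

# Gene-gene correlation matrix (genes are columns, hence rowvar=False).
corr = np.corrcoef(X, rowvar=False)

# Apply one cutoff to every entry; 0.5 is an arbitrary illustrative value.
cutoff = 0.5
adjacency = (corr >= cutoff).astype(int)
np.fill_diagonal(adjacency, 0)  # ignore self-correlation

# Genes with at least one edge remain in the graph.
connected = np.flatnonzero(adjacency.sum(axis=0) > 0)
print(connected)
```

The 'bivariate' and 'sd' methods differ in that the threshold is not one global number; they are not sketched here because this section does not describe them in enough detail.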

get_k_stats(k_list, power=0, resolution=None, raw=False, method='bivariate', filter_covariate=None, **kwargs)

Get statistics on the number of genes and the number of modules for each number of SVD components (k).

Parameters:
  • k_list (list) – List of SVD component cutoffs

  • power (float) – Power parameter; only a single value is accepted here

  • raw (bool) – Use raw correlation

  • resolution (float) – Resolution for clustering

  • method (str) – Thresholding method for graph:

    'bivariate' – default, threshold based on bivariate spline fit to gene-gene sparsity

    'cutoff' – single cutoff across full matrix

    'sd' – based on estimated sd, expensive

  • filter_covariate (str) – Filter SVD components correlated with a specific covariate

  • **kwargs – Extended arguments for adjacency_matrix or gene_graph

Return type:

ngenes, nmodules - Lists of number of genes and modules identified at each setting of k
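
Because the two returned lists are aligned with k_list, they can be paired up for inspection. The helper below is a hypothetical convenience, not part of scdemon; in practice the inputs would come from a call such as ngenes, nmodules = mod.get_k_stats(k_list), where mod is a scdemon.modules instance. The numbers in the example call are placeholders.

```python
def summarize_k_stats(k_list, ngenes, nmodules):
    """Pair each k with the gene and module counts reported for it."""
    if not (len(k_list) == len(ngenes) == len(nmodules)):
        raise ValueError("k_list, ngenes and nmodules must have equal length")
    return [
        {"k": k, "ngenes": g, "nmodules": m}
        for k, g, m in zip(k_list, ngenes, nmodules)
    ]


# Placeholder numbers for illustration only; they are not scdemon output.
rows = summarize_k_stats([50, 100], [1200, 1500], [20, 28])
for row in rows:
    print(row)
```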

recluster_graph(graph_id, resolution=None)

Re-cluster a graph with a different resolution.

Parameters:
  • graph_id (str) – Unique name for storing/accessing the graph

  • resolution (float) – Resolution for clustering

get_modules(graph_id, attr='leiden', print_modules=False)

Get list of modules from graph and clustering.

Parameters:
  • graph_id (str) – Unique name for storing/accessing the graph

  • attr (str) – Name of the module attribute within the graph ('leiden' is currently the only supported method)

  • print_modules (bool) – Whether to print modules at the same time

Return type:

Dictionary of genes in each module in the graph
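
Putting the methods documented so far together, a typical call sequence might look like the sketch below. It uses only the signatures listed in this section, but several points are assumptions: that setup() has to be called explicitly before make_graph, that adata is an anndata.AnnData object, and the graph name 'base', which is an arbitrary choice. The wrapper function compute_modules is hypothetical.

```python
def compute_modules(adata, graph_id="base", k=100, resolution=None):
    """Run the documented steps in order and return the module dictionary.

    Assumes `adata` is an anndata.AnnData object and that the scdemon
    package is installed; the import is deferred until the call.
    """
    import scdemon

    mod = scdemon.modules(adata, k=k)
    mod.setup()  # filter genes, calculate PCA, process covariates
    # Only pass resolution through when the caller sets it.
    kwargs = {} if resolution is None else {"resolution": resolution}
    mod.make_graph(graph_id, **kwargs)  # stored under mod.graphs[graph_id]
    return mod.get_modules(graph_id)
```

The same graph_id is then the key for recluster_graph, get_module_assignment, find_gene, and save_modules.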

get_module_assignment(graph_id, attr='leiden')

Get module assignment for each gene as a pandas DataFrame.

Parameters:
  • graph_id (str) – Unique name for storing/accessing the graph

  • attr (str) – Name of the module attribute within the graph ('leiden' is currently the only supported method)

Return type:

pandas.DataFrame with gene to module assignments

find_gene(graph_id, gene, return_genes=True, print_genes=True)

Find the module containing a specific gene.

Parameters:
  • graph_id (str) – Unique name for storing/accessing the graph

  • gene (str) – Gene to look up in the modules

  • return_genes (bool) – Whether to return genes in the module

  • print_genes (bool) – Whether to print genes at the same time

Return type:

If return_genes=True returns the list of genes in the module that contains the gene in question
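
Conceptually, this lookup amounts to scanning the module dictionary for the module that holds the gene. The stand-in below shows that idea on a plain dictionary; it assumes the dictionary maps a module identifier to a list of gene names, and both the helper find_module_of and the gene names are hypothetical. It is not scdemon's code.

```python
def find_module_of(modules, gene):
    """Return (module_key, genes) for the module containing `gene`, or None."""
    for key, genes in modules.items():
        if gene in genes:
            return key, list(genes)
    return None


# Hypothetical module dictionary with placeholder gene names.
modules = {0: ["GENE_A", "GENE_B"], 1: ["GENE_C"]}
print(find_module_of(modules, "GENE_C"))
```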

save_modules(graph_id, attr='leiden', as_df=True, filedir='./', filename=None)

Save module list for a specific graph as txt or tsv.

Parameters:
  • graph_id (str) – Unique name for storing/accessing the graph

  • attr (str) – Name of the module attribute within the graph ('leiden' is currently the only supported method)

  • as_df (bool) – Write out dataframe instead of a raw list of genes per module

  • filedir (str) – Directory for file, defaults to ./

  • filename (str) – Name for file overriding default naming scheme.