Modules
- class scdemon.modules(adata, U=None, s=None, V=None, suffix='', seed=1, k=100, filter_expr=0.05, keep_first_PC=False, process_covariates=False, covariate_cutoff=0.4)
Calculate gene modules from a single-cell anndata object
- __init__(adata, U=None, s=None, V=None, suffix='', seed=1, k=100, filter_expr=0.05, keep_first_PC=False, process_covariates=False, covariate_cutoff=0.4)
- Parameters:
adata (anndata.AnnData) – Single-cell dataset from scanpy
U (np.array) – SVD left singular vectors of size (n_obs, k). If U, s, and V are not all provided, the SVD is re-calculated on the given data.
s (np.array) – SVD singular values of size (k).
V (np.array) – SVD right singular vectors of size (k, n_var).
suffix (str) – Unique suffix for saving image and table filenames
seed (int) – Seed, for np.random.seed
k (int) – Truncate SVD to k components when estimating correlation
filter_expr (float) – Remove all genes whose fraction of non-zero values is below the given cutoff (default is 0.05).
keep_first_PC (bool) – Keep the first PC when estimating the correlation.
process_covariates (bool) – Compare the metadata covariates to the PCs.
covariate_cutoff (float) – Cutoff used when processing covariates (default is 0.4).
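To make the meaning of the filter_expr cutoff concrete, here is a minimal sketch that computes the fraction of non-zero values per gene on a toy matrix and keeps the genes at or above the cutoff. It is not scdemon's implementation: the helper name `filter_genes_by_expression`, the gene names, and the toy data are assumptions made for illustration, and how the library treats a gene exactly at the cutoff is not stated in this reference.

```python
def filter_genes_by_expression(counts, gene_names, filter_expr=0.05):
    """Return names of genes whose fraction of non-zero values is >= filter_expr.

    counts: list of rows (one per cell), each a list of per-gene values.
    Hypothetical helper for illustration; not part of scdemon.
    """
    n_cells = len(counts)
    kept = []
    for j, gene in enumerate(gene_names):
        # Count the cells in which this gene has a non-zero value.
        n_nonzero = sum(1 for row in counts if row[j] != 0)
        if n_nonzero / n_cells >= filter_expr:
            kept.append(gene)
    return kept


# Toy matrix: 4 cells x 3 hypothetical genes.
toy_counts = [
    [1, 0, 0],
    [2, 0, 0],
    [0, 0, 3],
    [4, 0, 0],
]
toy_genes = ["geneA", "geneB", "geneC"]
kept_genes = filter_genes_by_expression(toy_counts, toy_genes, filter_expr=0.5)
```

With a cutoff of 0.5, only the gene detected in at least half of the toy cells is retained.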
- setup()
Set up the dataset: filter genes, calculate PCA, and process covariates. PCA defaults to adata.obsm['X_pca'] if available; otherwise it uses sc.tl.pca from scanpy.
- make_graph(graph_id, multigraph=False, power=0, **kwargs)
Creates a graph under .graphs[graph_id]
- Parameters:
graph_id (str) – Unique name for storing/accessing the graph
multigraph (bool) – Create from one graph or multiple graphs
power (float | list) – Power parameter, either a single value or a list of powers
method (str) –
Thresholding method for graph:
'bivariate': default, threshold based on bivariate spline fit to gene-gene sparsity
'cutoff': single cutoff across full matrix
'sd': based on estimated sd, expensive
filter_covariate (str) – Filter SVD components correlated with a specific covariate
raw (bool) – Use raw correlation
resolution (float) – Resolution for clustering
adjacency_only (bool) – Only run adjacency (default False)
full_graph_only (bool) – Compute the full, unaltered graph for multiplexing, but do not cluster or create modules
keep_all_z (bool) – When computing full graph, don’t threshold, keep dense matrix
layout (bool) – Lay out graph (default True)
**kwargs – Any args for adjacency_matrix or gene_graph
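As a rough illustration of the idea behind the 'cutoff' method (one threshold applied across the whole matrix), the sketch below turns a small gene-gene correlation matrix into a binary adjacency matrix. This is an assumption-laden toy, not the library's code: the function name `threshold_adjacency`, the choice to compare raw (not absolute) values, and the removal of self-edges are all illustrative choices.

```python
def threshold_adjacency(corr, cutoff):
    """Binary adjacency: 1 where an off-diagonal correlation is >= cutoff.

    Hypothetical helper for illustration; not part of scdemon.
    """
    n = len(corr)
    return [
        [1 if i != j and corr[i][j] >= cutoff else 0 for j in range(n)]
        for i in range(n)
    ]


# Toy symmetric correlation matrix for three hypothetical genes.
toy_corr = [
    [1.0, 0.8, 0.1],
    [0.8, 1.0, 0.3],
    [0.1, 0.3, 1.0],
]
adjacency = threshold_adjacency(toy_corr, cutoff=0.5)
```

Only the first two genes end up connected here, since theirs is the only off-diagonal entry above the cutoff.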
- get_k_stats(k_list, power=0, resolution=None, raw=False, method='bivariate', filter_covariate=None, **kwargs)
Get statistics on the number of genes and the number of modules for each number of SVD components (k)
- Parameters:
k_list (list) – List of SVD component cutoffs
power (float) – Power parameter, here only single value
raw (bool) – Use raw correlation
resolution (float) – Resolution for clustering
method (str) –
Thresholding method for graph:
'bivariate': default, threshold based on bivariate spline fit to gene-gene sparsity
'cutoff': single cutoff across full matrix
'sd': based on estimated sd, expensive
filter_covariate (str) – Filter SVD components correlated with a specific covariate
**kwargs – Extended arguments for adjacency_matrix or gene_graph
- Return type:
ngenes, nmodules – Lists of the number of genes and modules identified at each setting of k
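Because the two returned lists are aligned with k_list, they can be zipped into one row per setting of k for inspection. The sketch below assumes only that alignment; the helper name `summarize_k_stats` is hypothetical and the numbers are placeholders, not real scdemon output.

```python
def summarize_k_stats(k_list, ngenes, nmodules):
    """Pair each k with its gene and module counts.

    Hypothetical helper for illustration; not part of scdemon.
    """
    if not (len(k_list) == len(ngenes) == len(nmodules)):
        raise ValueError("k_list, ngenes, and nmodules must have equal length")
    return [
        {"k": k, "ngenes": g, "nmodules": m}
        for k, g, m in zip(k_list, ngenes, nmodules)
    ]


# Placeholder values for illustration only.
rows = summarize_k_stats([50, 100], [10, 20], [2, 3])
```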
- recluster_graph(graph_id, resolution=None)
Re-cluster a graph with a different resolution.
- Parameters:
graph_id (str) – Unique name for storing/accessing the graph
resolution (float) – Resolution for clustering
- get_modules(graph_id, attr='leiden', print_modules=False)
Get list of modules from graph and clustering.
- Parameters:
graph_id (str) – Unique name for storing/accessing the graph
attr (str) – Modules name within the graph ('leiden' is currently the only supported method)
print_modules (bool) – Whether to print modules at the same time
- Return type:
Dictionary of genes in each module in the graph
- get_module_assignment(graph_id, attr='leiden')
Get module assignment for each gene as a pandas DataFrame.
- Parameters:
graph_id (str) – Unique name for storing/accessing the graph
attr (str) – Modules name within the graph ('leiden' is currently the only supported method)
- Return type:
pandas.DataFrame with gene-to-module assignments
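The relationship between a per-module gene dictionary and a per-gene assignment can be sketched in plain Python. This assumes the dictionary is keyed by module identifier with a list of gene names as each value; the exact key type and the column layout of the returned DataFrame are not specified in this reference, and `modules_to_assignment` is a hypothetical helper.

```python
def modules_to_assignment(modules):
    """Invert {module_id: [genes]} into {gene: module_id}.

    Hypothetical helper for illustration; not part of scdemon.
    """
    assignment = {}
    for module_id, genes in modules.items():
        for gene in genes:
            assignment[gene] = module_id
    return assignment


# Toy modules with hypothetical gene names.
toy_modules = {0: ["geneA", "geneB"], 1: ["geneC"]}
assignment = modules_to_assignment(toy_modules)
```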
- find_gene(graph_id, gene, return_genes=True, print_genes=True)
Find the module containing a specific gene.
- Parameters:
graph_id (str) – Unique name for storing/accessing the graph
gene (str) – Gene to look up in the modules
return_genes (bool) – Whether to return genes in the module
print_genes (bool) – Whether to print genes at the same time
- Return type:
If return_genes=True, returns the list of genes in the module that contains the gene in question
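The lookup described here amounts to scanning the modules for the one that contains the gene. A minimal sketch on a plain dictionary follows; the helper name `find_gene_module`, the dictionary layout, and the choice to return None for a missing gene are assumptions, since the library's behavior for an absent gene is not documented in this section.

```python
def find_gene_module(modules, gene):
    """Return (module_id, genes) for the module containing gene, else None.

    Hypothetical helper for illustration; not part of scdemon.
    """
    for module_id, genes in modules.items():
        if gene in genes:
            return module_id, list(genes)
    return None


# Toy modules with hypothetical gene names.
toy_modules = {0: ["geneA", "geneB"], 1: ["geneC"]}
hit = find_gene_module(toy_modules, "geneB")
miss = find_gene_module(toy_modules, "geneZ")
```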
- save_modules(graph_id, attr='leiden', as_df=True, filedir='./', filename=None)
Save module list for a specific graph as txt or tsv.
- Parameters:
graph_id (str) – Unique name for storing/accessing the graph
attr (str) – Modules name within the graph ('leiden' is currently the only supported method)
as_df (bool) – Write out dataframe instead of a raw list of genes per module
filedir (str) – Directory for file, defaults to ./
filename (str) – Name for file overriding default naming scheme.
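The difference between a dataframe-style table and a raw list of genes per module can be sketched with the standard library. The column names, delimiter choices, and both helper names below are assumptions for illustration; the actual file layout written by save_modules is not described in this reference.

```python
import csv
import io


def write_modules_long(modules, handle):
    """Write one tab-separated row per gene: a table-like layout."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    writer.writerow(["gene", "module"])
    for module_id, genes in modules.items():
        for gene in genes:
            writer.writerow([gene, module_id])


def write_modules_lists(modules, handle):
    """Write one line per module: module id, then a comma-joined gene list."""
    for module_id, genes in modules.items():
        handle.write(f"{module_id}\t{','.join(genes)}\n")


# Toy modules with hypothetical gene names; in-memory buffers stand in for files.
toy_modules = {0: ["geneA", "geneB"], 1: ["geneC"]}
long_buf, list_buf = io.StringIO(), io.StringIO()
write_modules_long(toy_modules, long_buf)
write_modules_lists(toy_modules, list_buf)
```

The long layout is convenient for joins against other per-gene tables, while the per-module layout is easier to read by eye.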