Graph
- class scdemon.graph.gene_graph(corr, genes, adj=None, graph=None, edge_weight=None, layout_method='fr', min_size=4)
Gene-gene graph
- __init__(corr, genes, adj=None, graph=None, edge_weight=None, layout_method='fr', min_size=4)
Initialize the gene-gene graph from an adjacency matrix object or a pre-computed graph.
- Parameters:
corr (np.array | sparse.csr_matrix) – Gene-gene correlation matrix
genes (np.array) – List of genes
adj (adjacency_matrix) – Adjacency matrix object
graph (igraph.Graph) – Pre-computed graph (used for multigraph clustering)
edge_weight (float) – Fixed edge weight for the graph, if a fixed weight is wanted
layout_method (str) – Layout method for .layout() on igraph object
min_size (int) – Minimum size of a graph component, for pruning
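To illustrate what pruning by min_size could look like, here is a minimal pure-Python sketch that drops connected components with fewer than min_size nodes. The helper name prune_small_components and the toy edge list are hypothetical and are not part of scdemon, whose internal implementation (built on igraph) may differ.

```python
from collections import deque


def prune_small_components(n_nodes, edges, min_size=4):
    """Return sorted node indices in components with at least min_size nodes.

    Hypothetical helper for illustration only; not the scdemon implementation.
    """
    # Build an undirected adjacency list.
    neighbors = {i: set() for i in range(n_nodes)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen = set()
    kept = []
    for start in range(n_nodes):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component = [start]
        seen.add(start)
        queue = deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.append(nxt)
                    queue.append(nxt)
        # Keep the component only if it is large enough.
        if len(component) >= min_size:
            kept.extend(component)
    return sorted(kept)


# Toy graph: nodes 0-3 form a path, nodes 4-5 form a pair, node 6 is isolated.
edges = [(0, 1), (1, 2), (2, 3), (4, 5)]
kept_nodes = prune_small_components(7, edges, min_size=4)
```

With min_size=4 only the four-node path survives; lowering min_size to 2 would also keep the pair.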
- construct_graph(resolution=2, method='leiden', full_graph=False, modules=True, layout=True)
Construct the graph object and find gene modules.
- Parameters:
resolution (float) – Resolution for clustering graph into modules
method (str) – Method for clustering modules. Only 'leiden' is currently implemented
full_graph (bool) – Create full graph or threshold low values
modules (bool) – Whether to calculate gene modules
layout (bool) – Whether to lay out the graph
- layout_graph(layout_method='fr')
Compute graph layout.
- Parameters:
layout_method – Layout method for .layout() on igraph object
- calculate_gene_modules(method='leiden', **kwargs)
Calculate modules from gene-gene graph using graph clustering.
- Parameters:
method (str) – Method for clustering modules. Only 'leiden' is currently implemented
**kwargs – Arguments for calculating modules using given method
- compute_umap()
Calculate UMAP for the correlation estimate underlying the graph.
- get_modules(attr='leiden', adata=None, print_modules=False)
Get list of modules from graph and clustering.
- Parameters:
attr (str) – Name of the modules within the graph ('leiden' is the only currently supported method)
adata (AnnData) – AnnData object (needed if modules haven’t been populated)
print_modules (bool) – Whether to print modules as well
- Return type:
Dictionary of modules with lists of assigned genes
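As a rough picture of the returned structure, the sketch below groups genes by module label into a dictionary of lists. The function modules_from_assignment, the gene names, and the labels are illustrative assumptions; the actual key types and ordering returned by get_modules are not specified here.

```python
from collections import defaultdict


def modules_from_assignment(genes, assignment):
    """Group genes by module label (hypothetical helper, not scdemon code)."""
    modules = defaultdict(list)
    for gene, module_id in zip(genes, assignment):
        modules[module_id].append(gene)
    return dict(modules)


# Toy per-gene module labels, such as a clustering step might produce.
genes = ["CD3E", "CD3D", "MS4A1", "CD79A", "CD2"]
assignment = [0, 0, 1, 1, 0]
modules = modules_from_assignment(genes, assignment)
```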
- get_module_assignment(attr='leiden', adata=None)
Get module assignment for each gene as a pandas DataFrame.
- find_gene(gene, attr='leiden', print_genes=False)
Find the module containing a specific gene.
- Parameters:
gene (str) – Gene to look up in the modules
attr (str) – Name of the modules within the graph ('leiden' is the only currently supported method)
print_genes (bool) – Whether to print the list of genes in the module
- Return type:
List of genes in the module that contains the query gene
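The lookup can be pictured as a scan over a module dictionary. This is a sketch under the assumption that modules are stored as a dict mapping a module label to a list of genes; find_gene_module and the toy data are hypothetical names, not the library's code.

```python
def find_gene_module(modules, gene):
    """Return (module_id, genes) for the module containing gene.

    Returns (None, []) if the gene is in no module. Hypothetical helper.
    """
    for module_id, members in modules.items():
        if gene in members:
            return module_id, members
    return None, []


# Toy module dictionary for illustration.
toy_modules = {0: ["CD3E", "CD3D", "CD2"], 1: ["MS4A1", "CD79A"]}
module_id, members = find_gene_module(toy_modules, "CD79A")
missing_id, missing_members = find_gene_module(toy_modules, "GAPDH")
```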
- populate_modules(adata, attr='leiden')
Populate modules data, including average expression across cells and the top genes.
- Parameters:
adata (AnnData) – Single-cell dataset (needed if modules have not been populated)
attr (str) – Name of the modules within the graph ('leiden' is the only currently supported method)
- match_genes_to_modules(attr='leiden')
Match each gene to its closest module by correlation.
- save_modules(filename, attr='leiden', as_df=True, adata=None)
Save module list for this specific graph as txt or tsv.
- Parameters:
filename (str) – Name for output file, required
attr (str) – Name of the modules within the graph ('leiden' is the only currently supported method)
as_df (bool) – Write out dataframe instead of a raw list of genes per module
adata (AnnData) – Single-cell dataset (needed if modules have not been populated)
- class scdemon.graph.adjacency_matrix(corr, adjacency=None, method='bivariate', corr_sd=None, labels=None, margin=None, cutoff=0.4, z=4.5, zero_outliers=True, keep_all_z=False, knn_k=None, scale=None, degree_cutoff=0)
Process given correlation matrix into the adjacency matrix.
- __init__(corr, adjacency=None, method='bivariate', corr_sd=None, labels=None, margin=None, cutoff=0.4, z=4.5, zero_outliers=True, keep_all_z=False, knn_k=None, scale=None, degree_cutoff=0)
Initialize adjacency matrix class.
- Parameters:
corr (np.array | sparse.csr_matrix) – Gene-gene correlation matrix
adjacency (np.array) – Adjacency matrix
method (str) – Thresholding method for graph:
'bivariate' – default, threshold based on bivariate spline fit to gene-gene sparsity
'cutoff' – single cutoff across full matrix
'sd' – based on estimated sd, expensive
corr_sd (np.array) – Estimated standard deviation of correlations. Only used for the 'sd' method
labels (np.array) – Labels for nodes (genes)
margin (np.array) – Fraction of non-zero values for each of the variables in the original dataset (gene sparsity)
cutoff (float) – Raw correlation threshold
z (float) – Z-score threshold
zero_outliers (bool) – Whether to set outliers to 0 when calculating splines
keep_all_z (bool) – Whether to keep z-score matrix dense by removing all values below threshold
knn_k (int) – Pruning: keep only the top k edges per node
scale (float) – Pruning: remove edges below a certain fraction (in (0.0, 1.0]) of the top edge for the node
degree_cutoff (int) – Pruning: remove graph nodes with degree below this cutoff
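The 'cutoff' thresholding and the three pruning options can be pictured with a small pure-Python sketch on a toy correlation matrix. All function names below are hypothetical, the matrix values are made up, and the row-wise pruning shown here is an assumption; scdemon's own implementation (for example, how it handles symmetry, sparse matrices, or negative correlations) may differ.

```python
def threshold_cutoff(corr, cutoff=0.4):
    """Zero out the diagonal and every entry below the cutoff."""
    n = len(corr)
    return [[corr[i][j] if i != j and corr[i][j] >= cutoff else 0.0
             for j in range(n)] for i in range(n)]


def prune_top_k(adj, k):
    """Keep at most the k largest nonzero entries in each row (knn_k-style)."""
    pruned = []
    for row in adj:
        order = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = set(j for j in order[:k] if row[j] > 0)
        pruned.append([v if j in keep else 0.0 for j, v in enumerate(row)])
    return pruned


def prune_by_scale(adj, scale):
    """Drop entries smaller than scale times the row maximum (scale-style)."""
    pruned = []
    for row in adj:
        top = max(row)
        pruned.append([v if top > 0 and v >= scale * top else 0.0 for v in row])
    return pruned


def degrees(adj):
    """Count nonzero entries per row."""
    return [sum(1 for v in row if v > 0) for row in adj]


# Toy symmetric correlation matrix for four genes (made-up values).
corr = [
    [1.0, 0.9, 0.5, 0.1],
    [0.9, 1.0, 0.45, 0.2],
    [0.5, 0.45, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
]
adj = threshold_cutoff(corr, cutoff=0.4)
top1 = prune_top_k(adj, k=1)
scaled = prune_by_scale(adj, scale=0.6)

# degree_cutoff-style pruning: keep nodes whose degree is not below the cutoff.
degree_cutoff = 1
kept = [i for i, d in enumerate(degrees(adj)) if d >= degree_cutoff]
```

Note that pruning row by row, as in this sketch, can leave an asymmetric matrix; a real pipeline would need to decide whether to symmetrize afterwards.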
- get_adjacency()
Get the adjacency matrix and the final kept labels.