Graph

class scdemon.graph.gene_graph(corr, genes, adj=None, graph=None, edge_weight=None, layout_method='fr', min_size=4)

Gene-gene graph

__init__(corr, genes, adj=None, graph=None, edge_weight=None, layout_method='fr', min_size=4)

Initialize gene-gene graph given adjacency matrix object or pre-computed graph

Parameters:

corr (np.array | sparse.csr_matrix) – Gene-gene correlation matrix
genes (np.array) – List of genes
adj (adjacency_matrix) – Adjacency matrix object
graph (igraph.Graph) – Pre-computed graph (used for multigraph clustering)
edge_weight (float) – Fixed edge weight to use for all graph edges, if desired
layout_method (str) – Layout method for .layout() on igraph object
min_size (int) – Minimum size of a graph component, for pruning
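
To illustrate what pruning by min_size means, the sketch below finds the connected components of a small undirected graph and keeps only those with at least min_size nodes. It uses plain Python rather than igraph, and the helper name prune_small_components and the toy edge list are hypothetical; the library's own pruning may differ in detail.

```python
from collections import defaultdict, deque


def prune_small_components(edges, min_size=4):
    """Return the nodes in components that have at least min_size nodes."""
    # Build an undirected adjacency list from the edge pairs.
    neighbors = defaultdict(set)
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    kept, seen = set(), set()
    for start in neighbors:
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component, queue = {start}, deque([start])
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in component:
                    component.add(nxt)
                    queue.append(nxt)
        seen |= component
        if len(component) >= min_size:
            kept |= component
    return kept


# Toy graph: one 4-gene component and one 2-gene component.
edges = [("A", "B"), ("B", "C"), ("C", "D"), ("X", "Y")]
kept_genes = sorted(prune_small_components(edges, min_size=4))
print(kept_genes)
```

With min_size=4 the two-gene component is dropped; lowering min_size to 2 would keep both components.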

construct_graph(resolution=2, method='leiden', full_graph=False, modules=True, layout=True)

Construct the graph object and find gene modules.

Parameters:

resolution (float) – Resolution for clustering graph into modules
method (str) – Method for clustering modules. Only 'leiden' currently implemented
full_graph (bool) – Whether to create the full graph instead of thresholding low values
modules (bool) – Whether to calculate gene modules
layout (bool) – Whether to lay out the graph

layout_graph(layout_method='fr')

Compute graph layout.

Parameters:

layout_method (str) – Layout method for .layout() on igraph object

calculate_gene_modules(method='leiden', **kwargs)

Calculate modules from gene-gene graph using graph clustering.

Parameters:

method (str) – Method for clustering modules. Only 'leiden' currently implemented
**kwargs – Arguments for calculating modules using given method

compute_umap()

Calculate UMAP for the correlation estimate underlying the graph.

get_modules(attr='leiden', adata=None, print_modules=False)

Get list of modules from graph and clustering.

Parameters:

attr (str) – Name of the modules within the graph ('leiden' is the only currently supported method)
adata (AnnData) – AnnData object (needed if modules haven’t been populated)
print_modules (bool) – Whether to print modules as well

Return type:

Dictionary of modules with lists of assigned genes
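
The returned structure can be pictured as a mapping from module identifier to a list of gene names. The sketch below builds such a dictionary from per-gene cluster labels; the helper name group_genes_by_module, the gene names, and the labels are hypothetical and only illustrate the shape of the result, not how the library computes it.

```python
from collections import defaultdict


def group_genes_by_module(genes, assignments):
    """Group genes into {module_id: [genes]} from two parallel lists."""
    modules = defaultdict(list)
    for gene, module_id in zip(genes, assignments):
        modules[module_id].append(gene)
    return dict(modules)


# Hypothetical gene names and cluster labels, for illustration only.
genes = ["GENE1", "GENE2", "GENE3", "GENE4", "GENE5"]
assignments = [0, 1, 0, 1, 0]
modules = group_genes_by_module(genes, assignments)
for module_id, members in sorted(modules.items()):
    print(module_id, " ".join(members))
```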

get_module_assignment(attr='leiden', adata=None)

Get module assignment for each gene as a pandas DataFrame.

find_gene(gene, attr='leiden', print_genes=False)

Find the module containing a specific gene.

Parameters:

gene (str) – Gene to look up in the modules
attr (str) – Name of the modules within the graph ('leiden' is the only currently supported method)
print_genes (bool) – Whether to print the list of genes in the module

Return type:

List of genes in the module that contains the query gene
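
Conceptually, the lookup is a search over a module dictionary for the entry that contains the query gene. A minimal sketch, assuming modules are stored as a dict of gene lists; find_module_of and the gene names are hypothetical, and the behaviour for a missing gene (returning None) is a choice made here, not documented library behaviour.

```python
def find_module_of(gene, modules):
    """Return (module_id, genes) for the module containing gene, else None."""
    for module_id, members in modules.items():
        if gene in members:
            return module_id, members
    return None


# Hypothetical modules, for illustration only.
modules = {0: ["GENE1", "GENE3"], 1: ["GENE2", "GENE4"]}
hit = find_module_of("GENE4", modules)
miss = find_module_of("GENE9", modules)
print(hit, miss)
```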

populate_modules(adata, attr='leiden')

Populate modules data, including average expression across cells and the top genes.

Parameters:

adata (AnnData) – Single-cell dataset, required if modules need to be populated
attr (str) – Name of the modules within the graph ('leiden' is the only currently supported method)
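
One plausible reading of "average expression" is a per-cell score equal to the mean expression of a module's genes in that cell. The sketch below computes that on a tiny cells-by-genes matrix held as nested lists; module_average_expression and all data are hypothetical, and the library may normalize or weight genes differently.

```python
def module_average_expression(expr, gene_names, modules):
    """Per-cell mean expression of each module's genes.

    expr is a list of cells, each a list of values ordered like gene_names.
    """
    index = {gene: i for i, gene in enumerate(gene_names)}
    scores = {}
    for module_id, members in modules.items():
        cols = [index[g] for g in members if g in index]
        if not cols:
            # Skip modules with no genes present in the dataset.
            continue
        scores[module_id] = [
            sum(cell[c] for c in cols) / len(cols) for cell in expr
        ]
    return scores


# Three cells by four genes; values are made up for illustration.
gene_names = ["GENE1", "GENE2", "GENE3", "GENE4"]
expr = [
    [1.0, 0.0, 3.0, 2.0],
    [0.0, 4.0, 2.0, 0.0],
    [2.0, 2.0, 0.0, 6.0],
]
modules = {0: ["GENE1", "GENE3"], 1: ["GENE2", "GENE4"]}
scores = module_average_expression(expr, gene_names, modules)
print(scores)
```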

match_genes_to_modules(attr='leiden')

Match each gene to its closest module by correlation.
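
As a rough picture of correlation-based matching, the sketch below assigns a gene to the module whose members have the highest mean correlation with it. This scoring rule is an assumption made for illustration; match_gene_to_module, the index-based module layout, and the matrix values are hypothetical and need not match the library's method.

```python
def match_gene_to_module(gene_idx, corr, modules):
    """Pick the module whose members correlate most, on average, with the gene."""
    best_module, best_score = None, float("-inf")
    for module_id, member_idx in modules.items():
        # Exclude the gene itself so its self-correlation does not inflate the score.
        others = [j for j in member_idx if j != gene_idx]
        if not others:
            continue
        score = sum(corr[gene_idx][j] for j in others) / len(others)
        if score > best_score:
            best_module, best_score = module_id, score
    return best_module


# Symmetric toy correlation matrix for five genes (indices 0 to 4).
corr = [
    [1.0, 0.8, 0.1, 0.0, 0.1],
    [0.8, 1.0, 0.0, 0.1, 0.2],
    [0.1, 0.0, 1.0, 0.9, 0.7],
    [0.0, 0.1, 0.9, 1.0, 0.5],
    [0.1, 0.2, 0.7, 0.5, 1.0],
]
modules = {0: [0, 1], 1: [2, 3]}  # gene 4 is not yet assigned
closest = match_gene_to_module(4, corr, modules)
print(closest)
```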

save_modules(filename, attr='leiden', as_df=True, adata=None)

Save module list for this specific graph as txt or tsv.

Parameters:

filename (str) – Name for output file, required
attr (str) – Name of the modules within the graph ('leiden' is the only currently supported method)
as_df (bool) – Write out dataframe instead of a raw list of genes per module
adata (AnnData) – Single-cell dataset, required if modules need to be populated
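
The two output shapes implied by as_df can be sketched as a long table with one row per gene versus one line per module listing its genes. The column names, tab delimiter, and the helper write_modules are assumptions for illustration; the exact file layout written by the library is not specified here.

```python
import csv
import io


def write_modules(handle, modules, as_table=True):
    """Write modules as a gene/module table or as one gene list per line."""
    writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
    if as_table:
        # Long format: one row per (gene, module) pair.
        writer.writerow(["gene", "module"])
        for module_id, members in sorted(modules.items()):
            for gene in members:
                writer.writerow([gene, module_id])
    else:
        # Raw lists: module id followed by its genes on a single line.
        for module_id, members in sorted(modules.items()):
            writer.writerow([module_id] + list(members))


# Hypothetical modules written to in-memory buffers instead of files.
modules = {0: ["GENE1", "GENE3"], 1: ["GENE2"]}
table_buf, list_buf = io.StringIO(), io.StringIO()
write_modules(table_buf, modules, as_table=True)
write_modules(list_buf, modules, as_table=False)
table_text = table_buf.getvalue()
list_text = list_buf.getvalue()
print(table_text)
print(list_text)
```

Passing an open file object instead of a StringIO buffer would write the same text to disk.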

class scdemon.graph.adjacency_matrix(corr, adjacency=None, method='bivariate', corr_sd=None, labels=None, margin=None, cutoff=0.4, z=4.5, zero_outliers=True, keep_all_z=False, knn_k=None, scale=None, degree_cutoff=0)

Process given correlation matrix into the adjacency matrix.

__init__(corr, adjacency=None, method='bivariate', corr_sd=None, labels=None, margin=None, cutoff=0.4, z=4.5, zero_outliers=True, keep_all_z=False, knn_k=None, scale=None, degree_cutoff=0)

Initialize adjacency matrix class.

Parameters:

corr (np.array | sparse.csr_matrix) – Gene-gene correlation matrix
adjacency (np.array) – Adjacency matrix
method (str) – Thresholding method for graph:
    'bivariate' – default, threshold based on bivariate spline fit to gene-gene sparsity
    'cutoff' – single cutoff across full matrix
    'sd' – based on estimated sd, expensive
corr_sd (np.array) – Estimated standard deviation of correlations. Only used for sd method
labels (np.array) – Labels for nodes (genes)
margin (np.array) – Fraction non-zero values for each of the variables in the original dataset (gene sparsity)
cutoff (float) – Raw correlation threshold
z (float) – Z-score threshold
zero_outliers (bool) – Whether to set outliers to 0 when calculating splines
keep_all_z (bool) – Whether to keep z-score matrix dense by removing all values below threshold
knn_k (int) – Pruning: keep only the top k edges per node
scale (float) – Pruning: remove edges below a certain fraction (in (0.0, 1.0]) of the top edge for the node
degree_cutoff (int) – Pruning: remove graph nodes with degree below this cutoff
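
The 'cutoff' method and the knn_k and scale pruning options can be pictured with a small dependency-free sketch. The three helpers below are hypothetical and operate on nested lists; details such as using >= for comparisons, ignoring the diagonal, and pruning row by row (which can leave the matrix asymmetric) are assumptions, not confirmed library behaviour.

```python
def threshold_cutoff(corr, cutoff=0.4):
    """Zero out the diagonal and every correlation below cutoff."""
    n = len(corr)
    return [
        [corr[i][j] if i != j and corr[i][j] >= cutoff else 0.0
         for j in range(n)]
        for i in range(n)
    ]


def prune_top_k(adj, k):
    """Keep, for each node (row), only its k strongest edges."""
    pruned = []
    for row in adj:
        ranked = sorted(range(len(row)), key=lambda j: row[j], reverse=True)
        keep = {j for j in ranked[:k] if row[j] > 0}
        pruned.append([row[j] if j in keep else 0.0 for j in range(len(row))])
    return pruned


def prune_by_scale(adj, scale):
    """Drop edges weaker than scale times the node's strongest edge."""
    pruned = []
    for row in adj:
        top = max(row)
        pruned.append(
            [w if top > 0 and w >= scale * top else 0.0 for w in row]
        )
    return pruned


# Toy symmetric correlation matrix for four genes.
corr = [
    [1.0, 0.9, 0.5, 0.3],
    [0.9, 1.0, 0.6, 0.45],
    [0.5, 0.6, 1.0, 0.2],
    [0.3, 0.45, 0.2, 1.0],
]
adj = threshold_cutoff(corr, cutoff=0.4)
adj_knn = prune_top_k(adj, k=1)
adj_scaled = prune_by_scale(adj, scale=0.6)
print(adj)
print(adj_knn)
print(adj_scaled)
```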

get_adjacency()

Get the adjacency matrix and the final kept labels.
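
Because pruning can remove nodes, the adjacency matrix is returned together with the labels that survive. The sketch below shows the idea for degree-based pruning: nodes with degree below degree_cutoff are dropped in a single pass and the matching labels are returned alongside the reduced matrix. The helper drop_low_degree and the data are hypothetical, and the single-pass behaviour is an assumption.

```python
def drop_low_degree(adj, labels, degree_cutoff=0):
    """Remove nodes with degree below degree_cutoff; return (adj, labels)."""
    n = len(adj)
    # Degree is the number of non-zero entries in each row.
    degree = [sum(1 for j in range(n) if adj[i][j] != 0) for i in range(n)]
    keep = [i for i in range(n) if degree[i] >= degree_cutoff]
    sub = [[adj[i][j] for j in keep] for i in keep]
    return sub, [labels[i] for i in keep]


# Toy symmetric adjacency matrix with node degrees 2, 3, 2, and 1.
adj = [
    [0.0, 0.9, 0.5, 0.0],
    [0.9, 0.0, 0.6, 0.45],
    [0.5, 0.6, 0.0, 0.0],
    [0.0, 0.45, 0.0, 0.0],
]
labels = ["GENE1", "GENE2", "GENE3", "GENE4"]
kept_adj, kept_labels = drop_low_degree(adj, labels, degree_cutoff=2)
print(kept_labels)
```

With the default degree_cutoff of 0, every node passes the check and the matrix and labels are returned unchanged.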