Graph

class scdemon.graph.gene_graph(corr, genes, adj=None, graph=None, edge_weight=None, layout_method='fr', min_size=4)

Gene-gene graph

__init__(corr, genes, adj=None, graph=None, edge_weight=None, layout_method='fr', min_size=4)

Initialize the gene-gene graph from an adjacency matrix object or a pre-computed graph.

Parameters:
  • corr (np.array | sparse.csr_matrix) – Gene-gene correlation matrix

  • genes (np.array) – List of genes

  • adj (adjacency_matrix) – Adjacency matrix object

  • graph (igraph.Graph) – Pre-computed graph (used for multigraph clustering)

  • edge_weight (float) – Fixed edge weight for the graph, if one is wanted

  • layout_method (str) – Layout method for .layout() on igraph object

  • min_size (int) – Minimum size of a graph component, for pruning
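
The min_size pruning described above can be illustrated with a small stand-alone sketch. It is not scdemon's implementation: the helper name prune_small_components, the edge-list input, and the assumption that components with fewer than min_size nodes are dropped are all hypothetical choices made for illustration.

```python
from collections import deque


def prune_small_components(n_nodes, edges, min_size=4):
    """Return sorted node indices in components with at least min_size nodes.

    Hypothetical helper for illustration; not part of scdemon.
    """
    # Build an undirected adjacency list.
    neighbors = {i: set() for i in range(n_nodes)}
    for a, b in edges:
        neighbors[a].add(b)
        neighbors[b].add(a)

    seen, kept = set(), []
    for start in range(n_nodes):
        if start in seen:
            continue
        # Breadth-first search collects one connected component.
        component, queue = [start], deque([start])
        seen.add(start)
        while queue:
            node = queue.popleft()
            for nxt in neighbors[node]:
                if nxt not in seen:
                    seen.add(nxt)
                    component.append(nxt)
                    queue.append(nxt)
        # Keep the component only if it is large enough.
        if len(component) >= min_size:
            kept.extend(component)
    return sorted(kept)


# Toy graph (made up): a 4-node chain 0-1-2-3, a pair 4-5, and an isolated node 6.
toy_edges = [(0, 1), (1, 2), (2, 3), (4, 5)]
kept_nodes = prune_small_components(7, toy_edges, min_size=4)
```

With min_size=4, only the four-node chain survives; lowering min_size to 2 would also keep the pair.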

construct_graph(resolution=2, method='leiden', full_graph=False, modules=True, layout=True)

Construct the graph object and find gene modules.

Parameters:
  • resolution (float) – Resolution for clustering graph into modules

  • method (str) – Method for clustering modules. Only 'leiden' currently implemented

  • full_graph (bool) – Whether to create the full graph instead of thresholding low values

  • modules (bool) – Whether to calculate gene modules

  • layout (bool) – Whether to lay out the graph

layout_graph(layout_method='fr')

Compute graph layout.

Parameters:
  • layout_method (str) – Layout method for .layout() on igraph object

calculate_gene_modules(method='leiden', **kwargs)

Calculate modules from gene-gene graph using graph clustering.

Parameters:
  • method (str) – Method for clustering modules. Only 'leiden' currently implemented

  • **kwargs – Arguments for calculating modules using given method

compute_umap()

Calculate a UMAP embedding of the correlation estimate underlying the graph.

get_modules(attr='leiden', adata=None, print_modules=False)

Get list of modules from graph and clustering.

Parameters:
  • attr (str) – Name of the modules within the graph ('leiden' is the only currently supported method)

  • adata (AnnData) – AnnData object (needed if modules haven’t been populated)

  • print_modules (bool) – Whether to print modules as well

Return type:

Dictionary of modules with lists of assigned genes
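
As a rough picture of the returned structure, the sketch below groups a per-gene module assignment into a dictionary that maps each module to its list of genes. The helper name modules_from_assignment, the integer module IDs, and the example gene symbols are assumptions for illustration; the actual keys and ordering used by scdemon may differ.

```python
from collections import defaultdict


def modules_from_assignment(genes, assignment):
    """Group genes by module ID (hypothetical helper, not scdemon's code)."""
    modules = defaultdict(list)
    for gene, module_id in zip(genes, assignment):
        modules[module_id].append(gene)
    return dict(modules)


# Made-up example: five genes assigned to three modules.
genes = ["GFAP", "AQP4", "SNAP25", "SYT1", "MBP"]
assignment = [0, 0, 1, 1, 2]
modules = modules_from_assignment(genes, assignment)
```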

get_module_assignment(attr='leiden', adata=None)

Get module assignment for each gene as a pandas DataFrame.

find_gene(gene, attr='leiden', print_genes=False)

Find the module containing a specific gene.

Parameters:
  • gene (str) – Gene to look up in the modules

  • attr (str) – Name of the modules within the graph ('leiden' is the only currently supported method)

  • print_genes (bool) – Whether to print the list of genes in the module

Return type:

List of genes in the module that contains the query gene
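
The lookup can be pictured as a search over a module dictionary. This is a minimal sketch under the assumption that modules are held as a dictionary of gene lists; the helper find_gene_in_modules and the toy data are hypothetical, and the behavior for a missing gene (returning None and an empty list) is a choice made here, not documented behavior.

```python
def find_gene_in_modules(modules, gene):
    """Return (module_id, genes) for the module containing gene.

    Hypothetical helper; returns (None, []) when the gene is in no module.
    """
    for module_id, members in modules.items():
        if gene in members:
            return module_id, list(members)
    return None, []


# Made-up module dictionary for illustration.
toy_modules = {0: ["GFAP", "AQP4"], 1: ["SNAP25", "SYT1"], 2: ["MBP"]}
module_id, members = find_gene_in_modules(toy_modules, "SYT1")
missing_id, missing_members = find_gene_in_modules(toy_modules, "NOT_A_GENE")
```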

populate_modules(adata, attr='leiden')

Populate modules data, including average expression across cells and the top genes.

Parameters:
  • adata (AnnData) – Single-cell dataset used to populate the modules

  • attr (str) – Name of the modules within the graph ('leiden' is the only currently supported method)

match_genes_to_modules(attr='leiden')

Match each gene to its closest module by correlation.

save_modules(filename, attr='leiden', as_df=True, adata=None)

Save module list for this specific graph as txt or tsv.

Parameters:
  • filename (str) – Name for output file, required

  • attr (str) – Name of the modules within the graph ('leiden' is the only currently supported method)

  • as_df (bool) – Write out dataframe instead of a raw list of genes per module

  • adata (AnnData) – Single-cell dataset, needed if the modules have not yet been populated

class scdemon.graph.adjacency_matrix(corr, adjacency=None, method='bivariate', corr_sd=None, labels=None, margin=None, cutoff=0.4, z=4.5, zero_outliers=True, keep_all_z=False, knn_k=None, scale=None, degree_cutoff=0)

Process given correlation matrix into the adjacency matrix.

__init__(corr, adjacency=None, method='bivariate', corr_sd=None, labels=None, margin=None, cutoff=0.4, z=4.5, zero_outliers=True, keep_all_z=False, knn_k=None, scale=None, degree_cutoff=0)

Initialize adjacency matrix class.

Parameters:
  • corr (np.array | sparse.csr_matrix) – Gene-gene correlation matrix

  • adjacency (np.array) – Adjacency matrix

  • method (str) –

    Thresholding method for graph:

    'bivariate'

    default, threshold based on bivariate spline fit to gene-gene sparsity

    'cutoff'

    single cutoff across full matrix

    'sd'

    based on estimated sd, expensive

  • corr_sd (np.array) – Estimated standard deviation of correlations. Only used for sd method

  • labels (np.array) – Labels for nodes (genes)

  • margin (np.array) – Fraction non-zero values for each of the variables in the original dataset (gene sparsity)

  • cutoff (float) – Raw correlation threshold

  • z (float) – Z-score threshold

  • zero_outliers (bool) – Whether to set outliers to 0 when calculating splines

  • keep_all_z (bool) – Whether to keep z-score matrix dense by removing all values below threshold

  • knn_k (int) – Pruning: keep only the top k edges per node

  • scale (float) – Pruning: remove edges below a certain fraction (in (0.0, 1.0]) of the top edge for the node

  • degree_cutoff (int) – Pruning: remove graph nodes with degree below this cutoff
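
To make the 'cutoff' method and the three pruning options concrete, here is a minimal NumPy sketch. It is an illustration under stated assumptions, not scdemon's implementation: the function threshold_and_prune is hypothetical, the diagonal is zeroed, an edge is kept if it passes the knn_k or scale test for either of its endpoints, and nodes are kept when their degree is at least degree_cutoff.

```python
import numpy as np


def threshold_and_prune(corr, cutoff=0.4, knn_k=None, scale=None, degree_cutoff=0):
    """Threshold a correlation matrix and prune edges/nodes (illustrative only)."""
    adj = np.array(corr, dtype=float)
    np.fill_diagonal(adj, 0.0)   # ignore self-correlations
    adj[adj < cutoff] = 0.0      # single raw-correlation cutoff

    if knn_k is not None:
        # Mark the k largest entries in each row, then symmetrize so an
        # edge survives if it is in the top k for either endpoint (assumption).
        keep = np.zeros(adj.shape, dtype=bool)
        top = np.argsort(-adj, axis=1)[:, :knn_k]
        keep[np.arange(adj.shape[0])[:, None], top] = True
        keep |= keep.T
        adj = np.where(keep, adj, 0.0)

    if scale is not None:
        # Keep edges that reach at least `scale` times the node's top edge.
        row_max = adj.max(axis=1, keepdims=True)
        keep = adj >= scale * row_max
        keep |= keep.T
        adj = np.where(keep, adj, 0.0)

    # Drop nodes whose degree falls below the cutoff.
    degree = (adj > 0).sum(axis=1)
    kept = np.flatnonzero(degree >= degree_cutoff)
    return adj[np.ix_(kept, kept)], kept


# Made-up 4-gene correlation matrix and labels.
corr = np.array([
    [1.0, 0.8, 0.5, 0.1],
    [0.8, 1.0, 0.45, 0.2],
    [0.5, 0.45, 1.0, 0.3],
    [0.1, 0.2, 0.3, 1.0],
])
labels = np.array(["A", "B", "C", "D"])
adj_pruned, kept = threshold_and_prune(corr, cutoff=0.4, knn_k=1, degree_cutoff=1)
kept_labels = labels[kept]
```

In this toy case the cutoff disconnects gene D, knn_k=1 removes the weaker B-C edge, and degree_cutoff=1 then drops D, leaving a pruned matrix together with the labels of the nodes that were kept.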

get_adjacency()

Get the adjacency matrix and the final kept labels.