Network Enrichment

The network module handles all quantitative computations, including the estimation of changes in reaction activity, the computation of associations between reactions and multi-omics features, and the identification of optimal subgraphs via local search.

Functions

Model Handling

The following functions perform the main computations for estimating reaction changes and multi-omics associations.

pymantra.network.compute_reaction_estimates(graph: DiGraph, metabolite_data: DataFrame, sample_groups: Series, covariates: DataFrame | None = None, random_effects: str | List[str] | None = None, lmm_args: dict | None = None, residual_summary: str = 'expl_var', return_all: bool = False, control_group: any | None = None, **kwargs)

Generate reaction estimates

Compute the linear-model estimates for a given graph and metabolomics data.

Parameters:
  • graph (nx.DiGraph) – Metabolite-reaction graph. Metabolites need to be denoted as ‘metabolite’ (via node attribute ‘node_type’) and reactions as ‘reaction’.

  • metabolite_data (pd.DataFrame) – Metabolite data with samples in rows and metabolites in columns. Metabolite names need to match the metabolite node names in graph and indices need to match the indices of sample_groups.

  • sample_groups (pd.Series) – Array indicating sample groups

  • covariates (pd.DataFrame, optional) – Confounder variables to correct for. Correction is done using a Linear Mixed Model. All variables (i.e. columns) not specified as random effect variables in random_effects are assumed to be fixed effects variables. The correction currently only supports simple fixed and random effects inclusion. For more complex setups including factor interaction, it is recommended to do the correction beforehand and only pass the residuals to this function instead of the original metabolome data frame.

  • random_effects (str | List[str], optional) – Random effects for confounder correction. If covariates is None this has no effect. Else, this specifies which column(s) of covariates to include as random effects, all other columns will be included as fixed effects. If this is None, all columns of covariates are assumed to be fixed effects.

  • lmm_args (dict, optional) – Keyword arguments for MixedLM.from_formula(). Ignored unless covariates and random_effects are both not None.

  • residual_summary (str, "expl_var") – Which method to use as residual summary statistic. Either “expl_var” for explained variance (RSS/TSS) or “norm” for the p-norm.

  • return_all (bool, False) – Whether to return all variables returned by per_sample_ld_estimation() or only the scaled residuals

  • control_group (any, optional) – Name of the control group

  • kwargs – Keyword arguments. See per_sample_ld_estimation() for details.

Returns:

If return_all is False, a pd.DataFrame with samples in rows and reactions in columns. Otherwise, a 3-tuple as returned by per_sample_ld_estimation(), with the scaled residuals converted from the initially returned dictionary into a pd.DataFrame.

Return type:

Union[pd.DataFrame, Tuple[Dict[str, LinearModel], Dict[str, np.ndarray], pd.DataFrame]]

Examples

>>> from pymantra.datasets import example_metabolome_enrichment_data
>>> from pymantra.network import compute_reaction_estimates
>>> metabolite_data, sample_groups, graph = \
...     example_metabolome_enrichment_data()
>>> compute_reaction_estimates(graph, metabolite_data, sample_groups)

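The two residual_summary options can be illustrated on a single reaction model. The following standalone sketch computes both statistics from plain Python lists. It assumes that “expl_var” is the ratio RSS/TSS given in the parameter description and that “norm” is the \(L^p\) norm of the residual vector. The helper functions and data are hypothetical and not part of pymantra.

```python
from typing import Sequence


def rss_tss_ratio(observed: Sequence[float], predicted: Sequence[float]) -> float:
    """Residual sum of squares divided by total sum of squares (RSS/TSS)."""
    mean = sum(observed) / len(observed)
    rss = sum((y - y_hat) ** 2 for y, y_hat in zip(observed, predicted))
    tss = sum((y - mean) ** 2 for y in observed)
    return rss / tss


def residual_p_norm(observed: Sequence[float], predicted: Sequence[float],
                    p: float = 2.0) -> float:
    """L^p norm of the residual vector (observed - predicted)."""
    return sum(abs(y - y_hat) ** p
               for y, y_hat in zip(observed, predicted)) ** (1.0 / p)


# Hypothetical observed values and model predictions for one reaction
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.5, 2.0, 2.5, 4.0]
print(rss_tss_ratio(observed, predicted))    # RSS = 0.5, TSS = 5.0 -> 0.1
print(residual_p_norm(observed, predicted))  # sqrt(0.5), about 0.707
```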
pymantra.network.add_reaction_estimates(graph: DiGraph, sample_groups: Series | None = None, estimate_data: DataFrame | None = None, metabolite_data: DataFrame | None = None, control_group: any | None = None, return_estimates: bool = True, **kwargs)

Add reaction estimates to a metabolite-reaction graph

Add the linear model estimates to a given metabolite-reaction graph, either using pre-computed estimates or computing estimates via compute_reaction_estimates() and adding them directly.

Parameters:
  • graph (nx.DiGraph) – Metabolite-reaction graph. Metabolites need to be denoted as ‘metabolite’ (via node attribute ‘node_type’) and reactions as ‘reaction’.

  • sample_groups (pd.Series, optional) – Array indicating sample groups

  • estimate_data (pd.DataFrame, optional) – Linear-model estimates per reaction and sample as generated by compute_reaction_estimates(). If None, estimates will be computed from metabolite_data, which must then be given.

  • metabolite_data (pd.DataFrame, optional) – Metabolite data with samples in rows and metabolites in columns. If estimate_data is None this becomes a required parameter as it will be used to compute the reaction models. Metabolite names need to match the metabolite node names in graph and indices need to match the indices of sample_groups.

  • control_group (any, optional) – Name of the control group

  • return_estimates (bool, True) – Whether to return the linear-model-based estimates computed in compute_reaction_estimates()

  • kwargs – Keyword arguments. See per_sample_ld_estimation() for details.

Examples

>>> from pymantra.datasets import example_metabolome_enrichment_data
>>> from pymantra.network import (
...     compute_reaction_estimates, add_reaction_estimates)
>>> metabolite_data, sample_groups, graph = \
...     example_metabolome_enrichment_data()
>>> residuals = \
...     compute_reaction_estimates(graph, metabolite_data, sample_groups)
>>> add_reaction_estimates(graph, sample_groups, residuals)

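Conceptually, adding estimates to the graph means annotating each reaction node with its per-sample values. The following sketch mimics this with plain dictionaries in place of an nx.DiGraph. Storing the values under a node attribute named ‘data’, grouped by sample group, is an assumption based on the attribute names the local search classes expect. It does not describe how add_reaction_estimates() stores them internally, and all names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, List

# Hypothetical node table: node name -> attribute dict ('node_type' as required above)
nodes: Dict[str, dict] = {
    "glucose": {"node_type": "metabolite"},
    "pyruvate": {"node_type": "metabolite"},
    "glycolysis": {"node_type": "reaction"},
}

# Hypothetical per-sample reaction estimates (sample -> reaction -> value)
estimates = {
    "s1": {"glycolysis": 0.2},
    "s2": {"glycolysis": -0.4},
    "s3": {"glycolysis": 1.1},
}
sample_groups = {"s1": "control", "s2": "control", "s3": "case"}


def attach_estimates(nodes, estimates, sample_groups, attr="data"):
    """Store each reaction's estimates, grouped by sample group, under `attr`."""
    for name, attrs in nodes.items():
        if attrs["node_type"] != "reaction":
            continue  # only reaction nodes receive estimates
        per_group: Dict[str, List[float]] = defaultdict(list)
        for sample, values in estimates.items():
            if name in values:
                per_group[sample_groups[sample]].append(values[name])
        attrs[attr] = dict(per_group)


attach_estimates(nodes, estimates, sample_groups)
print(nodes["glycolysis"]["data"])  # {'control': [0.2, -0.4], 'case': [1.1]}
```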
pymantra.network.compute_multiomics_associations(residuals: DataFrame, multi_omics: DataFrame, sample_groups: Series, covariates: DataFrame | None = None, random_effects: str | List[str] | None = None, lmm_args: dict | None = None, **kwargs)

Compute multi-omics associations with reaction estimates

Compute the associations between multi-omics features and the residuals of reactions as estimated by the linear models.

This is essentially a wrapper for associate_multiomics_ld() for interface consistency.

Parameters:
  • residuals (pd.DataFrame) – Linear model residuals matrix with samples in rows and reactions in columns

  • multi_omics (pd.DataFrame) – Multi-omics measurements with samples in rows and multi-omics features in columns.

  • sample_groups (pd.Series) – Array of sample groups

  • covariates (pd.DataFrame, optional) – Confounder variables to correct for. Correction is done using a Linear Mixed Model. All variables (i.e. columns) not specified as random effect variables in random_effects are assumed to be fixed effects variables. The correction currently only supports simple fixed and random effects inclusion. For more complex setups including factor interaction, it is recommended to do the correction beforehand and only pass the residuals to this function instead of the original metabolome data frame.

  • random_effects (str | List[str], optional) – Random effects for confounder correction. If covariates is None this has no effect. Else, this specifies which column(s) of covariates to include as random effects, all other columns will be included as fixed effects. If this is None, all columns of covariates are assumed to be fixed effects.

  • lmm_args (dict, optional) – Keyword arguments for statsmodels.regression.mixed_linear_model.MixedLM.from_formula. Ignored unless covariates and random_effects are both not None.

  • kwargs – Keyword arguments passed to associate_multiomics_ld()
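The covariate descriptions recommend correcting more complex setups beforehand and passing residuals instead. As a minimal illustration of such a pre-correction, the sketch below removes one numeric fixed-effect covariate from a single feature with ordinary least squares. It is a simplified stand-in for a full (mixed) linear model, not the correction pymantra performs, and the data and helper name are hypothetical.

```python
from typing import List, Sequence


def residualize(values: Sequence[float], covariate: Sequence[float]) -> List[float]:
    """Return residuals of the OLS fit values ~ intercept + covariate."""
    n = len(values)
    mean_x = sum(covariate) / n
    mean_y = sum(values) / n
    sxx = sum((x - mean_x) ** 2 for x in covariate)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(covariate, values))
    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    # Residual = observed value minus the part explained by the covariate
    return [y - (intercept + slope * x) for x, y in zip(covariate, values)]


# Hypothetical feature that depends roughly linearly on age
age = [20.0, 30.0, 40.0, 50.0]
feature = [2.0, 3.0, 4.5, 5.0]
residuals = residualize(feature, age)
print(residuals)  # approximately [-0.05, -0.1, 0.35, -0.2]
```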

Returns:

2-tuple whose first element contains the correlations per group, each as a data frame of multi-omics features x reactions, and whose second element contains the correlation p-values per group in the same format

Return type:

Tuple[Dict[str, pd.DataFrame], Dict[str, pd.DataFrame]]

Examples

>>> from pymantra.datasets import example_multiomics_enrichment_data
>>> from pymantra import (
...     compute_reaction_estimates, compute_multiomics_associations)
>>> metabolite_data, microbiome_data, sample_groups, graph = \
...     example_multiomics_enrichment_data()
>>> residuals = \
...     compute_reaction_estimates(graph, metabolite_data, sample_groups)
>>> compute_multiomics_associations(
...     residuals, microbiome_data, sample_groups)
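The returned dictionaries are keyed by sample group. To illustrate that per-group structure, the sketch below splits samples by group and correlates one multi-omics feature with one reaction residual within each group. It uses Pearson’s correlation and omits p-values purely for brevity. The association measure that associate_multiomics_ld() actually uses is not stated here, and the data are hypothetical.

```python
from collections import defaultdict
from math import sqrt
from typing import Dict, List, Sequence, Tuple


def pearson(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / sqrt(sxx * syy)


def per_group_correlation(feature, residual, groups) -> Dict[str, float]:
    """Correlate one feature with one reaction residual separately per group."""
    split: Dict[str, List[Tuple[float, float]]] = defaultdict(list)
    for f, r, g in zip(feature, residual, groups):
        split[g].append((f, r))
    return {g: pearson([f for f, _ in pairs], [r for _, r in pairs])
            for g, pairs in split.items()}


# Hypothetical microbe abundances, reaction residuals and group labels
microbe = [1.0, 2.0, 3.0, 1.0, 2.0, 3.0]
residual = [0.1, 0.2, 0.3, 0.3, 0.2, 0.1]
groups = ["control"] * 3 + ["case"] * 3
print(per_group_correlation(microbe, residual, groups))  # control about 1.0, case about -1.0
```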
pymantra.network.add_microbiome_associations(graph: DiGraph, sample_groups: Series, associations: Dict[str, DataFrame] | None = None, residuals: DataFrame | None = None, microbiome_data: DataFrame | None = None, **kwargs)

Add microbiome-reaction associations to a multi-omics graph

Add the association estimates to a given multi-omics-reaction graph, either using pre-computed estimates or computing estimates via compute_multiomics_associations() and adding them directly.

Parameters:
  • graph (nx.DiGraph) – Metabolite-reaction graph containing additional reaction-organism connections. Usually, reaction estimates have already been added to the graph when this function is called.

  • sample_groups (pd.Series) – Array indicating sample groups

  • associations (Dict[str, pd.DataFrame], optional) – Reaction-microbiome associations per group as generated by compute_multiomics_associations()

  • residuals (pd.DataFrame, optional) – Linear-model estimates per reaction and sample as generated by compute_reaction_estimates(). If associations is None, this parameter is required.

  • microbiome_data (pd.DataFrame, optional) – Microbiome data with samples in rows and microbes in columns. If associations is None, this becomes a required parameter. Microbe names need to match the organism node names in graph and indices need to match the indices of sample_groups.

  • kwargs – Keyword arguments to be passed to compute_multiomics_associations()

Examples

>>> from pymantra.datasets import example_multiomics_enrichment_data
>>> from pymantra import (
...     compute_reaction_estimates, compute_multiomics_associations)
>>> from pymantra.network import add_microbiome_associations
>>> metabolite_data, microbiome_data, sample_groups, graph = \
...     example_multiomics_enrichment_data()
>>> residuals = \
...     compute_reaction_estimates(graph, metabolite_data, sample_groups)
>>> corrs, pvals = compute_multiomics_associations(
...     residuals, microbiome_data, sample_groups)
>>> add_microbiome_associations(graph, sample_groups, corrs)

pymantra.network.add_gene_associations(graph: DiGraph, sample_groups: Series, associations: Dict[str, DataFrame] | None = None, residuals: DataFrame | None = None, gene_data: DataFrame | None = None, **kwargs)

Add gene-reaction associations to a multi-omics graph

Add the association estimates to a given multi-omics-reaction graph, either using pre-computed estimates or computing estimates via compute_multiomics_associations() and adding them directly.

Parameters:
  • graph (nx.DiGraph) – Metabolite-reaction graph containing additional reaction-gene connections. Usually, reaction estimates have already been added to the graph when this function is called.

  • sample_groups (pd.Series) – Array indicating sample groups

  • associations (Dict[str, pd.DataFrame], optional) – Reaction-gene associations per group as generated by compute_multiomics_associations()

  • residuals (pd.DataFrame, optional) – Linear-model estimates per reaction and sample as generated by compute_reaction_estimates(). If associations is None, this parameter is required.

  • gene_data (pd.DataFrame, optional) – Gene data with samples in rows and genes in columns. If associations is None, this becomes a required parameter. Gene names need to match the gene node names in graph and indices need to match the indices of sample_groups.

  • kwargs – Keyword arguments to be passed to compute_multiomics_associations()

Graph Generation

These functions are used to prepare a mantra-formatted graph, such as one generated by the NetworkGenerator class, for reaction activity estimation.

pymantra.network.reaction_graph_extraction(graph: Graph, include_attributes: bool = True)

Extract the reaction-reaction graph from a metabolite-reaction graph

Parameters:
  • graph (nx.Graph | nx.DiGraph) – Bipartite metabolite-reaction graph from which reaction-reaction connections are extracted

  • include_attributes (bool, True) – Whether to include no

Returns:

Reaction-reaction edges as a set of 2-tuples. Reactions are always represented as strings.

Return type:

Set[Tuple[str, str]]
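One plausible rule for deriving reaction-reaction edges from a directed bipartite graph is to connect reaction r1 to reaction r2 whenever r1 produces a metabolite that r2 consumes. The sketch below applies that rule to a plain edge list. Whether reaction_graph_extraction() uses exactly this rule is an assumption, and the node names are hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple


def extract_reaction_edges(edges: Iterable[Tuple[str, str]],
                           node_type: Dict[str, str]) -> Set[Tuple[str, str]]:
    """Connect r1 -> r2 if the path r1 -> metabolite -> r2 exists."""
    produced_by: Dict[str, Set[str]] = {}  # metabolite -> reactions producing it
    consumed_by: Dict[str, Set[str]] = {}  # metabolite -> reactions consuming it
    for src, dst in edges:
        if node_type[src] == "reaction" and node_type[dst] == "metabolite":
            produced_by.setdefault(dst, set()).add(src)
        elif node_type[src] == "metabolite" and node_type[dst] == "reaction":
            consumed_by.setdefault(src, set()).add(dst)
    reaction_edges = set()
    for metabolite, producers in produced_by.items():
        for r1 in producers:
            for r2 in consumed_by.get(metabolite, set()):
                if r1 != r2:
                    reaction_edges.add((r1, r2))
    return reaction_edges


# Hypothetical linear pathway A -> R1 -> B -> R2 -> C
node_type = {"A": "metabolite", "B": "metabolite", "C": "metabolite",
             "R1": "reaction", "R2": "reaction"}
edges = [("A", "R1"), ("R1", "B"), ("B", "R2"), ("R2", "C")]
print(extract_reaction_edges(edges, node_type))  # {('R1', 'R2')}
```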

Classes

class pymantra.LocalSearch(network: Graph, temp: float, delta_min: float, l_min: int, l_max: int, max_iter: int, objective_function: str, min_reactions: int, p: float = 2, seed_size=10, *args, **kwargs)

Bases: ABC

Local search base class

Interface for running local search with (predefined) objective functions using pre-computed node and edge values.

Usually these values come from the reaction activity approximation and the approximation of associations between reactions/metabolites and other omics entities, as implemented in pymantra.network.compute_reaction_estimates() and pymantra.network.compute_multiomics_associations().

If you want to use other node/edge metrics, you need to set/overwrite the values stored as ‘data’. These values are used by the pre-defined objective functions without any further checks or corrections.

If you intend to use this class with a manually generated graph (i.e. one built with functions outside this module), it must contain the following node attributes:
  • ‘data’ or ‘vec_data’
  • ‘node_type’

and the following edge attributes:
  • ‘data’
  • ‘edge_type’
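Because the objective functions use these attributes without further checks, validating a manually built graph before constructing a local search object can save debugging time. The sketch below performs such a check on plain dictionaries standing in for networkx node and edge attribute tables. The helper name, the dictionary layout and the ‘reaction_reaction’ edge type value are hypothetical.

```python
from typing import Dict, List, Tuple


def missing_attributes(nodes: Dict[str, dict],
                       edges: Dict[Tuple[str, str], dict]) -> List[str]:
    """List missing node/edge attributes required by the local search."""
    problems = []
    for name, attrs in nodes.items():
        # Nodes need 'data' or 'vec_data', plus 'node_type'
        if "data" not in attrs and "vec_data" not in attrs:
            problems.append(f"node {name!r} lacks 'data' or 'vec_data'")
        if "node_type" not in attrs:
            problems.append(f"node {name!r} lacks 'node_type'")
    for (u, v), attrs in edges.items():
        # Edges need both 'data' and 'edge_type'
        for key in ("data", "edge_type"):
            if key not in attrs:
                problems.append(f"edge {(u, v)!r} lacks {key!r}")
    return problems


# Hypothetical attribute tables
nodes = {"R1": {"node_type": "reaction", "data": [0.1]},
         "R2": {"node_type": "reaction"}}
edges = {("R1", "R2"): {"edge_type": "reaction_reaction"}}
for problem in missing_attributes(nodes, edges):
    print(problem)
```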

lso

Interface to the C++ class handling the local search

Type:

pymantra.network.enrichment.lso.LocalSearchOptimization

set_l_min(self, lmin: int)
set_min_reactions(self, lmin: int)
set_l_max(self, lmin: int)
set_temp(self, lmin: int)
set_max_iter(self, lmin: int)
plot_score_progression(self, ax=None)
__init__(network: Graph, temp: float, delta_min: float, l_min: int, l_max: int, max_iter: int, objective_function: str, min_reactions: int, p: float = 2, seed_size=10, *args, **kwargs)

Initialize a LocalSearchOptimization object

In general, objective functions can be adapted or new ones added, but this currently requires implementing them in C++ and recompiling.

The currently implemented objective functions are

  1. ‘metabolic_reactions’: ‘ld_reactions’

  2. ‘reaction_microbe’: ‘reaction_microbiome’

  3. ‘reaction_transcriptome’: ‘reaction_transcriptome’

Parameters:
  • network (nx.Graph) –

    Reaction network on which the local search should be computed.

    The graph is assumed to have the following node attributes:
    • ‘data’

    • ‘node_type’

    and the following edge attributes:
    • ‘data’

    • ‘edge_type’

    Usually network will be computed using pymantra.databases.NetworkGenerator

  • temp (float) – Initial simulated annealing temperature, decaying exponentially with every iteration. The higher temp, the more likely the search is to accept a solution with a lower score in any given iteration. Intuitively, more ‘hops’ will be performed at higher temperatures.

  • delta_min (float) – Minimum improvement per iteration

  • l_min (int) – Minimal solution size

  • l_max (int) – Maximal solution size

  • max_iter (int) – Maximum number of iterations after which the local search is stopped if no (sub)optimal result has been found

  • objective_function (str) – Which objective functions to use. Possible options are currently: “ld_reactions”, “reaction_microbiome”, “reaction_gene” and “precomputed_objectives”

  • min_reactions (int) – Minimum number of reactions to be contained in the solution. This has no effect for metabolomics-only experiments.

  • p (Optional[float], default 2.) – Which \(L^p\) (Minkowski) norm to use in the objective function. Might not be relevant for all objective functions.
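The effect of temp can be illustrated with the standard Metropolis acceptance rule of simulated annealing, combined with an exponentially decaying temperature. This is a generic sketch of the technique, assuming a score that is maximized. The decay factor of 0.95 and the function names are hypothetical and not taken from pymantra’s C++ implementation.

```python
import math
import random


def accept(current: float, candidate: float, temp: float,
           rng: random.Random) -> bool:
    """Metropolis rule for maximization: always accept improvements,
    accept worse candidates with probability exp((candidate - current) / temp)."""
    if candidate >= current:
        return True
    return rng.random() < math.exp((candidate - current) / temp)


def temperature(initial: float, iteration: int, decay: float = 0.95) -> float:
    """Exponentially decaying temperature: initial * decay ** iteration."""
    return initial * decay ** iteration


rng = random.Random(0)
# A score drop of 1.0 is accepted often at high temperature and rarely at low temperature
for t in (10.0, 0.1):
    rate = sum(accept(5.0, 4.0, t, rng) for _ in range(10_000)) / 10_000
    print(f"temp={t}: acceptance rate ~ {rate:.2f}")
```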

non_empty_results() → bool

Check whether both solution and score progression are not empty

run_local_search(groups: Tuple[str, str] | None = None, n_threads: int = 1, seed_size: int | None = None, min_comp_size: int = 25)

Run a local search

Run a local search optimization with the parameters given by the instance’s attributes for the binary comparison specified by groups.

Parameters:
  • groups (Tuple[str, str]) – 2-tuple containing the group names

  • n_threads (int, default 1) – Number of threads to use. NOTE: currently multi-threading can cause unexpected termination

  • seed_size (int, optional) – Option to specify the seed size. If None, the seed_size at LocalSearch object initialization is used.

  • min_comp_size (int, default 25) – Minimum component size to run a local search on the component

run_repeated_local_search(n_repeats: int, groups: Tuple[str, str] | None = None, combine_mode: str = 'union', n_threads: int = 1, seed_size: int | None = None, min_comp_size: int = 25)

Run a local search n_repeats times

Run a local search optimization with the parameters given by the instance’s attributes for the binary comparison specified by groups n_repeats times to obtain a more robust result.

Before each repetition a new random seed will be set.

Results from the different repeats will be merged either by the union or intersection of the nodes contained in their subgraph-solutions.

Parameters:
  • n_repeats (int) – Number of repeated local searches to perform

  • groups (Optional[Tuple[str, str]]) – 2-tuple containing the group names

  • combine_mode (str, "union") – How to combine the results of all iterations. Either “union” or “intersection”

  • n_threads (int, default 1) – Number of threads to use. NOTE: currently multi-threading can cause unexpected termination

  • seed_size (int, optional) – Option to specify the seed size. If None, the seed_size at LocalSearch object initialization is used.

  • min_comp_size (int, default 25) – Minimum component size to run a local search on the component
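The two combine_mode options correspond to set operations on the node sets of the individual solutions. The following is a minimal sketch with a hypothetical helper and hypothetical reaction names:

```python
from typing import Iterable, Set


def combine_solutions(solutions: Iterable[Set[str]], mode: str = "union") -> Set[str]:
    """Merge node sets from repeated local searches by union or intersection."""
    if mode not in ("union", "intersection"):
        raise ValueError(f"unknown combine_mode: {mode!r}")
    solutions = list(solutions)
    if not solutions:
        return set()
    if mode == "union":
        return set().union(*solutions)
    return set.intersection(*solutions)


# Hypothetical subgraph solutions from three repeats
runs = [{"R1", "R2", "R3"}, {"R2", "R3"}, {"R2", "R3", "R4"}]
print(sorted(combine_solutions(runs, "union")))         # ['R1', 'R2', 'R3', 'R4']
print(sorted(combine_solutions(runs, "intersection")))  # ['R2', 'R3']
```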

score_final_solution(groups: Tuple[str, str]) → float

Recompute the score of the final solution

Recomputes the score of the final subnetwork with the given groups. If groups contains the same groups as during the local search, the result will be equal to the score in solution. However, this function also enables computing the objective score of the solution for other group combinations.

Parameters:

groups (Tuple[str, str]) – 2-tuple containing the group names

Returns:

The objective function score

Return type:

float

set_seed(seed: str, seed_size: int | None = None)

Set the seed at which local search is starting

Manually set the seed from which the local search starts. When calling run_local_search(), a seed is computed automatically and cached for reuse in subsequent local search runs. If you want multiple local search runs with independent seeds, this method should be called in between runs.

Parameters:
  • seed (Union[List[str], Set[str], str]) – Either a node (specified by name) or an iterable collection of nodes. If a single node is given, seed_size must be set to specify the number of neighbours of the seed node to draw. Otherwise, the size of the collection must be in the range [l_min, l_max].

  • seed_size (Optional[int]) – Size of the seed subgraph. Required if seed is a single node, otherwise ignored.
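When seed is a single node, the seed subgraph has to be grown around it. The sketch below shows one way to do that: breadth-first expansion over an adjacency list until seed_size nodes are collected. Treating seed_size as the total size of the seed subgraph (including the start node) and using breadth-first order are assumptions, not a description of pymantra’s implementation.

```python
from collections import deque
from typing import Dict, List, Set


def bfs_seed(adjacency: Dict[str, List[str]], start: str, seed_size: int) -> Set[str]:
    """Collect up to seed_size nodes, starting at `start` and expanding breadth-first."""
    seed = {start}
    queue = deque([start])
    while queue and len(seed) < seed_size:
        node = queue.popleft()
        for neighbour in adjacency.get(node, []):
            if neighbour not in seed:
                seed.add(neighbour)
                queue.append(neighbour)
                if len(seed) == seed_size:
                    break  # seed is complete
    return seed


# Hypothetical reaction-reaction adjacency list
adjacency = {"R1": ["R2", "R3"], "R2": ["R1", "R4"], "R3": ["R1"],
             "R4": ["R2", "R5"], "R5": ["R4"]}
print(sorted(bfs_seed(adjacency, "R1", 3)))  # ['R1', 'R2', 'R3']
```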

property converged: bool

Whether the local search converged or terminated due to reaching max_iter

property delta_min: float

Current choice of minimum progress per iteration

property l_max: int

Current choice of maximum solution size

property l_min: int

Current choice of minimum solution size

property max_iter: int

Current choice of maximum number of iterations for local search

property min_reactions: int

Current choice of minimum number of reactions in the solution

property p: float

Current choice of p for the \(L^p\)-norm

property temp: float

Current choice of simulated annealing temperature

class pymantra.MetaboliteLocalSearch(network: Graph, temp: float, delta_min: float, l_min: int, l_max: int, max_iter: int, p: float = 2, objective_function='ld_reactions', is_reaction_graph: bool = False, **kwargs)

Local search class for metabolomics-data only

Interface for running local search for metabolomics-only data with (predefined) objective functions using pre-computed node values.

Usually these values are coming from reaction activity approximation as implemented in pymantra.network.compute_reaction_estimates().

If you want to use other node/edge metrics, you need to set/overwrite the values stored as ‘data’. These values are used by the pre-defined objective functions without any further checks or corrections.

If you intend to use this class with a manually generated graph (i.e. one built with functions outside this module), it must contain the following node attributes:
  • ‘data’ or ‘vec_data’
  • ‘node_type’

and the following edge attributes:
  • ‘data’
  • ‘edge_type’

lso

Interface to the C++ class handling the local search

Type:

LocalSearchOptimization

reaction_edges

Reaction-reaction edges, either passed directly or extracted from the graph passed to the constructor

Type:

Set[Tuple[str, str]]

__init__(network: Graph, temp: float, delta_min: float, l_min: int, l_max: int, max_iter: int, p: float = 2, objective_function='ld_reactions', is_reaction_graph: bool = False, **kwargs)

Initialize a MetaboliteLocalSearch object

Initializes a specialized subclass of LocalSearchOptimization meant to compute a local search on metabolite-reaction graphs without additional node types.

Parameters:
  • network (nx.Graph) – Either a metabolite-reaction graph or a reaction-reaction graph extracted from a metabolite-reaction graph. Usually network will be computed using pymantra.database.NetworkGenerator and/or reaction_graph_extraction()

  • temp (float) – Initial simulated annealing temperature, decaying exponentially with every iteration. The higher temp, the more likely the search is to accept a solution with a lower score in any given iteration. Intuitively, more ‘hops’ will be performed at higher temperatures.

  • delta_min (float) – Minimum improvement per iteration

  • l_min (int) – Minimal solution size

  • l_max (int) – Maximal solution size

  • max_iter (int) – Maximum number of iterations after which the local search is stopped if no (sub)optimal result has been found

  • p (Optional[float], default 2.) – Which \(L^p\) (Minkowski) norm to use in the objective function.

  • objective_function (Optional[str], default 'ld_reactions') – Changing this option is currently not supported

  • is_reaction_graph (Optional[bool], default False) – Whether network is a reaction-reaction or a metabolite-reaction graph

  • kwargs – Keyword arguments to pass to LocalSearchOptimization. Note that “min_reactions” is automatically set to l_min; passing it has no effect.

plot_subnetwork(network: DiGraph | None = None, subplot_args: dict | None = None, **kwargs) → axis | Tuple[figure, ndarray | List[axis]]

Plot the subgraph returned by local search optimization

Plots the edge subgraph returned by local search optimization. Either plots a reaction-reaction-organism graph containing exactly the edges in solution or a metabolite-reaction network containing all nodes in solution and their metabolite neighbours.

Parameters:
  • network (nx.DiGraph, optional) – Same metabolite-reaction graph used as input in the constructor. It is assumed to only contain directed metabolite-reaction edges.

  • subplot_args (dict, optional) – Keyword arguments to pass to plt.subplots.

  • kwargs – Optional keyword arguments passed to pymantra.plotting.plot_undirected_graph() or pymantra.plotting.plot_directed_graph() depending on whether network is None. Note that reaction_graph cannot be passed.

Returns:

Either a single matplotlib axis object if ‘ax’ is given as a keyword argument, or a 2-tuple with the first element being the figure containing the subplots and the second a list or array of the plt.axis objects drawn in the figure.

Return type:

Union[plt.axis, Tuple[plt.figure, Union[np.ndarray, List[plt.axis]]]]

class pymantra.MultiOmicsLocalSearch(network: Graph, omics: str, temp: float, delta_min: float, l_min: int, l_max: int, max_iter: int, min_reactions: int, p: float = 2, is_reaction_graph: bool = False, **kwargs)

Local search class for metabolomics with multi-omics data

Interface for running local search with (predefined) objective functions using pre-computed node and edge values.

Usually these values come from the reaction activity approximation and the approximation of associations between reactions/metabolites and other omics entities, as implemented in pymantra.network.compute_reaction_estimates() and pymantra.network.compute_multiomics_associations().

If you want to use other node/edge metrics, you need to set/overwrite the values stored as ‘data’. These values are used by the pre-defined objective functions without any further checks or corrections.

If you intend to use this class with a manually generated graph (i.e. one built with functions outside this module), it must contain the following node attributes:
  • ‘data’ or ‘vec_data’
  • ‘node_type’

and the following edge attributes:
  • ‘data’
  • ‘edge_type’

lso

Interface to the C++ class handling the local search

Type:

LocalSearchOptimization

reaction_multiomics_edges

Reaction-reaction and multi-omics-reaction edges, either passed directly or extracted from the graph passed to the constructor

Type:

Set[Tuple[str, str]]

__init__(network: Graph, omics: str, temp: float, delta_min: float, l_min: int, l_max: int, max_iter: int, min_reactions: int, p: float = 2, is_reaction_graph: bool = False, **kwargs)

Initialize a MultiOmicsLocalSearch object

Initializes a specialized subclass of LocalSearchOptimization meant to compute a local search on metabolite-reaction graphs with additional multi-omics nodes (organisms or genes).

Parameters:
  • network (nx.Graph) – Either a metabolite-reaction graph or a reaction-reaction graph extracted from a metabolite-reaction graph. Usually network will be computed using NetworkGenerator and/or reaction_graph_extraction()

  • omics (str) – Type of multi-omics association. Currently, this must be either “organism” (for microbiome) or “gene” (for transcriptome or metagenome)

  • temp (float) – Initial simulated annealing temperature, decaying exponentially with every iteration. The higher temp, the more likely the search is to accept a solution with a lower score in any given iteration. Intuitively, more ‘hops’ will be performed at higher temperatures.

  • delta_min (float) – Minimum improvement per iteration

  • l_min (int) – Minimal solution size

  • l_max (int) – Maximal solution size

  • max_iter (int) – Maximum number of iterations after which the local search is stopped if no (sub)optimal result has been found

  • min_reactions (int) – Minimum number of reactions to be contained in the solution.

  • p (float, default 2.) – Which \(L^p\) (Minkowski) norm to use in the objective function.

  • is_reaction_graph (bool, default False) – Whether network is a reaction-reaction or a metabolite-reaction graph

  • kwargs – Keyword arguments to pass to LocalSearchOptimization

plot_subnetwork(network: DiGraph | None = None, subplot_args: dict | None = None, **kwargs) → Tuple[figure, ndarray | List[axis]]

Plot the subgraph returned by local search optimization

Plots the edge subgraph returned by local search optimization. Either plots a reaction-reaction-organism graph containing exactly the edges in solution or a metabolite-reaction network containing all nodes in solution and their metabolite neighbours.

Parameters:
  • network (nx.DiGraph, optional) – Same metabolite-reaction graph used as input in the constructor. It is assumed to only contain directed metabolite-reaction edges.

  • subplot_args (dict, optional) – Keyword arguments to pass to plt.subplots.

  • kwargs – Optional keyword arguments passed to pymantra.plotting.plot_undirected_graph() or pymantra.plotting.plot_directed_graph() depending on whether network is None. Note that reaction_graph cannot be passed.

Returns:

Either a single matplotlib axis object if ‘ax’ is given as a keyword argument, or a 2-tuple with the first element being the figure containing the subplots and the second a list or array of the plt.axis objects drawn in the figure.

Return type:

Union[plt.axis, Tuple[plt.figure, Union[np.ndarray, List[plt.axis]]]]

class pymantra.EnrichmentResults(subgraph: set, score: float, converged: bool)

Object holding local optimization results

subgraph

Subnetwork with the best found objective function value

Type:

Set[int]

score

Objective function value of the solution

Type:

float

converged

Currently unused Boolean indicating whether the algorithm converged

Type:

bool

__init__(subgraph: set, score: float, converged: bool) → None
classmethod from_json(file: str | Path)

Read a previously computed enrichment result from a JSON file

Parameters:

file (Union[str, pathlib.Path]) – Path to the file containing the enrichment results

Return type:

EnrichmentResults
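The sketch below shows a comparable JSON round trip with a minimal stand-in class built on the standard library. The class name, the to_json() method and the JSON key names are assumptions. The file layout pymantra writes is not documented here, so this sketch should not be used to read pymantra’s own result files.

```python
import json
import tempfile
from dataclasses import asdict, dataclass
from pathlib import Path
from typing import Union


@dataclass
class ResultsSketch:
    """Hypothetical stand-in for EnrichmentResults (subgraph, score, converged)."""
    subgraph: set
    score: float
    converged: bool

    def to_json(self, file: Union[str, Path]) -> None:
        data = asdict(self)
        data["subgraph"] = sorted(self.subgraph)  # sets are not JSON-serializable
        Path(file).write_text(json.dumps(data))

    @classmethod
    def from_json(cls, file: Union[str, Path]) -> "ResultsSketch":
        data = json.loads(Path(file).read_text())
        return cls(set(data["subgraph"]), float(data["score"]), bool(data["converged"]))


with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "results.json"
    ResultsSketch({"R1", "R2"}, 3.5, False).to_json(path)
    restored = ResultsSketch.from_json(path)
print(restored)
```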

Additional Functions

pymantra.network.per_sample_ld_estimation(graph: DiGraph, metabolome_data: DataFrame, groups: Series, covariates: DataFrame | None = None, compute_expl_var: bool = False, var_as_pval: bool = False, combined_models: bool = False, residual_summary: str = 'expl_var', scale: bool = True, control_group: str | None = None, r2_threshold: float = 0.5, recompute_non_passing: bool = False, outlier_threshold: float | None = None, random_effects: str | List[str] | None = None, lmm_args: dict | None = None, verbose: bool = False, **kwargs) → Tuple[Dict[str, LinearModel], Dict[str, ndarray], Dict[str, Series]]

Compute linear reaction-models

Parameters:
  • graph (nx.Graph) – Reaction-reaction graph

  • metabolome_data (pd.DataFrame) – Metabolome data with samples in rows and metabolites in columns

  • groups (pd.Series) – Sample group annotation

  • covariates (pd.DataFrame, optional) – Confounder variables to correct for. Correction is done using a Linear Mixed Model. All variables (i.e. columns) not specified as random effect variables in random_effects are assumed to be fixed effects variables. Generally, variables should be numerical (float or integer). If you have categorical data as strings, you can use the pandas.get_dummies function to encode them as integers. Make sure to use drop_first to avoid introducing collinearity (see https://stackoverflow.com/questions/31498390/how-to-get-pandas-get-dummies-to-emit-n-1-variables-to-avoid-collinearity). The correction currently only supports simple fixed and random effects inclusion. For more complex setups including factor interactions, it is recommended to do the correction beforehand and only pass the residuals to this function instead of the original metabolome data frame.

  • compute_expl_var (bool, False) – Whether to return 1 - explained variance of the model or the residuals

  • var_as_pval (bool, False) – Whether to return a p-value or a residual/explained variance value

  • combined_models (bool, False) – Whether to compute the reference linear model based on both groups or only on the control group

  • residual_summary (str, "expl_var") – Which method to use as residual summary statistic. Either “expl_var” for explained variance (RSS/TSS) or “norm” for p-norm

  • scale (bool, True) – Whether to z-score scale metabolites

  • control_group (str, optional) – Option to set which group should be viewed as the control. If None the first element in groups will be used.

  • r2_threshold (float, .5) – Minimum \(R^2\) value a model needs to achieve to be further considered

  • recompute_non_passing (bool, False) – Whether to recompute models with case data that failed to pass the \(R^2\) threshold for control data.

  • outlier_threshold (float, optional) – Threshold to remove outliers by Cook’s distance. If None, a default is computed on the basis of the survival function of an F-distribution.

  • random_effects (str | List[str], optional) – Random effects for confounder correction. If covariates is None this has no effect. Else, this specifies which column(s) of covariates to include as random effects, all other columns will be included as fixed effects. If this is None, all columns of covariates are assumed to be fixed effects.

  • lmm_args (dict, optional) – Keyword arguments for MixedLM.from_formula(). Ignored unless covariates and random_effects are both not None.

  • verbose (bool, False) – If True, warnings will be raised whenever a model does not pass the \(R^2\) filter

  • kwargs – Optional keyword arguments passed to model computation

Returns:

Control models, case residuals and all scaled residuals per reaction. If compute_expl_var is set to True the last element will contain the explained variance instead of the residuals.

Return type:

Tuple[Dict[str, LinearModel], Dict[str, np.ndarray], Dict[str, pd.Series]]
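The r2_threshold filter can be pictured as follows: a reaction model is fitted on z-scored control data and kept only if its \(R^2\) reaches the threshold. The sketch below does this for a hypothetical reaction with one substrate and one product, using ordinary least squares. It makes three simplifying assumptions: a reaction model is reduced to a one-predictor regression, the Cook’s distance outlier removal is ignored, and the helper names are hypothetical.

```python
from statistics import mean, pstdev
from typing import List, Optional, Sequence, Tuple


def zscore(values: Sequence[float]) -> List[float]:
    """Scale values to mean 0 and (population) standard deviation 1."""
    m, s = mean(values), pstdev(values)
    return [(v - m) / s for v in values]


def fit_if_passing(substrate: Sequence[float], product: Sequence[float],
                   r2_threshold: float = 0.5) -> Optional[Tuple[float, float]]:
    """Fit product ~ substrate on z-scored data; return (slope, R^2) or None."""
    x, y = zscore(substrate), zscore(product)
    # For z-scored data the OLS intercept is 0 and the slope equals Pearson's r
    slope = sum(a * b for a, b in zip(x, y)) / len(x)
    rss = sum((b - slope * a) ** 2 for a, b in zip(x, y))
    tss = sum(b ** 2 for b in y)
    r2 = 1 - rss / tss
    return (slope, r2) if r2 >= r2_threshold else None


# Hypothetical control measurements
print(fit_if_passing([1.0, 2.0, 3.0, 4.0], [2.1, 3.9, 6.2, 7.8]))   # passes, R^2 close to 1
print(fit_if_passing([1.0, 2.0, 3.0, 4.0], [1.0, -1.0, -1.0, 1.0]))  # None
```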

pymantra.network.spearmans_correlation(x: ndarray, y: ndarray, n_threads: int = 1)

Compute Spearman’s rank correlation coefficient with a C++ backend.

NaN values are ignored automatically, and NaN is returned if fewer than three non-NaN observations are available for a pair of features.

If at least one 2D array is passed, arrays will be returned; otherwise, floats are returned. If two 2D arrays of shape X x N and X x M are passed, the returned arrays will be of shape N x M.

Parameters:
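  • x (np.ndarray) – First input, either 1D or 2D with observations in rows

  • y (np.ndarray) – Second input, either 1D or 2D with observations in rows

  • n_threads (int, default 1) – Number of threads to use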

Returns:

NamedTuple of (1) correlations and (2) correlation p-values

Return type:

SpearmansResults
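For checking results on small inputs, the following pure-Python sketch computes Spearman’s correlation for two 1D sequences with the NaN handling described above. Pairs containing NaN are dropped, and NaN is returned if fewer than three pairs remain. The sketch assigns average ranks to ties and omits the p-value. It is a reference sketch, not the C++ backend.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """Ranks starting at 1, with tied values sharing their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        for k in range(i, j + 1):
            ranks[order[k]] = (i + j) / 2 + 1
        i = j + 1
    return ranks


def spearman(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho on pairwise non-NaN observations (NaN if fewer than 3)."""
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    if len(pairs) < 3:
        return math.nan
    rx = average_ranks([a for a, _ in pairs])
    ry = average_ranks([b for _, b in pairs])
    mx, my = sum(rx) / len(rx), sum(ry) / len(ry)
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    var_x = sum((a - mx) ** 2 for a in rx)
    var_y = sum((b - my) ** 2 for b in ry)
    if var_x == 0 or var_y == 0:
        return math.nan  # correlation is undefined for constant input
    return cov / math.sqrt(var_x * var_y)


nan = float("nan")
print(spearman([1, 2, 3, 4, nan], [10, 20, 25, 100, 5]))  # 1.0 (NaN pair dropped)
print(spearman([1, nan, 3], [1, 2, 3]))                  # nan (only two pairs left)
```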

Exceptions

class pymantra.network.NodeTypeError(message)
__init__(message)