Network Enrichment#
The network module handles all quantitative computations including estimation of reaction activity change, multi-omics reaction associations and optimal subgraph identification via local search.
Functions#
Model Handling#
The following functions perform the main computations for estimating reaction activity changes and multi-omics associations.
- pymantra.network.compute_reaction_estimates(graph: DiGraph, metabolite_data: DataFrame, sample_groups: Series, covariates: DataFrame | None = None, random_effects: str | List[str] | None = None, lmm_args: dict | None = None, residual_summary: str = 'expl_var', return_all: bool = False, control_group: any | None = None, **kwargs)[source]#
Generate reaction estimates
Compute the linear-model estimates for a given graph and metabolomics data
- Parameters:
graph (nx.DiGraph) – Metabolite-reaction graph. Metabolites need to be denoted as ‘metabolite’ (via node attribute ‘node_type’) and reactions as ‘reaction’.
metabolite_data (pd.DataFrame) – Metabolite data with samples in rows and metabolites in columns. Metabolite names need to match the metabolite node names in graph and indices need to match the indices of sample_groups.
sample_groups (pd.Series) – Array indicating sample groups
covariates (pd.DataFrame, optional) – Confounder variables to correct for. Correction is done using a Linear Mixed Model. All variables (i.e. columns) not specified as random effect variables in random_effects are assumed to be fixed effects variables. The correction currently only supports simple fixed and random effects inclusion. For more complex setups including factor interaction, it is recommended to do the correction beforehand and only pass the residuals to this function instead of the original metabolome data frame.
random_effects (str | List[str], optional) – Random effects for confounder correction. If covariates is None this has no effect. Else, this specifies which column(s) of covariates to include as random effects, all other columns will be included as fixed effects. If this is None, all columns of covariates are assumed to be fixed effects.
lmm_args (dict, optional) – Keyword arguments for MixedLM.from_formula(). Ignored unless covariates and random_effects are both not None.
residual_summary (str, "expl_var") – Which method to use as residual summary statistic. Either “expl_var” for explained variance (RSS/TSS) or “norm” for p-norm
return_all (bool, False) – Whether to return all variables returned by per_sample_ld_estimation() or only the scaled residuals
control_group (any, optional) – Name of the control group
kwargs – Keyword arguments. See per_sample_ld_estimation() for details.
- Returns:
Union[pd.DataFrame, Tuple[Dict[str, LinearModel], Dict[str, np.ndarray], pd.DataFrame]] – If return_all is False, a pd.DataFrame with samples in rows and reactions in columns. Otherwise a 3-tuple as returned by per_sample_ld_estimation(), except that scaled_residuals is a pd.DataFrame generated from the initially returned dictionary
Examples
>>> from pymantra.datasets import example_metabolome_enrichment_data
>>> from pymantra.network import compute_reaction_estimates
>>> metabolite_data, sample_groups, graph = example_metabolome_enrichment_data()
>>> compute_reaction_estimates(graph, metabolite_data, sample_groups)
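The two residual_summary options can be illustrated on a single reaction. The following is a minimal sketch and not pymantra's implementation: the helper name summarize_residuals and the toy values are hypothetical, and only the two formulas named above (RSS/TSS and the p-norm) are taken from this page. Since RSS/TSS equals 1 - R^2, smaller values indicate a better fit.

```python
def summarize_residuals(observed, predicted, method="expl_var", p=2.0):
    """Summarize model residuals (hypothetical helper, for illustration only)."""
    residuals = [obs - pred for obs, pred in zip(observed, predicted)]
    if method == "expl_var":
        mean = sum(observed) / len(observed)
        rss = sum(res ** 2 for res in residuals)          # residual sum of squares
        tss = sum((obs - mean) ** 2 for obs in observed)  # total sum of squares
        return rss / tss
    if method == "norm":
        # p-norm of the residual vector
        return sum(abs(res) ** p for res in residuals) ** (1.0 / p)
    raise ValueError(f"Unknown residual summary: {method!r}")


# Toy values: the model fits three samples exactly and misses the fourth
observed = [1.0, 2.0, 3.0, 4.0]
predicted = [1.0, 2.0, 3.0, 6.0]
ratio = summarize_residuals(observed, predicted, method="expl_var")
norm = summarize_residuals(observed, predicted, method="norm", p=2.0)
```

Both summaries grow with the size of the misfit; the ratio is scale-free, whereas the norm keeps the units of the data.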
- pymantra.network.add_reaction_estimates(graph: DiGraph, sample_groups: Series | None = None, estimate_data: DataFrame | None = None, metabolite_data: DataFrame | None = None, control_group: any | None = None, return_estimates: bool = True, **kwargs)[source]#
Add reaction estimates to a metabolite-reaction graph
Add the linear model estimates to a given metabolite-reaction graph, either using pre-computed estimates or computing estimates via compute_reaction_estimates() and adding them directly.
- Parameters:
graph (nx.DiGraph) – Metabolite-reaction graph. Metabolites need to be denoted as ‘metabolite’ (via node attribute ‘node_type’) and reactions as ‘reaction’.
sample_groups (pd.Series) – Array indicating sample group
estimate_data (pd.DataFrame, optional) – Linear-model estimates per reaction and sample as generated by compute_reaction_estimates(). If None, estimates will be computed from metabolite_data, which must then be given.
metabolite_data (pd.DataFrame, optional) – Metabolite data with samples in rows and metabolites in columns. If estimate_data is None this becomes a required parameter as it will be used to compute the reaction models. Metabolite names need to match the metabolite node names in graph and indices need to match the indices of sample_groups.
control_group (any, optional) – Name of the control group
return_estimates (bool, True) – Whether to return the linear-model-based estimates computed in compute_reaction_estimates()
kwargs – Keyword arguments. See per_sample_ld_estimation() for details
Examples
>>> from pymantra.datasets import example_metabolome_enrichment_data
>>> from pymantra.network import (
...     compute_reaction_estimates, add_reaction_estimates)
>>> metabolite_data, sample_groups, graph = example_metabolome_enrichment_data()
>>> residuals = compute_reaction_estimates(graph, metabolite_data, sample_groups)
>>> add_reaction_estimates(graph, sample_groups, residuals)
- pymantra.network.compute_multiomics_associations(residuals: DataFrame, multi_omics: DataFrame, sample_groups: Series, covariates: DataFrame | None = None, random_effects: str | List[str] | None = None, lmm_args: dict | None = None, **kwargs)[source]#
Compute multi-omics associations with reaction estimates
Compute the associations between multi-omics features and the residuals of reactions as estimated by the linear models.
This is essentially a wrapper for associate_multiomics_ld() for interface consistency.
- Parameters:
residuals (pd.DataFrame) – Linear model residuals matrix with samples in rows and reactions in columns
multi_omics (pd.DataFrame) – Multi-omics measurements with samples in rows and multi-omics features in columns.
sample_groups (pd.Series) – Array of sample groups
covariates (pd.DataFrame, optional) – Confounder variables to correct for. Correction is done using a Linear Mixed Model. All variables (i.e. columns) not specified as random effect variables in random_effects are assumed to be fixed effects variables. The correction currently only supports simple fixed and random effects inclusion. For more complex setups including factor interaction, it is recommended to do the correction beforehand and only pass the residuals to this function instead of the original metabolome data frame.
random_effects (str | List[str], optional) – Random effects for confounder correction. If covariates is None this has no effect. Else, this specifies which column(s) of covariates to include as random effects, all other columns will be included as fixed effects. If this is None, all columns of covariates are assumed to be fixed effects.
lmm_args (dict, optional) – Keyword arguments for statsmodels.regression.mixed_linear_model.MixedLM.from_formula. Ignored unless covariates and random_effects are both not None.
kwargs – Keyword arguments passed to associate_multiomics_ld()
- Returns:
2-tuple where the first element contains the correlations per group, each as a data frame of multi-omics features x reactions, and the second element contains the correlation p-values per group in the same format as the correlations
- Return type:
Tuple[Dict[str, pd.DataFrame], Dict[str, pd.DataFrame]]
Examples
>>> from pymantra.datasets import example_multiomics_enrichment_data
>>> from pymantra import (
...     compute_reaction_estimates, compute_multiomics_associations)
>>> metabolite_data, microbiome_data, sample_groups, graph = \
...     example_multiomics_enrichment_data()
>>> residuals = compute_reaction_estimates(graph, metabolite_data, sample_groups)
>>> compute_multiomics_associations(
...     residuals, microbiome_data, sample_groups)
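The per-group output layout described above (one multi-omics x reaction table per sample group) can be sketched without any dependencies. This is an illustration under stated assumptions, not the library's method: the helper names are hypothetical, nested dictionaries stand in for data frames, and a plain Pearson correlation is used purely for brevity, since this section does not specify which statistic associate_multiomics_ld() computes.

```python
from math import sqrt


def pearson(x, y):
    """Pearson correlation of two equally long sequences."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def groupwise_associations(residuals, multi_omics, sample_groups):
    """Correlate every multi-omics feature with every reaction per group.

    residuals / multi_omics: {column name: {sample: value}}
    sample_groups: {sample: group label}
    Returns {group: {feature: {reaction: correlation}}}.
    """
    associations = {}
    for group in sorted(set(sample_groups.values())):
        samples = [s for s, g in sample_groups.items() if g == group]
        associations[group] = {
            feature: {
                reaction: pearson(
                    [feature_values[s] for s in samples],
                    [reaction_values[s] for s in samples],
                )
                for reaction, reaction_values in residuals.items()
            }
            for feature, feature_values in multi_omics.items()
        }
    return associations


# Toy data: one reaction, one microbe, two groups of three samples each
sample_groups = {"s1": "control", "s2": "control", "s3": "control",
                 "s4": "case", "s5": "case", "s6": "case"}
residuals = {"R1": {"s1": 1.0, "s2": 2.0, "s3": 3.0,
                    "s4": 1.0, "s5": 2.0, "s6": 3.0}}
microbiome = {"microbe_A": {"s1": 2.0, "s2": 4.0, "s3": 6.0,
                            "s4": 3.0, "s5": 2.0, "s6": 1.0}}
corrs = groupwise_associations(residuals, microbiome, sample_groups)
```

In the toy data the association flips sign between the groups, which is the kind of group-specific difference a per-group table makes visible.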
- pymantra.network.add_microbiome_associations(graph: DiGraph, sample_groups: Series, associations: Dict[str, DataFrame] | None = None, residuals: DataFrame | None = None, microbiome_data: DataFrame | None = None, **kwargs)[source]#
Add microbiome-reaction associations to a multi-omics graph
Add the association estimates to a given multi-omics-reaction graph, either using pre-computed estimates or computing estimates via compute_multiomics_associations() and adding them directly.
- Parameters:
graph (nx.DiGraph) – Metabolite-reaction graph containing additional reaction-organism connections. Usually when calling this function reaction estimates are already added to the graph.
sample_groups (pd.Series) – Array indicating sample group
associations (Dict[str, pd.DataFrame], optional) – Reaction-microbiome associations per group as generated by compute_multiomics_associations()
residuals (pd.DataFrame) – Linear-model estimates per reaction and sample as generated by compute_reaction_estimates(). If associations is None this parameter is required.
microbiome_data (pd.DataFrame) – Microbiome data with samples in rows and microbes in columns. If associations is None this becomes a required parameter. Microbe names need to match the organism node names in graph and indices need to match the indices of sample_groups.
kwargs – Keyword arguments to be passed to compute_multiomics_associations()
Examples
>>> from pymantra.datasets import example_multiomics_enrichment_data
>>> from pymantra import (
...     compute_reaction_estimates, compute_multiomics_associations)
>>> from pymantra.network import add_microbiome_associations
>>> metabolite_data, microbiome_data, sample_groups, graph = \
...     example_multiomics_enrichment_data()
>>> residuals = compute_reaction_estimates(graph, metabolite_data, sample_groups)
>>> corrs, pvals = compute_multiomics_associations(
...     residuals, microbiome_data, sample_groups)
>>> add_microbiome_associations(graph, sample_groups, corrs)
- pymantra.network.add_gene_associations(graph: DiGraph, sample_groups: Series, associations: Dict[str, DataFrame] | None = None, residuals: DataFrame | None = None, gene_data: DataFrame | None = None, **kwargs)[source]#
Add gene-reaction associations to a multi-omics graph
Add the association estimates to a given multi-omics-reaction graph, either using pre-computed estimates or computing estimates via compute_multiomics_associations() and adding them directly.
- Parameters:
graph (nx.DiGraph) – Metabolite-reaction graph containing additional reaction-gene connections. Usually when calling this function reaction estimates are already added to the graph.
sample_groups (pd.Series) – Array indicating sample group
associations (Dict[str, pd.DataFrame], optional) – Reaction-gene associations per group as generated by compute_multiomics_associations()
residuals (pd.DataFrame) – Linear-model estimates per reaction and sample as generated by compute_reaction_estimates(). If associations is None this parameter is required.
gene_data (pd.DataFrame) – Gene data with samples in rows and genes in columns. If associations is None this becomes a required parameter. Gene names need to match the gene node names in graph and indices need to match the indices of sample_groups.
kwargs – Keyword arguments to be passed to compute_multiomics_associations()
Graph Generation#
These functions prepare a mantra-formatted graph, such as one generated by the NetworkGenerator class, for reaction activity estimation.
- pymantra.network.reaction_graph_extraction(graph: Graph, include_attributes: bool = True)#
Extract the reaction-reaction graph from a metabolite-reaction graph
- Parameters:
graph (nx.Graph | nx.DiGraph) – Bipartite metabolite-reaction from which reaction-reaction connections are extracted
include_attributes (bool, True) – Whether to include no
- Returns:
reaction-reaction edges as a set of 2-tuples. Reactions are always represented as strings
- Return type:
Set[Tuple[str, str]]
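The idea of collapsing a bipartite metabolite-reaction graph into reaction-reaction edges can be sketched as follows. This is a minimal illustration and not the library's code: the helper extract_reaction_edges is hypothetical, a plain edge list replaces the networkx graph, and the linking rule (two reactions are connected when a product of one is a substrate of the other) is an assumption, since this page does not state the rule used by reaction_graph_extraction().

```python
def extract_reaction_edges(edges, node_types):
    """Collect reaction-reaction edges from directed metabolite-reaction edges.

    edges: iterable of (source, target) pairs
    node_types: {node: "metabolite" | "reaction"}
    """
    produced_by = {}  # metabolite -> reactions producing it
    consumed_by = {}  # metabolite -> reactions consuming it
    for source, target in edges:
        if node_types[source] == "reaction" and node_types[target] == "metabolite":
            produced_by.setdefault(target, set()).add(source)
        elif node_types[source] == "metabolite" and node_types[target] == "reaction":
            consumed_by.setdefault(source, set()).add(target)

    reaction_edges = set()
    for metabolite, producers in produced_by.items():
        for producer in producers:
            for consumer in consumed_by.get(metabolite, ()):
                if producer != consumer:
                    # Reactions are stored as strings, as in the return value above
                    reaction_edges.add((str(producer), str(consumer)))
    return reaction_edges


# Toy pathway: glucose -> R_HK -> g6p -> R_PGI -> f6p (names are illustrative)
node_types = {"glucose": "metabolite", "g6p": "metabolite", "f6p": "metabolite",
              "R_HK": "reaction", "R_PGI": "reaction"}
edges = [("glucose", "R_HK"), ("R_HK", "g6p"), ("g6p", "R_PGI"), ("R_PGI", "f6p")]
reaction_edges = extract_reaction_edges(edges, node_types)
```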
Classes#
- class pymantra.LocalSearch(network: Graph, temp: float, delta_min: float, l_min: int, l_max: int, max_iter: int, objective_function: str, min_reactions: int, p: float = 2, seed_size=10, *args, **kwargs)[source]
Bases: ABC
Local search base class
Interface for running local search with (predefined) objective functions using pre-computed node and edge values.
Usually these values come from the reaction activity approximation and the approximation of reaction/metabolite associations to other omics entities, as implemented in pymantra.network.compute_reaction_estimates() and pymantra.network.compute_multiomics_associations().
If you want to use other node/edge metrics, you need to set/overwrite the values stored as ‘data’. These values are used by the pre-defined objective functions without any further checks or corrections.
If you intend to use this class with a graph generated manually (i.e. with functions outside this module), it must contain the following node attributes:
* ‘data’ or ‘vec_data’
* ‘node_type’
and the following edge attributes:
* ‘data’
* ‘edge_type’
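Because the stored values are used without further checks, it can help to verify a manually built graph before passing it to the constructor. The sketch below is a hypothetical helper, not part of pymantra. It accepts (node, attributes) pairs and (source, target, attributes) triples, which is the shape that networkx's graph.nodes(data=True) and graph.edges(data=True) yield; the attribute values in the toy data are placeholders.

```python
REQUIRED_EDGE_ATTRIBUTES = ("data", "edge_type")


def find_attribute_problems(nodes, edges):
    """List missing attributes (hypothetical helper, for illustration only).

    nodes: iterable of (node, attribute dict), e.g. graph.nodes(data=True)
    edges: iterable of (source, target, attribute dict), e.g. graph.edges(data=True)
    """
    problems = []
    for node, attrs in nodes:
        # Either 'data' or 'vec_data' satisfies the node value requirement
        if "data" not in attrs and "vec_data" not in attrs:
            problems.append(f"node {node!r}: missing 'data' or 'vec_data'")
        if "node_type" not in attrs:
            problems.append(f"node {node!r}: missing 'node_type'")
    for source, target, attrs in edges:
        for key in REQUIRED_EDGE_ATTRIBUTES:
            if key not in attrs:
                problems.append(f"edge ({source!r}, {target!r}): missing {key!r}")
    return problems


# Toy graph with one incomplete node and one incomplete edge
nodes = [("R1", {"data": 0.4, "node_type": "reaction"}),
         ("R2", {"node_type": "reaction"})]
edges = [("R1", "R2", {"data": 0.1})]
problems = find_attribute_problems(nodes, edges)
```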
- lso
Interface to the C++ class handling the local search
- Type:
pymantra.network.enrichment.lso.LocalSearchOptimization
- __init__(network: Graph, temp: float, delta_min: float, l_min: int, l_max: int, max_iter: int, objective_function: str, min_reactions: int, p: float = 2, seed_size=10, *args, **kwargs)[source]
Initialize a LocalSearchOptimization object
In principle, objective functions can be adapted or new ones added, but this currently requires writing C++ functions and re-compiling.
The currently implemented objective functions are
‘metabolic_reactions’: ‘ld_reactions’
‘reaction_microbe’: ‘reaction_microbiome’
‘reaction_transcriptome’: ‘reaction_transcriptome’
- Parameters:
network (nx.Graph) –
Reaction network on which the local search should be computed.
- The graph is assumed to have the following node attributes:
’data’
’node_type’
- and the following edge attributes:
’data’
’edge_type’
Usually network will be computed using pymantra.databases.NetworkGenerator
temp (float) – Initial simulated annealing temperature, exponentially decaying every iteration. The higher temp, the more likely it is that a solution with a lower score is accepted at any iteration. Intuitively, more ‘hops’ will be performed at higher temperature.
delta_min (float) – Minimum improvement per iteration
l_min (int) – Minimal solution size
l_max (int) – Maximal solution size
max_iter (int) – Maximum number of iterations before local search is stopped, if a (sub)optimal result has not been found
objective_function (str) – Which objective functions to use. Possible options are currently: “ld_reactions”, “reaction_microbiome”, “reaction_gene” and “precomputed_objectives”
min_reactions (int) – Minimum number of reactions to be contained in the solution. This has no effect for metabolomics-only experiments.
p (Optional[float], default 2.) – Which \(L^p\) (Minkowski) norm to use in the objective function. Might not be relevant for all objective functions
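The role of temp can be illustrated with the textbook Metropolis acceptance rule for a maximization problem. This is a generic sketch, not the C++ backend: whether pymantra uses exactly this rule is an assumption, and the decay factor of 0.95 is an arbitrary illustrative value, since this page only states that the temperature decays exponentially.

```python
import math


def temperature(initial_temp, iteration, decay=0.95):
    """Exponentially decaying temperature; the decay factor is an assumed value."""
    return initial_temp * decay ** iteration


def acceptance_probability(score_change, temp):
    """Metropolis criterion for a maximization problem."""
    if score_change >= 0:
        return 1.0  # improvements are always accepted
    return math.exp(score_change / temp)


# The same score drop of 0.5 is accepted far more readily early on (hot)
# than after 100 iterations of cooling (cold)
hot = acceptance_probability(-0.5, temperature(10.0, iteration=0))
cold = acceptance_probability(-0.5, temperature(10.0, iteration=100))
```

Under this rule a larger initial temperature keeps the search exploratory for longer, at the cost of slower settling.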
- run_local_search(groups: Tuple[str, str] | None = None, n_threads: int = 1, seed_size: int | None = None, min_comp_size: int = 25)[source]
Run a local search
Do a local search optimization with parameters given by the instance’s attributes for the binary comparison specified by groups
- Parameters:
groups (Tuple[str, str]) – 2-tuple containing the group names
n_threads (int, default 1) – Number of threads to use. NOTE: currently multi-threading can cause unexpected termination
seed_size (int, optional) – Option to specify the seed size. If None, the seed_size at LocalSearch object initialization is used.
min_comp_size (int, default 25) – Minimum component size to run a local search on the component
- run_repeated_local_search(n_repeats: int, groups: Tuple[str, str] | None = None, combine_mode: str = 'union', n_threads: int = 1, seed_size: int | None = None, min_comp_size: int = 25)[source]
Run a local search repeatedly n-times
Do a local search optimization with parameters given by the instance’s attributes for the binary comparison specified by groups n times to get a more robust result.
Before each repetition a new random seed will be set.
Results from the different repeats will be merged either by the union or intersection of the nodes contained in their subgraph-solutions.
- Parameters:
n_repeats (int) – Number of repeated local searches to perform
groups (Optional[Tuple[str, str]]) – 2-tuple containing the group names
combine_mode (str, "union") – How to combine the results of all iterations. Either “union” or “intersection”
n_threads (int, default 1) – Number of threads to use. NOTE: currently multi-threading can cause unexpected termination
seed_size (int, optional) – Option to specify the seed size. If None, the seed_size at LocalSearch object initialization is used.
min_comp_size (int, default 25) – Minimum component size to run a local search on the component
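The effect of combine_mode can be sketched with plain Python sets. The helper combine_solutions is hypothetical and only mirrors the merging step described above; the node names are toy values.

```python
def combine_solutions(solutions, combine_mode="union"):
    """Merge the node sets of repeated runs (hypothetical helper)."""
    if not solutions:
        return set()
    if combine_mode == "union":
        return set().union(*solutions)
    if combine_mode == "intersection":
        return set(solutions[0]).intersection(*solutions[1:])
    raise ValueError("combine_mode must be 'union' or 'intersection'")


# Node sets from three repeated runs
runs = [{"R1", "R2", "R3"}, {"R2", "R3", "R4"}, {"R2", "R3"}]
robust_core = combine_solutions(runs, "intersection")
broad_set = combine_solutions(runs, "union")
```

The intersection keeps only nodes found in every repeat and is therefore the more conservative choice, while the union keeps every node found at least once.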
- score_final_solution(groups: Tuple[str, str]) float[source]
Recompute the score of the final solution
Recomputes the score of the final subnetwork with the given groups. If groups contains the same groups as during the local search, the result will be equal to the score in solution. However, this function also enables the calculation of the objective score with other group combinations and the computed solution.
- set_seed(seed: str, seed_size: int | None = None)[source]
Set the seed at which local search is starting
‘Manually’ setting the local search seed. When calling run_local_search() a seed is automatically computed and cached for re-usage when running another local search. If you want multiple local search runs with independent seeds this method should be called in between runs.
- Parameters:
seed (Union[List[str], Set[str], str]) – Either a node (specified by name) or an iterable collection of nodes. If given a single node, seed_size must be set to specify the number of neighbours of the seed node to draw. Otherwise, the size of the collection must be in the range [l_min, l_max]
seed_size (Optional[int]) – Size of the seed subgraph. Required if seed is a single node, otherwise ignored.
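The two accepted seed forms and their constraints can be expressed as a small check. This is a hypothetical helper that only restates the rules documented above; it does not reproduce how pymantra draws neighbours or reports errors.

```python
def check_seed(seed, l_min, l_max, seed_size=None):
    """Validate a seed following the rules described for set_seed."""
    if isinstance(seed, str):
        # A single node needs seed_size to know how many neighbours to draw
        if seed_size is None:
            raise ValueError("seed_size is required when seed is a single node")
        return seed, seed_size
    nodes = set(seed)
    if not l_min <= len(nodes) <= l_max:
        raise ValueError(
            f"seed must contain between {l_min} and {l_max} nodes, got {len(nodes)}"
        )
    return nodes, None  # seed_size is ignored for collections


single = check_seed("R1", l_min=2, l_max=5, seed_size=3)
collection = check_seed(["R1", "R2", "R3"], l_min=2, l_max=5, seed_size=10)
```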
- property converged: bool
Whether the local search converged or terminated due to reaching max_iter
- property delta_min: float
Current choice of minimum progress per iteration
- property l_max: int
Current choice of maximum solution size
- property l_min: int
Current choice of minimum solution size
- property max_iter: int
Current choice of maximum number of iteration for local search
- property min_reactions: int
Current choice of minimum number of reactions in the solution
- property p: float
Current choice of lp-norm
- property temp: float
Current choice of simulated annealing temperature
- class pymantra.MetaboliteLocalSearch(network: Graph, temp: float, delta_min: float, l_min: int, l_max: int, max_iter: int, p: float = 2, objective_function='ld_reactions', is_reaction_graph: bool = False, **kwargs)[source]#
Local search class for metabolomics-data only
Interface for running local search for metabolomics-only data with (predefined) objective functions using pre-computed node values.
Usually these values come from the reaction activity approximation as implemented in pymantra.network.compute_reaction_estimates().
If you want to use other node/edge metrics, you need to set/overwrite the values stored as ‘data’. These values are used by the pre-defined objective functions without any further checks or corrections.
If you intend to use this class with a graph generated manually (i.e. with functions outside this module), it must contain the following node attributes:
* ‘data’ or ‘vec_data’
* ‘node_type’
and the following edge attributes:
* ‘data’
* ‘edge_type’
- lso#
Interface to the C++ class handling the local search
- Type:
LocalSearchOptimization
- reaction_edges#
reaction-reaction edges either passed by or extracted from the graph passed to the constructor
- __init__(network: Graph, temp: float, delta_min: float, l_min: int, l_max: int, max_iter: int, p: float = 2, objective_function='ld_reactions', is_reaction_graph: bool = False, **kwargs)[source]#
Initialize a MetaboliteLocalSearch object
Initializes a specialized subclass of LocalSearchOptimization meant to compute a local search on metabolite-reaction graphs without additional node types.
- Parameters:
network (nx.Graph) – Either a metabolite-reaction graph or a reaction-reaction graph extracted from a metabolite-reaction graph. Usually network will be computed using pymantra.database.NetworkGenerator and/or reaction_graph_extraction()
temp (float) – Initial simulated annealing temperature, exponentially decaying every iteration. The higher temp, the more likely it is that a solution with a lower score is accepted at any iteration. Intuitively, more ‘hops’ will be performed at higher temperature.
delta_min (float) – Minimum improvement per iteration
l_min (int) – Minimal solution size
l_max (int) – Maximal solution size
max_iter (int) – Maximum number of iterations before local search is stopped, if a (sub)optimal result has not been found
p (Optional[float], default 2.) – Which \(L^p\) (Minkowski) norm to use in the objective function.
objective_function (Optional[str], default 'ld_reactions') – Currently changing this option is not supported
is_reaction_graph (Optional[bool], default False) – Whether network is a reaction-reaction or a metabolite-reaction graph
kwargs – Keyword arguments to pass to LocalSearchOptimization. Note that “min_reactions” is automatically set to l_min, passing it will have no effect.
- plot_subnetwork(network: DiGraph | None = None, subplot_args: dict | None = None, **kwargs) axis | Tuple[figure, ndarray | List[axis]][source]#
Plot the subgraph returned by local search optimization
Plotting the edge subgraph returned by local search optimization. Either plots a reaction-reaction-organism graph containing exactly the edges in solution or a metabolite-reaction network containing all nodes in solution and their metabolite neighbours.
- Parameters:
network (nx.DiGraph, optional) – Same metabolite-reaction graph used as input in the constructor. It is assumed to only contain directed metabolite-reaction edges.
subplot_args (dict, optional) – Keyword arguments to pass to plt.subplots.
kwargs – Optional keyword arguments passed to pymantra.plotting.plot_undirected_graph() or pymantra.plotting.plot_directed_graph() depending on whether network is None. Note that reaction_graph cannot be passed.
- Returns:
Either a single matplotlib axis object, if ‘ax’ is given as a keyword argument or a 2-tuple with the first element being the figure on which the subplots are lying and the second a list or array of plt.axis which are drawn in the figure.
- Return type:
Union[plt.axis, Tuple[plt.figure, Union[np.ndarray, List[plt.axis]]]]
- class pymantra.MultiOmicsLocalSearch(network: Graph, omics: str, temp: float, delta_min: float, l_min: int, l_max: int, max_iter: int, min_reactions: int, p: float = 2, is_reaction_graph: bool = False, **kwargs)[source]#
Local search class for metabolomics with multi-omics data
Interface for running local search with (predefined) objective functions using pre-computed node and edge values.
Usually these values come from the reaction activity approximation and the approximation of reaction/metabolite associations to other omics entities, as implemented in pymantra.network.compute_reaction_estimates() and pymantra.network.compute_multiomics_associations().
If you want to use other node/edge metrics, you need to set/overwrite the values stored as ‘data’. These values are used by the pre-defined objective functions without any further checks or corrections.
If you intend to use this class with a graph generated manually (i.e. with functions outside this module), it must contain the following node attributes:
* ‘data’ or ‘vec_data’
* ‘node_type’
and the following edge attributes:
* ‘data’
* ‘edge_type’
- lso#
Interface to the C++ class handling the local search
- Type:
LocalSearchOptimization
- reaction_multiomics_edges#
reaction-reaction and multi omics-reactions edges either passed by or extracted from the graph passed to the constructor
- __init__(network: Graph, omics: str, temp: float, delta_min: float, l_min: int, l_max: int, max_iter: int, min_reactions: int, p: float = 2, is_reaction_graph: bool = False, **kwargs)[source]#
Initialize a MultiOmicsLocalSearch object
Initializes a specialized subclass of LocalSearchOptimization meant to compute a local search on metabolite-reaction graphs with additional multi-omics nodes (organisms or genes).
- Parameters:
network (nx.Graph) – Either a metabolite-reaction graph or a reaction-reaction graph extracted from a metabolite-reaction graph. Usually network will be computed using NetworkGenerator and/or reaction_graph_extraction()
omics (str) – Type of multi-omics association. Currently, this must be either “organism” (for microbiome) or “gene” (for transcriptome or metagenome)
temp (float) – Initial simulated annealing temperature, exponentially decaying every iteration. The higher temp, the more likely it is that a solution with a lower score is accepted at any iteration. Intuitively, more ‘hops’ will be performed at higher temperature.
delta_min (float) – Minimum improvement per iteration
l_min (int) – Minimal solution size
l_max (int) – Maximal solution size
max_iter (int) – Maximum number of iterations before local search is stopped, if a (sub)optimal result has not been found
min_reactions (int) – Minimum number of reactions to be contained in the solution.
p (float, default 2.) – Which \(L^p\) (Minkowski) norm to use in the objective function.
is_reaction_graph (bool, default False) – Whether network is a reaction-reaction or a metabolite-reaction graph
kwargs – Keyword arguments to pass to LocalSearchOptimization
- plot_subnetwork(network: DiGraph | None = None, subplot_args: dict | None = None, **kwargs) Tuple[figure, ndarray | List[axis]][source]#
Plot the subgraph returned by local search optimization
Plotting the edge subgraph returned by local search optimization. Either plots a reaction-reaction-organism graph containing exactly the edges in solution or a metabolite-reaction network containing all nodes in solution and their metabolite neighbours.
- Parameters:
network (nx.DiGraph, optional) – Same metabolite-reaction graph used as input in the constructor. It is assumed to only contain directed metabolite-reaction edges.
subplot_args (dict, optional) – Keyword arguments to pass to plt.subplots.
kwargs – Optional keyword arguments passed to pymantra.plotting.plot_undirected_graph() or pymantra.plotting.plot_directed_graph() depending on whether network is None. Note that reaction_graph cannot be passed.
- Returns:
Either a single matplotlib axis object, if ‘ax’ is given as a keyword argument or a 2-Tuple with the first element being the figure on which the subplots are lying and the second a list or array of plt.axis which are drawn in the figure.
- Return type:
Union[plt.axis, Tuple[plt.figure, Union[np.ndarray, List[plt.axis]]]]
- class pymantra.EnrichmentResults(subgraph: set, score: float, converged: bool)[source]#
Object holding local optimization results
Additional Functions#
- pymantra.network.per_sample_ld_estimation(graph: DiGraph, metabolome_data: DataFrame, groups: Series, covariates: DataFrame | None = None, compute_expl_var: bool = False, var_as_pval: bool = False, combined_models: bool = False, residual_summary: str = 'expl_var', scale: bool = True, control_group: str | None = None, r2_threshold: float = 0.5, recompute_non_passing: bool = False, outlier_threshold: float | None = None, random_effects: str | List[str] | None = None, lmm_args: dict | None = None, verbose: bool = False, **kwargs) Tuple[Dict[str, LinearModel], Dict[str, ndarray], Dict[str, Series]][source]#
Compute linear reaction-models
TODO
- Parameters:
graph (nx.Graph) – Reaction-reaction graph
metabolome_data (pd.DataFrame) – Metabolome data with samples in rows and metabolites in columns
groups (pd.Series) – Sample group annotation
covariates (pd.DataFrame, optional) – Confounder variables to correct for. Correction is done using a Linear Mixed Model. All variables (i.e. columns) not specified as random effect variables in random_effects are assumed to be fixed effects variables. Generally, variables should be numerical (float or integer). If you have categorical data as strings, you can use the pandas.get_dummies function to encode them as integers. Make sure to use drop_first to avoid introducing collinearity (see https://stackoverflow.com/questions/31498390/how-to-get-pandas-get-dummies-to-emit-n-1-variables-to-avoid-collinearity). The correction currently only supports simple fixed and random effects inclusion. For more complex setups including factor interaction, it is recommended to do the correction beforehand and only pass the residuals to this function instead of the original metabolome data frame.
compute_expl_var (bool, False) – Whether to return 1 - explained variance of the model or the residuals
var_as_pval (bool, False) – Whether to return a p-value or a residual/explained variance value
combined_models (bool, False) – Whether to compute the reference linear model based on both groups or only the ‘control’
residual_summary (str, "expl_var") – Which method to use as residual summary statistic. Either “expl_var” for explained variance (RSS/TSS) or “norm” for p-norm
scale (bool, True) – Whether to z-score scale metabolites
control_group (str, optional) – Option to set which group should be viewed as the control. If None the first element in groups will be used.
r2_threshold (float, .5) – Minimum \(R^2\) value a model needs to achieve to be further considered
recompute_non_passing (bool, False) – Whether to recompute models with case data that failed to pass the R2 threshold for control data.
outlier_threshold (float, optional) – Threshold to remove outliers by Cook’s distance. If None, a default based on the survival function of an F-distribution is computed.
random_effects (str | List[str], optional) – Random effects for confounder correction. If covariates is None this has no effect. Else, this specifies which column(s) of covariates to include as random effects, all other columns will be included as fixed effects. If this is None, all columns of covariates are assumed to be fixed effects.
lmm_args (dict, optional) – Keyword arguments for MixedLM.from_formula(). Ignored unless covariates and random_effects are both not None.
verbose (bool, False) – If True, warnings will be raised whenever a model does not pass the R2 filter
kwargs – Optional keyword arguments passed to model computation TODO
- Returns:
Control models, case residuals and all scaled residuals per reaction. If compute_expl_var is set to True the last element will contain the explained variance instead of the residuals.
- Return type:
Tuple[Dict[str, LinearModel], Dict[str, np.ndarray], Dict[str, pd.Series]]
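The advice on encoding categorical covariates with n - 1 indicator columns can be illustrated in plain Python. The helper encode_drop_first is hypothetical and only shows the encoding itself; on a data frame, pandas.get_dummies(..., drop_first=True) produces an equivalent n - 1 encoding.

```python
def encode_drop_first(values):
    """Encode a categorical variable with n - 1 indicator columns."""
    levels = sorted(set(values))
    kept_levels = levels[1:]  # the first level becomes the reference category
    return {
        level: [1 if value == level else 0 for value in values]
        for level in kept_levels
    }


# Toy covariate with three batches across four samples
batch = ["a", "b", "c", "a"]
encoded = encode_drop_first(batch)
```

Dropping one level avoids perfect collinearity: with all three indicator columns present, they would sum to one for every sample and duplicate the model intercept.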
- pymantra.network.spearmans_correlation(x: ndarray, y: ndarray, n_threads: int = 1)#
Compute Spearman’s correlation coefficient with a C++ backend.
nan values are automatically ignored and nan is returned if less than three non-nan observations are available for a pair of features.
If at least one 2D array is passed arrays will be returned, otherwise floats. If two 2D arrays of shape X x N and X x M are passed, the returned arrays will be of shape N x M.
- Parameters:
x (np.ndarray) –
y (np.ndarray) –
n_threads (int) –
- Returns:
NamedTuple of (1) correlations and (2) correlation-pvalues
- Return type:
SpearmansResults
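The nan handling described for spearmans_correlation can be sketched for a single pair of 1D features. This pure-Python version is an illustration only, not the C++ backend: the helper names are hypothetical, p-values are omitted, and returning nan for constant input is a choice of this sketch rather than documented behaviour.

```python
import math


def average_ranks(values):
    """Rank values starting at 1, assigning tied values their mean rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        mean_rank = (start + end) / 2 + 1
        for position in range(start, end + 1):
            ranks[order[position]] = mean_rank
        start = end + 1
    return ranks


def spearman(x, y):
    """Spearman correlation that drops pairs containing nan."""
    pairs = [(a, b) for a, b in zip(x, y) if not (math.isnan(a) or math.isnan(b))]
    if len(pairs) < 3:
        return math.nan  # fewer than three complete observations
    rank_x = average_ranks([a for a, _ in pairs])
    rank_y = average_ranks([b for _, b in pairs])
    n = len(pairs)
    mean_x, mean_y = sum(rank_x) / n, sum(rank_y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(rank_x, rank_y))
    var_x = sum((a - mean_x) ** 2 for a in rank_x)
    var_y = sum((b - mean_y) ** 2 for b in rank_y)
    if var_x == 0 or var_y == 0:
        return math.nan  # assumption of this sketch: undefined for constant input
    return cov / math.sqrt(var_x * var_y)


# One pair is dropped because of nan; the remaining four are perfectly monotonic
monotonic = spearman([1.0, 2.0, math.nan, 4.0, 5.0], [10.0, 20.0, 30.0, 80.0, 160.0])
# Only one complete pair remains, so the result is nan
too_few = spearman([1.0, math.nan, 3.0], [2.0, 5.0, math.nan])
```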