Database Module

The database module handles the generation of metabolic networks from a given list of metabolites and optionally genes and/or microbial species.

Depending on whether the database is accessed through the provided REST framework or directly through neo4j (regardless of whether neo4j runs inside a Docker container or outside one), different query classes should be used.

For the REST API, use APINetworkGenerator; for direct neo4j queries, use NetworkGenerator.

Online/API Database Class

class pymantra.database.APINetworkGenerator(base_url: str = 'https://exbio.wzw.tum.de/pymantradb')

API mirror for NetworkGenerator

Queries the mantra online neo4j database containing the reference network generated with the Neo4jGenerator class.

Most query functions depend on the requirements tested with the Verifier class. To ensure that all functions work as expected, only databases tested for their correctness should be used.

url

Base URL to which requests are sent

Type:

str

__init__(base_url: str = 'https://exbio.wzw.tum.de/pymantradb')

Initialize a new APINetworkGenerator instance

Initialize a new instance to run queries to the neo4j mantra-db API.

Parameters:

base_url (str, default 'https://exbio.wzw.tum.de/pymantradb') – Root URL where the server is located

as_networkx(nodes: Dict[str, Set[str]] | None = None, edges: Dict[str, Set[Edge]] | None = None, include_attributes: bool = True, reaction_subgraph: bool = False, reduce: bool = True) -> DiGraph

Convert a set of nodes or edges to a networkx DiGraph

Parameters:
  • nodes (Dict[str, Set[str]], Optional) – Nodes to include by node type. Optional, but at least one of nodes and edges must be given. Please note: if edges is not specified and reaction_subgraph is True, reaction nodes given in nodes will NOT be considered.

  • edges (Dict[str, Set[Edge]], Optional) – Edges to include by edge type. Optional, but at least one of nodes and edges must be given. If not specified, edges are queried from the database for the specified nodes with either get_subgraph or get_reaction_subgraph, depending on reaction_subgraph. The only edge attribute currently included is edge_type.

  • include_attributes (bool, default True) – If True, the nodes of the returned graph carry the attributes specified in the database. Otherwise, node_type is the only node attribute in the output graph. Please be aware that if this is True and edges is None, the function may be much less efficient.

  • reaction_subgraph (bool, default False) – Only relevant if edges is None. If True, the queried edges form a reaction subgraph (see get_reaction_subgraph()); otherwise the subgraph will not contain reaction nodes (see get_subgraph()).

  • reduce (bool, default True) – Whether to reduce the reaction nodes as a final step.

Returns:

Subgraph as a nx.DiGraph

Return type:

nx.DiGraph

Examples

>>> edges_ = {
...     EDGE_TYPE_NAMES['substrate']: {
...         # TODO: example edges
...     },
...     EDGE_TYPE_NAMES['product']: {
...         # TODO: example edges
...     }
... }
>>> generator = APINetworkGenerator()
>>> generator.as_networkx(edges=edges_)

get_reaction_subgraph(organisms: Set[str], genes: Set[str], metabolites: Set[str], reaction_organism: Tuple[str, str] | None = None) -> Dict[str, Set[Edge]]

Extract the edges for a given set of entities

Query a subgraph with all genes, organisms and metabolites given and retain the original graph structure with reaction nodes.

Important: gene - organism, organism - reaction and gene - reaction edges are of opposite direction outside the database. The database structure is made to allow efficient queries, which do not reflect the ‘passing’ directions required for quantitative metabolic-network style analyses.

Parameters:
  • organisms (Set[str]) – A set of all organisms to be included in the subgraph. The names must correspond to the nodeLabel property in the database.

  • genes (Set[str]) – A set of all genes to be included in the subgraph. The names must correspond to the nodeLabel property in the database.

  • metabolites (Set[str]) – A set of all metabolites to be included in the subgraph. The names must correspond to the nodeLabel property in the database.

  • reaction_organism (Tuple[str, str], optional) – Specify an organism for which the metabolic reactions should be extracted as a 2-tuple of (ID type, ID), where ID type must be 'Abbreviation_KEGG' or 'KeggID' and ID is the KEGG organism code or T number, respectively. For human this would thus be either ('Abbreviation_KEGG', 'hsa') or ('KeggID', 'T01001'). If organisms is not empty, the specified organism is included in addition to them.

Returns:

A dictionary, where keys represent edge types as specified in utils.EDGE_TYPES pointing to a set of Edge representing all edges of the respective type contained in the subgraph.

Return type:

Dict[str, Set[Edge]]
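
The constraints on reaction_organism can be checked before a query is sent. The following is a minimal sketch; check_reaction_organism and ALLOWED_ID_TYPES are hypothetical names that are not part of pymantra, and only the two ID types and the two human examples come from the parameter description.

```python
from typing import Optional, Tuple

# ID types named in the reaction_organism parameter description.
ALLOWED_ID_TYPES = ("Abbreviation_KEGG", "KeggID")


def check_reaction_organism(
    reaction_organism: Optional[Tuple[str, str]]
) -> Optional[Tuple[str, str]]:
    """Hypothetical helper: validate an (ID type, ID) pair before querying."""
    if reaction_organism is None:
        # The parameter is optional, so None passes through unchanged.
        return None
    if len(reaction_organism) != 2:
        raise ValueError("reaction_organism must be a 2-tuple of (ID type, ID)")
    id_type, organism_id = reaction_organism
    if id_type not in ALLOWED_ID_TYPES:
        raise ValueError(
            f"ID type must be one of {ALLOWED_ID_TYPES}, got {id_type!r}"
        )
    return id_type, organism_id


# Both forms from the description refer to human.
human_by_code = check_reaction_organism(("Abbreviation_KEGG", "hsa"))
human_by_t_number = check_reaction_organism(("KeggID", "T01001"))
```

Whether the library performs a comparable check itself is not stated here, so a guard like this is only a convenience on the caller's side.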

Examples

>>> generator = APINetworkGenerator()
>>> metabos = {'FDMO3', 'h2o', 'FDMO2', 'fald', 'FDMO6', 'so3',
...            'FMNRx', 'nad', 'FMNRx2', 'fmn', 'nadp'}
>>> gs = {'1576', '1557', '1559'}
>>> orgs = {"Streptomyces tsukubensis", "Bacillus smithii",
...         "Streptomyces fulvissimus"}
>>> edges_ = generator.get_reaction_subgraph(orgs, gs, metabos)

get_subgraph(organisms: Set[str], genes: Set[str], metabolites: Set[str]) -> Dict[str, Set[Edge]]

Returns a subgraph with all nodes given plus the reaction nodes required to connect them

Parameters:
  • organisms (Set[str]) – Set of all organisms to query. The names must correspond to the nodeLabel property in the database.

  • genes (Set[str]) – Set of all genes to query. The names must correspond to the nodeLabel property in the database.

  • metabolites (Set[str]) – Set of all metabolites to query. The names must correspond to the nodeLabel property in the database.

Returns:

All connections between organisms, genes and metabolites contained in the database. organism - metabolite are third order connections (via gene and reaction nodes), all other connections are second order (via reaction nodes)

Return type:

Dict[str, Set[Edge]]

Examples

>>> generator = APINetworkGenerator("http://127.0.0.1:8084")
>>> metabos = {'FDMO3', 'h2o', 'FDMO2', 'fald', 'FDMO6', 'so3',
...            'FMNRx', 'nad', 'FMNRx2', 'fmn', 'nadp'}
>>> gs = {'1576', '1557', '1559'}
>>> orgs = {"Streptomyces tsukubensis", "Bacillus smithii",
...         "Streptomyces fulvissimus"}
>>> edges_ = generator.get_subgraph(orgs, gs, metabos)

verify_connection() -> bool

Check whether the given base URL is correct

Returns:

True if status code is 200 (connection verified)

Return type:

bool
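
A status-code check of this kind can be sketched with the standard library alone. This is not pymantra's implementation; connection_ok and its opener parameter are hypothetical, and the only behaviour taken from the description is that a 200 status counts as a verified connection.

```python
from urllib.error import URLError
from urllib.request import urlopen


def connection_ok(url: str, opener=urlopen, timeout: float = 5.0) -> bool:
    """Return True only if a GET request to `url` answers with status 200.

    `opener` is injectable so the check can be exercised without network
    access; by default it is urllib.request.urlopen.
    """
    try:
        with opener(url, timeout=timeout) as response:
            return response.status == 200
    except (URLError, OSError):
        # Unreachable host, refused connection, or an HTTP error status.
        return False
```

For example, connection_ok("http://127.0.0.1:8084") would report whether a locally hosted server answers at all, which is a cheap test to run before issuing larger queries.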

Local Database Class

class pymantra.database.NetworkGenerator(uri: str, auth: Tuple[str, str] | None = None, **kwargs)

Queries a neo4j database containing the reference network generated with the Neo4jGenerator class.

Most query functions depend on the requirements tested with the Verifier class. To ensure that all functions work as expected, only databases tested for their correctness should be used.

__init__(uri: str, auth: Tuple[str, str] | None = None, **kwargs)

Parameters:
  • uri (str) – Database URI

  • auth (Tuple[str, str], Optional) – Database credentials in the form of (user, password). If the database is not secured, pass None.

Raises:

ConnectionError – If database is not reachable with the given parameters

as_networkx(nodes: Dict[str, Set[str]] | None = None, edges: Dict[str, Set[Edge]] | None = None, include_attributes: bool = True, reaction_subgraph: bool = False, reduce: bool = True) -> DiGraph

Convert a set of nodes or edges to a networkx DiGraph

Parameters:
  • nodes (Dict[str, Set[str]], Optional) – Nodes to include by node type. Optional, but at least one of nodes and edges must be given. Please note: if edges is not specified and reaction_subgraph is True, reaction nodes given in nodes will NOT be considered.

  • edges (Dict[str, Set[Edge]], Optional) – Edges to include by edge type. Optional, but at least one of nodes and edges must be given. If not specified, edges are queried from the database for the specified nodes with either get_subgraph or get_reaction_subgraph, depending on reaction_subgraph. The only edge attribute currently included is edge_type.

  • include_attributes (bool, default True) – If True, the nodes of the returned graph carry the attributes specified in the database. Otherwise, node_type is the only node attribute in the output graph. Please be aware that if this is True and edges is None, the function may be much less efficient.

  • reaction_subgraph (bool, default False) – Only relevant if edges is None. If True, the queried edges form a reaction subgraph (see get_reaction_subgraph()); otherwise the subgraph will not contain reaction nodes (see get_subgraph()).

  • reduce (bool, default True) – Whether to reduce the reaction nodes as a final step.

Returns:

Subgraph as a nx.DiGraph

Return type:

nx.DiGraph

Examples

>>> edges_ = {
...     EDGE_TYPE_NAMES['substrate']: {
...         # TODO: example edges
...     },
...     EDGE_TYPE_NAMES['product']: {
...         # TODO: example edges
...     }
... }
>>> generator = NetworkGenerator(
...     "bolt://localhost:7687", auth=('<user>', '<password>'))
>>> generator.as_networkx(edges=edges_)

get_all_edges(edge_type: str, limit: int | None = None) -> Set[Edge]

Query all relationships of a specific type

Parameters:
  • edge_type (str) – Must be one of the elements in utils.EDGE_TYPES

  • limit (int, Optional) – If specified it represents the maximum number of edges to return, otherwise all edges are returned (default)

Returns:

All edges of the respective edge type, each represented as a namedtuple of size 2 with the attributes source and target, which are both str holding the respective nodeLabels. If no edges are found, an empty set will be returned.

Return type:

Set[Edge]
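
The shape of the returned value can be illustrated without a database. In this minimal sketch, Edge is a stand-in for pymantra's own Edge class, take_edges is a hypothetical helper that imitates the limit parameter on an in-memory set, and the node labels are placeholders.

```python
from typing import NamedTuple, Optional, Set


class Edge(NamedTuple):
    """Stand-in for pymantra's Edge: a 2-tuple of node labels."""
    source: str
    target: str


def take_edges(edges: Set[Edge], limit: Optional[int] = None) -> Set[Edge]:
    """Imitate the documented `limit` semantics on an in-memory set."""
    if limit is None:
        # No limit given: every edge is returned.
        return set(edges)
    # Sort first so that the truncated result is reproducible.
    return set(sorted(edges)[:limit])


# Placeholder labels, not taken from the database.
all_edges = {
    Edge("metabolite_a", "reaction_1"),
    Edge("metabolite_b", "reaction_1"),
    Edge("metabolite_c", "reaction_2"),
}

for edge in all_edges:
    # Fields are reachable both by name and by position.
    assert edge.source == edge[0] and edge.target == edge[1]

limited = take_edges(all_edges, limit=2)
```

Because the edges are plain tuples of strings, they are hashable and can be stored in sets or used as dictionary keys directly.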

get_gene_metabolite_connections(genes: Set[str], metabolites: Set[str])

Query all pairwise connections between genes and metabolites given as input

Parameters:
  • genes (Set[str]) – Set of all genes to query

  • metabolites (Set[str]) – Set of all metabolites to query

Returns:

Set of all gene - metabolite connections. Edges are assumed to have no direction

Return type:

Set[Edge]

get_metabolite_metabolite_connection(metabolites: Set[str])

Query all pairwise connections between metabolites given as input

Parameters:

metabolites (Set[str]) – Set of all metabolites to query

Returns:

Set of all metabolite - metabolite connections. Edges are assumed to have no direction

Return type:

Set[Edge]

get_node_attributes(node: str, node_type: str | None = None) -> Dict[str, any]

Get the attributes of a specific node

Parameters:
  • node (str) – Name of the node (i.e. internal ID/species name)

  • node_type (str, optional) – Node type. Specifying this will speed up the computation, since the number of nodes filtered by neo4j is reduced

Returns:

Node attribute dictionary

Return type:

Dict[str, any]

get_node_by_id(node_id: int, as_string: bool = False) -> str | Node

Query a node by its ID

Parameters:
  • node_id (int) – Node ID to query

  • as_string (bool, Optional, default False) – If True, the node label is returned, else the neo4j.graph.Node

Returns:

Node with the respective ID as str (nodeLabel) if as_string is True, else the neo4j.graph.Node

Return type:

Union[str, Node]

get_node_edges(node, node_type: str | None = None, **kwargs) -> Dict[str, Set[Edge]]

Query all edges of a given node (by node label), irrespective of edge types

Parameters:
  • node (str) – Node label of the node to query

  • node_type (str, Optional) – Node type of the query node. Query results should be the same with or without it, since nodeLabels are supposed to be unique across all node types, but query speed may differ

  • kwargs – Optional keyword arguments

Returns:

All edges going out of or to the given input node by edge type

Return type:

Dict[str, Set[Edge]]

get_node_neighbours(node: str, node_type: str | None = None, as_strings: bool = False) -> Dict[str, Set[str]] | Dict[str, Set[Node]]

Query all neighbours of a specific node by node label, irrespective of their node type

Parameters:
  • node (str) – Node label of the node to query

  • node_type (str, Optional) – Node type of the query node. Query results should be the same with or without it, since nodeLabels are supposed to be unique across all node types, but query speed may differ

  • as_strings (bool, Optional, default False) – If True, nodes will be returned as their nodeLabels, else as neo4j.graph.Node objects.

Returns:

All direct neighbours by node type (dict.keys). If as_strings is True, the values are sets of str, else sets of neo4j.graph.Node.

If no neighbours are found, an empty dict will be returned.

Return type:

Union[Dict[str, Set[str]], Dict[str, Set[Node]]]

get_organism_metabolite_connections(organisms: Set[str], metabolites: Set[str]) -> Set[Edge]

Query all pairwise connections between organisms and metabolites given as input

Parameters:
  • organisms (Set[str]) – Set of all organisms to query

  • metabolites (Set[str]) – Set of all metabolites to query

Returns:

Set of all organism - metabolite connections. Edges are assumed to have no direction

Return type:

Set[Edge]

get_reaction_subgraph(organisms: Set[str], genes: Set[str], metabolites: Set[str], reaction_organism: Tuple[str, str] | None = None) -> Dict[str, Set[Edge]]

Extract the edges for a given set of entities

Query a subgraph with all genes, organisms and metabolites given and retain the original graph structure with reaction nodes.

Important: gene - organism, organism - reaction and gene - reaction edges are of opposite direction outside the database. The database structure is made to allow efficient queries, which do not reflect the ‘passing’ directions required for quantitative metabolic-network style analyses.

Parameters:
  • organisms (Set[str]) – A set of all organisms to be included in the subgraph. The names must correspond to the nodeLabel property in the database.

  • genes (Set[str]) – A set of all genes to be included in the subgraph. The names must correspond to the nodeLabel property in the database.

  • metabolites (Set[str]) – A set of all metabolites to be included in the subgraph. The names must correspond to the nodeLabel property in the database.

  • reaction_organism (Tuple[str, str], optional) – Specify an organism for which the metabolic reactions should be extracted as a 2-tuple of (ID type, ID), where ID type must be 'Abbreviation_KEGG' or 'KeggID' and ID is the KEGG organism code or T number, respectively. For human this would thus be either ('Abbreviation_KEGG', 'hsa') or ('KeggID', 'T01001'). If organisms is not empty, the specified organism is included in addition to them.

Returns:

A dictionary, where keys represent edge types as specified in utils.EDGE_TYPES pointing to a set of Edge representing all edges of the respective type contained in the subgraph.

Return type:

Dict[str, Set[Edge]]
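
Since some edge types are oriented differently inside and outside the database, it can be useful to swap source and target for selected edge types. The sketch below only shows the swap itself. Edge is a stand-in for pymantra's class, flip_edge_types is a hypothetical helper, and the edge-type keys and node labels are placeholders; which orientation a given result actually has should be checked against the returned data.

```python
from typing import Dict, Iterable, NamedTuple, Set


class Edge(NamedTuple):
    """Stand-in for pymantra's Edge."""
    source: str
    target: str


def flip_edge_types(
    edges_by_type: Dict[str, Set[Edge]], types_to_flip: Iterable[str]
) -> Dict[str, Set[Edge]]:
    """Return a copy in which edges of the listed types are reversed."""
    flip = set(types_to_flip)
    return {
        edge_type: (
            {Edge(e.target, e.source) for e in edges}
            if edge_type in flip
            else set(edges)
        )
        for edge_type, edges in edges_by_type.items()
    }


# Hypothetical edge-type keys and node labels, for illustration only.
subgraph = {
    "gene_reaction": {Edge("gene_1", "reaction_1")},
    "substrate": {Edge("metabolite_a", "reaction_1")},
}
flipped = flip_edge_types(subgraph, ["gene_reaction"])
```

The helper returns a new dictionary and leaves its input untouched, so the original query result stays available for comparison.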

Examples

>>> generator = NetworkGenerator(
...     "bolt://localhost:7687", auth=('<user>', '<password>'))
>>> metabos = {'FDMO3', 'h2o', 'FDMO2', 'fald', 'FDMO6', 'so3',
...            'FMNRx', 'nad', 'FMNRx2', 'fmn', 'nadp'}
>>> gs = {'1576', '1557', '1559'}
>>> orgs = {"Streptomyces tsukubensis", "Bacillus smithii",
...         "Streptomyces fulvissimus"}
>>> edges_ = generator.get_reaction_subgraph(orgs, gs, metabos)

get_subgraph(organisms: Set[str], genes: Set[str], metabolites: Set[str]) -> Dict[str, Set[Edge]]

Returns a subgraph with all nodes given plus the reaction nodes required to connect them

Parameters:
  • organisms (Set[str]) – Set of all organisms to query

  • genes (Set[str]) – Set of all genes to query

  • metabolites (Set[str]) – Set of all metabolites to query

Returns:

All connections between organisms, genes and metabolites contained in the database. organism - metabolite are third order connections (via gene and reaction nodes), all other connections are second order (via reaction nodes)

Return type:

Dict[str, Set[Edge]]

Examples

>>> generator = NetworkGenerator(
...     "bolt://localhost:7687", auth=('<user>', '<password>'))
>>> metabos = {'FDMO3', 'h2o', 'FDMO2', 'fald', 'FDMO6', 'so3',
...            'FMNRx', 'nad', 'FMNRx2', 'fmn', 'nadp'}
>>> gs = {'1576', '1557', '1559'}
>>> orgs = {"Streptomyces tsukubensis", "Bacillus smithii",
...         "Streptomyces fulvissimus"}
>>> edges_ = generator.get_subgraph(orgs, gs, metabos)

property edges: Dict[str, Set[str]] | Dict[str, Edge]

Querying all edges in the database

Returns:

All edges contained in the database by type.

Edges are represented either as strings of the form “source.nodeLabel -> target.nodeLabel” or as Edge (NamedTuple with attributes source at position 0 and target at position 1).

If no edges are found, an empty dict will be returned.

Return type:

Union[Dict[str, Set[str]], Dict[str, Edge]]
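
The two edge representations carry the same information and can be converted into each other. A minimal sketch, assuming the string form uses exactly " -> " as separator; Edge is a stand-in for pymantra's class, and edge_to_string and string_to_edge are hypothetical helpers.

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """Stand-in for pymantra's Edge (source at position 0, target at 1)."""
    source: str
    target: str


SEPARATOR = " -> "


def edge_to_string(edge: Edge) -> str:
    """Render an Edge in the 'source -> target' string form."""
    return f"{edge.source}{SEPARATOR}{edge.target}"


def string_to_edge(text: str) -> Edge:
    """Parse the string form back into an Edge.

    Splits on the first separator only, which assumes that the source
    label itself does not contain ' -> '.
    """
    source, target = text.split(SEPARATOR, 1)
    return Edge(source, target)


# Placeholder labels, not taken from the database.
edge = Edge("metabolite_a", "reaction_1")
as_text = edge_to_string(edge)
round_trip = string_to_edge(as_text)
```

The tuple form is usually the more convenient one for further processing, since it avoids parsing and keeps labels that contain spaces unambiguous.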

property gene_ids

Get all genes in the database by their ID and node label

Returns:

Dictionary of all genes in the database as ID, name pairs

Return type:

Dict[int, str]

property metabolite_ids

Get all metabolites in the database by their ID and node label

Returns:

Dictionary of all metabolites in the database as ID, name pairs

Return type:

Dict[int, str]

property n_edges: Dict[str, int]

Returning the number of relations per type

Returns:

Relation types are keys and relation counts are values. If no edges are found, an empty dict will be returned.

Return type:

Dict[str, int]

property n_nodes: Dict[str, int]

Returning the number of nodes per node type.

Returns:

Node types are keys and node counts are values. If no nodes are found, an empty dict will be returned.

Return type:

Dict[str, int]

property neo4j_nodes: Dict[str, Set[Node]]

Returns:

All nodes by node type, where the node types are the keys and the values are sets of neo4j.graph.Node. If no nodes are found, an empty dict will be returned.

Return type:

Dict[str, Set[Node]]

property nodes: Dict[str, Set[str]] | Dict[str, Set[Node]]

Querying all nodes in the database

Returns:

nodeLabels of all nodes by node type, where the node types are the keys and the values are sets of strings.

If no nodes are found, an empty dict will be returned.

Return type:

Dict[str, Set[str]]

property organism_ids

Get all organisms in the database by their ID and node label

Returns:

Dictionary of all organisms in the database as ID, name pairs

Return type:

Dict[int, str]

property reaction_ids

Get all reactions in the database by their ID and node label

Returns:

Dictionary of all reactions in the database as ID, name pairs

Return type:

Dict[int, str]

Functions

pymantra.database.reduce_reaction_nodes(graph: DiGraph)

Clean up a metabolite-reaction graph

This function merges reactions that have the same substrates and products (e.g. because they come from different organisms) and removes reaction nodes that have only one participant, i.e. transport reactions.

All modifications are done in place.

This utility function is only recommended if you did not use the pymantra database module to generate your network, as the module already applies this function internally.

Parameters:

graph (nx.DiGraph) – Directed metabolite-reaction graph, optionally with multi-omics reaction connections
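
The two clean-up rules can be illustrated on a simplified data structure. The sketch below is not the library's implementation: it uses plain dictionaries instead of an nx.DiGraph to stay dependency-free, it returns a new mapping instead of modifying anything in place, and it reads "only one participant" as at most one distinct metabolite across substrates and products. reduce_reactions and all labels are hypothetical.

```python
from typing import Dict, FrozenSet, List, Tuple

# A reaction is described by its (substrates, products).
Reaction = Tuple[FrozenSet[str], FrozenSet[str]]


def reduce_reactions(reactions: Dict[str, Reaction]) -> Dict[str, List[str]]:
    """Group reactions that share identical substrates and products.

    Returns a mapping from one representative reaction per group to the
    names of all reactions merged into it. Reactions with at most one
    participating metabolite are dropped.
    """
    groups: Dict[Reaction, List[str]] = {}
    # Iterate in sorted order so the representative is deterministic.
    for name in sorted(reactions):
        substrates, products = reactions[name]
        if len(substrates | products) <= 1:
            continue  # single participant: treated as transport, dropped
        groups.setdefault((substrates, products), []).append(name)
    return {members[0]: members for members in groups.values()}


# Placeholder reactions: two organism-specific copies of the same
# conversion, one distinct reaction, and one single-participant reaction.
reactions = {
    "reaction_1_org_a": (frozenset({"A"}), frozenset({"B"})),
    "reaction_1_org_b": (frozenset({"A"}), frozenset({"B"})),
    "reaction_2": (frozenset({"B"}), frozenset({"C"})),
    "transport_1": (frozenset({"A"}), frozenset()),
}
reduced = reduce_reactions(reactions)
```

Here the two organism-specific copies collapse into one group, reaction_2 is kept as is, and transport_1 disappears from the result.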

Exceptions

class pymantra.database.IncorrectEdgeType(message: str)

Exception for incorrect edge types. This can mean either that an invalid edge type was encountered or that an invalid number of edge types was found for a single edge.

__init__(message: str)

class pymantra.database.IncorrectNodeType(message: str)

Exception for incorrect node types. This can mean either that an invalid node type was encountered or that an invalid number of node types was found for a single node.

__init__(message: str)
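
How such an exception might be raised and handled can be sketched with a local stand-in. The class below only mirrors the documented constructor (a single message); its base class and its message attribute are assumptions, KNOWN_EDGE_TYPES holds placeholder values rather than the contents of utils.EDGE_TYPES, and require_edge_type is a hypothetical helper.

```python
class IncorrectEdgeType(Exception):
    """Stand-in with the documented one-argument constructor."""

    def __init__(self, message: str):
        super().__init__(message)
        # Keeping the message as an attribute is an assumption of this sketch.
        self.message = message


# Placeholder values; the real set is defined in utils.EDGE_TYPES.
KNOWN_EDGE_TYPES = {"SUBSTRATE", "PRODUCT"}


def require_edge_type(edge_type: str) -> str:
    """Return `edge_type` unchanged or raise if it is not a known type."""
    if edge_type not in KNOWN_EDGE_TYPES:
        raise IncorrectEdgeType(f"Unknown edge type: {edge_type!r}")
    return edge_type


try:
    require_edge_type("NOT_AN_EDGE_TYPE")
except IncorrectEdgeType as err:
    caught_message = err.message
```

Catching the specific exception class rather than a generic Exception keeps genuine connection or query failures visible to the caller.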