Database Module
The database module handles the generation of metabolic networks from a given list of metabolites and optionally genes and/or microbial species.
Which query class to use depends on how the database is accessed: through the provided REST framework, or directly through neo4j (whether neo4j runs in a docker container or not).
For the REST API, use APINetworkGenerator; for direct neo4j queries, use NetworkGenerator.
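As a rough illustration of this choice, the sketch below maps a connection string to the name of the class suggested above. The helper `suggested_class` is hypothetical and not part of pymantra. It assumes that REST endpoints are addressed with http(s) URLs and direct neo4j connections with bolt or neo4j URI schemes.

```python
from urllib.parse import urlparse

# Assumption: REST endpoints use http(s), direct neo4j access uses the
# bolt/neo4j URI schemes understood by the neo4j Python driver.
REST_SCHEMES = {"http", "https"}
NEO4J_SCHEMES = {"bolt", "bolt+s", "bolt+ssc", "neo4j", "neo4j+s", "neo4j+ssc"}


def suggested_class(location: str) -> str:
    """Hypothetical helper: return the *name* of the suggested query class."""
    scheme = urlparse(location).scheme.lower()
    if scheme in REST_SCHEMES:
        return "APINetworkGenerator"
    if scheme in NEO4J_SCHEMES:
        return "NetworkGenerator"
    raise ValueError(f"Unrecognised scheme in {location!r}")


api_choice = suggested_class("https://exbio.wzw.tum.de/pymantradb")
local_choice = suggested_class("bolt://localhost:7687")
```

The helper only returns class names, so it runs without pymantra or a database connection.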
Online/API Database Class
- class pymantra.database.APINetworkGenerator(base_url: str = 'https://exbio.wzw.tum.de/pymantradb')[source]#
API mirror for NetworkGenerator.
Queries the mantra online neo4j database containing the reference network generated with the Neo4jGenerator class.
Most query functions depend on the requirements tested with the Verifier class. To ensure that all functions work as expected, only databases tested for correctness should be used.
- __init__(base_url: str = 'https://exbio.wzw.tum.de/pymantradb')[source]#
Initialize a new APINetworkGenerator instance
Initialize a new instance to run queries to the neo4j mantra-db API.
- Parameters:
base_url (str, default 'https://exbio.wzw.tum.de/pymantradb') – Root URL of the server hosting the API
- as_networkx(nodes: Dict[str, Set[str]] | None = None, edges: Dict[str, Set[Edge]] | None = None, include_attributes: bool = True, reaction_subgraph: bool = False, reduce: bool = True) DiGraph[source]#
Convert a set of nodes or edges to a networkx Graph
- Parameters:
nodes (Dict[str, Set[str]], Optional) – Nodes to include by node type. Generally optional, but either nodes or edges need to be given. Please note: if edges is not specified and reaction_subgraph is True, reaction nodes given in nodes will NOT be considered.
edges (Dict[str, Set[Edge]], Optional) – Edges to include by edge type. Generally optional, but either nodes or edges need to be given. If not specified, edges will be queried from the database for the specified nodes, using either get_subgraph or get_reaction_subgraph depending on reaction_subgraph. The only edge attribute currently included is edge_type.
include_attributes (bool, default True) – If True, the nx.Graph.nodes contain the attributes specified in the database. Else node_type will be the only node attribute in the output graph. Please be aware that if True and edges are None, this might make the function much less efficient.
reaction_subgraph (bool, default False) – Only relevant if edges is None. If True, the queried subgraph edges form a reaction subgraph (see get_reaction_subgraph()); otherwise the subgraph will not contain reaction nodes (see get_subgraph()).
reduce (bool, default True) – Whether to reduce the reaction nodes at the end of the
- Returns:
nx.DiGraph – Subgraph as a nx.DiGraph
# TODO (add sample data)
Examples
>>> from pymantra.database import APINetworkGenerator
>>> edges_ = {
...     EDGE_TYPE_NAMES['substrate']: {
...         # TODO: example edges
...     },
...     EDGE_TYPE_NAMES['product']: {
...         # TODO: example edges
...     }
... }
>>> generator = APINetworkGenerator()
>>> generator.as_networkx(edges=edges_)
- get_reaction_subgraph(organisms: Set[str], genes: Set[str], metabolites: Set[str], reaction_organism: Tuple[str, str] | None = None) Dict[str, Set[Edge]][source]#
Extract the edges for a given set of entities
Query a subgraph with all genes, organisms and metabolites given and retain the original graph structure with reaction nodes.
Important: gene - organism, organism - reaction and gene - reaction edges are of opposite direction outside the database. The database structure is made to allow efficient queries, which do not reflect the ‘passing’ directions required for quantitative metabolic-network style analyses.
- Parameters:
organisms (Set[str]) – A set of all organisms to be included in the subgraph. The names must correspond to the nodeLabel property in the database.
genes (Set[str]) – A set of all genes to be included in the subgraph. The names must correspond to the nodeLabel property in the database.
metabolites (Set[str]) – A set of all metabolites to be included in the subgraph. The names must correspond to the nodeLabel property in the database.
reaction_organism (Tuple[str, str], optional) – Specify an organism for which the metabolic reactions should be extracted as a 2-tuple of [ID type, ID], where ID type must be ‘Abbreviation_KEGG’ or ‘KeggID’ and ID the KEGG organism code or T number, respectively. For human this would thus either be [‘Abbreviation_KEGG’, ‘hsa’] or [‘KeggID’, ‘T01001’]. If organisms is not empty the specified organism will be added on top.
- Returns:
A dictionary whose keys are edge types as specified in utils.EDGE_TYPES, each pointing to a set of Edge representing all edges of the respective type contained in the subgraph.
- Return type:
Dict[str, Set[Edge]]
Examples
>>> from pymantra.database import APINetworkGenerator
>>> generator = APINetworkGenerator()
>>> metabos = {'FDMO3', 'h2o', 'FDMO2', 'fald', 'FDMO6', 'so3',
...            'FMNRx', 'nad', 'FMNRx2', 'fmn', 'nadp'}
>>> gs = {'1576', '1557', '1559'}
>>> orgs = {"Streptomyces tsukubensis", "Bacillus smithii",
...         "Streptomyces fulvissimus"}
>>> edges_ = generator.get_reaction_subgraph(orgs, gs, metabos)
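The format of reaction_organism described above can be checked before a query is sent. The following is a minimal sketch of such a check. `check_reaction_organism` is a hypothetical helper, not a pymantra function, and it validates only the tuple shape and the ID type, not whether the organism exists in the database.

```python
from typing import Optional, Tuple

# ID types named in the parameter description above
ALLOWED_ID_TYPES = ("Abbreviation_KEGG", "KeggID")


def check_reaction_organism(
    reaction_organism: Optional[Tuple[str, str]]
) -> Optional[Tuple[str, str]]:
    """Hypothetical pre-flight check, not part of pymantra."""
    if reaction_organism is None:
        return None
    if len(reaction_organism) != 2:
        raise ValueError("reaction_organism must be a 2-tuple of (ID type, ID)")
    id_type, identifier = reaction_organism
    if id_type not in ALLOWED_ID_TYPES:
        raise ValueError(
            f"ID type must be one of {ALLOWED_ID_TYPES}, got {id_type!r}"
        )
    return id_type, identifier


# The two equivalent ways of specifying human given in the text
human_by_code = check_reaction_organism(("Abbreviation_KEGG", "hsa"))
human_by_t_number = check_reaction_organism(("KeggID", "T01001"))
```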
- get_subgraph(organisms: Set[str], genes: Set[str], metabolites: Set[str]) Dict[str, Set[Edge]][source]#
Returns a subgraph with all nodes given plus the reaction nodes required to connect them
- Parameters:
organisms (Set[str]) – Set of all organisms to query. The names must correspond to the nodeLabel property in the database.
genes (Set[str]) – Set of all genes to query. The names must correspond to the nodeLabel property in the database.
metabolites (Set[str]) – Set of all metabolites to query. The names must correspond to the nodeLabel property in the database.
- Returns:
All connections between organisms, genes and metabolites contained in the database. Organism - metabolite connections are third order (via gene and reaction nodes); all other connections are second order (via reaction nodes).
- Return type:
Dict[str, Set[Edge]]
Examples
>>> from pymantra.database import APINetworkGenerator
>>> generator = APINetworkGenerator("http://127.0.0.1:8084")
>>> metabos = {'FDMO3', 'h2o', 'FDMO2', 'fald', 'FDMO6', 'so3',
...            'FMNRx', 'nad', 'FMNRx2', 'fmn', 'nadp'}
>>> gs = {'1576', '1557', '1559'}
>>> orgs = {"Streptomyces tsukubensis", "Bacillus smithii",
...         "Streptomyces fulvissimus"}
>>> edges_ = generator.get_subgraph(orgs, gs, metabos)
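To make the idea of a second order connection concrete, the sketch below links two metabolites whenever one is a substrate and the other a product of the same reaction node. It is an illustration only and not how get_subgraph is implemented. The `Edge` class is a local stand-in for the package's type, the reaction label `R1` and the metabolite labels are made up, and substrate edges are assumed to point from metabolite to reaction while product edges point from reaction to metabolite.

```python
from typing import Dict, NamedTuple, Set


class Edge(NamedTuple):
    """Local stand-in for the package's Edge type (source, target)."""
    source: str
    target: str


def collapse_reactions(substrate: Set[Edge], product: Set[Edge]) -> Set[Edge]:
    """Link each substrate to each product of the same reaction node."""
    products_of: Dict[str, Set[str]] = {}
    for edge in product:  # assumed direction: reaction -> metabolite
        products_of.setdefault(edge.source, set()).add(edge.target)
    return {
        Edge(edge.source, target)
        for edge in substrate  # assumed direction: metabolite -> reaction
        for target in products_of.get(edge.target, ())
    }


substrate_edges = {Edge("glc", "R1"), Edge("atp", "R1")}
product_edges = {Edge("R1", "g6p"), Edge("R1", "adp")}
metabolite_links = collapse_reactions(substrate_edges, product_edges)
```

With these inputs, every substrate of `R1` is connected to every product of `R1`, so the reaction node itself no longer appears in the result.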
Local Database Class
- class pymantra.database.NetworkGenerator(uri: str, auth: Tuple[str, str] | None = None, **kwargs)[source]#
Queries a neo4j database containing the reference network generated with the Neo4jGenerator class.
Most query functions depend on the requirements tested with the Verifier class. To ensure that all functions work as expected, only databases tested for correctness should be used.
- __init__(uri: str, auth: Tuple[str, str] | None = None, **kwargs)[source]#
- Parameters:
- Raises:
ConnectionError – If database is not reachable with the given parameters
- as_networkx(nodes: Dict[str, Set[str]] | None = None, edges: Dict[str, Set[Edge]] | None = None, include_attributes: bool = True, reaction_subgraph: bool = False, reduce: bool = True) DiGraph[source]#
Convert a set of nodes or edges to a networkx Graph
- Parameters:
nodes (Dict[str, Set[str]], Optional) – Nodes to include by node type. Generally optional, but either nodes or edges need to be given. Please note: if edges is not specified and reaction_subgraph is True, reaction nodes given in nodes will NOT be considered.
edges (Dict[str, Set[Edge]], Optional) – Edges to include by edge type. Generally optional, but either nodes or edges need to be given. If not specified, edges will be queried from the database for the specified nodes, using either get_subgraph or get_reaction_subgraph depending on reaction_subgraph. The only edge attribute currently included is edge_type.
include_attributes (bool, default True) – If True, the nx.Graph.nodes contain the attributes specified in the database. Else node_type will be the only node attribute in the output graph. Please be aware that if True and edges are None, this might make the function much less efficient.
reaction_subgraph (bool, default False) – Only relevant if edges is None. If True, the queried subgraph edges form a reaction subgraph (see get_reaction_subgraph()); otherwise the subgraph will not contain reaction nodes (see get_subgraph()).
reduce (bool, default True) – Whether to reduce the reaction nodes at the end of the
- Returns:
nx.DiGraph – Subgraph as a nx.DiGraph
# TODO (add sample data)
Examples
>>> from pymantra.database import NetworkGenerator
>>> edges_ = {
...     EDGE_TYPE_NAMES['substrate']: {
...         # TODO: example edges
...     },
...     EDGE_TYPE_NAMES['product']: {
...         # TODO: example edges
...     }
... }
>>> generator = NetworkGenerator(
...     "bolt://localhost:7687", auth=('<user>', '<password>'))
>>> generator.as_networkx(edges=edges_)
- get_all_edges(edge_type: str, limit: int | None = None) Set[Edge][source]#
Query all relationships of a specific type
- get_gene_metabolite_connections(genes: Set[str], metabolites: Set[str])[source]#
Query all pairwise connections between genes and metabolites given as input
- get_metabolite_metabolite_connection(metabolites: Set[str])[source]#
Query all pairwise connections between metabolites given as input
- get_node_attributes(node: str, node_type: str | None = None) Dict[str, any][source]#
Get the attributes of a specific node
- get_node_edges(node, node_type: str | None = None, **kwargs) Dict[str, Set[Edge]][source]#
Query all edges of a given node (by node label), irrespective of edge types
- Parameters:
- Returns:
All edges going out of or to the given input node by edge type
- Return type:
Dict[str, Set[Edge]]
- get_node_neighbours(node: str, node_type: str | None = None, as_strings: bool = False) Dict[str, Set[str]] | Dict[str, Set[Node]][source]#
Query all neighbours of a specific node by node label, irrespective of their node type
- Parameters:
node (str) – Node label of the node to query
node_type (str, Optional) – Query node type. Query results should be the same, since nodeLabels are supposed to be unique across all node types, however, speed might be different
as_strings (bool, Optional, default False) – If True, nodes will be returned as their nodeLabels, else as neo4j.graph.Node objects.
- Returns:
All direct neighbours by node type (dict.keys). If as_strings is True, nodes are a set of str, else a set of neo4j.graph.Node.
If no neighbours are found, an empty dict will be returned.
- Return type:
Dict[str, Set[str]] | Dict[str, Set[Node]]
- get_organism_metabolite_connections(organisms: Set[str], metabolites: Set[str]) Set[Edge][source]#
Query all pairwise connections between organisms and metabolites given as input
- get_reaction_subgraph(organisms: Set[str], genes: Set[str], metabolites: Set[str], reaction_organism: Tuple[str, str] | None = None) Dict[str, Set[Edge]][source]#
Extract the edges for a given set of entities
Query a subgraph with all genes, organisms and metabolites given and retain the original graph structure with reaction nodes.
Important: gene - organism, organism - reaction and gene - reaction edges are of opposite direction outside the database. The database structure is made to allow efficient queries, which do not reflect the ‘passing’ directions required for quantitative metabolic-network style analyses.
- Parameters:
organisms (Set[str]) – A set of all organisms to be included in the subgraph. The names must correspond to the nodeLabel property in the database.
genes (Set[str]) – A set of all genes to be included in the subgraph. The names must correspond to the nodeLabel property in the database.
metabolites (Set[str]) – A set of all metabolites to be included in the subgraph. The names must correspond to the nodeLabel property in the database.
reaction_organism (Tuple[str, str], optional) – Specify an organism for which the metabolic reactions should be extracted as a 2-tuple of [ID type, ID], where ID type must be ‘Abbreviation_KEGG’ or ‘KeggID’ and ID the KEGG organism code or T number, respectively. For human this would thus either be [‘Abbreviation_KEGG’, ‘hsa’] or [‘KeggID’, ‘T01001’]. If organisms is not empty the specified organism will be added on top.
- Returns:
A dictionary whose keys are edge types as specified in utils.EDGE_TYPES, each pointing to a set of Edge representing all edges of the respective type contained in the subgraph.
- Return type:
Dict[str, Set[Edge]]
Examples
>>> from pymantra.database import NetworkGenerator
>>> generator = NetworkGenerator(
...     "bolt://localhost:7687", auth=('<user>', '<password>'))
>>> metabos = {'FDMO3', 'h2o', 'FDMO2', 'fald', 'FDMO6', 'so3',
...            'FMNRx', 'nad', 'FMNRx2', 'fmn', 'nadp'}
>>> gs = {'1576', '1557', '1559'}
>>> orgs = {"Streptomyces tsukubensis", "Bacillus smithii",
...         "Streptomyces fulvissimus"}
>>> edges_ = generator.get_reaction_subgraph(orgs, gs, metabos)
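The note on edge directions can be illustrated with a small sketch that reverses source and target for selected edge types only. This is an assumption-laden illustration: the edge type keys used here are hypothetical (the real keys are defined in utils.EDGE_TYPES), `Edge` is a local stand-in, and the reaction label `R1` is made up. Whether the edges returned by get_reaction_subgraph are already reversed should be checked against the package itself.

```python
from typing import Dict, NamedTuple, Set


class Edge(NamedTuple):
    """Local stand-in for the package's Edge type (source, target)."""
    source: str
    target: str


# Hypothetical edge type keys; the real keys are listed in utils.EDGE_TYPES
TYPES_TO_FLIP = {"gene_organism", "organism_reaction", "gene_reaction"}


def flip_selected(edges: Dict[str, Set[Edge]]) -> Dict[str, Set[Edge]]:
    """Reverse source and target for the selected edge types only."""
    flipped = {}
    for edge_type, edge_set in edges.items():
        if edge_type in TYPES_TO_FLIP:
            flipped[edge_type] = {Edge(e.target, e.source) for e in edge_set}
        else:
            # Other edge types keep their direction
            flipped[edge_type] = set(edge_set)
    return flipped


example = {
    "gene_reaction": {Edge("R1", "1576")},
    "substrate": {Edge("h2o", "R1")},
}
result = flip_selected(example)
```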
- get_subgraph(organisms: Set[str], genes: Set[str], metabolites: Set[str]) Dict[str, Set[Edge]][source]#
Returns a subgraph with all nodes given plus the reaction nodes required to connect them
- Parameters:
- Returns:
All connections between organisms, genes and metabolites contained in the database. Organism - metabolite connections are third order (via gene and reaction nodes); all other connections are second order (via reaction nodes).
- Return type:
Dict[str, Set[Edge]]
Examples
>>> from pymantra.database import NetworkGenerator
>>> generator = NetworkGenerator(
...     "bolt://localhost:7687", auth=('<user>', '<password>'))
>>> metabos = {'FDMO3', 'h2o', 'FDMO2', 'fald', 'FDMO6', 'so3',
...            'FMNRx', 'nad', 'FMNRx2', 'fmn', 'nadp'}
>>> gs = {'1576', '1557', '1559'}
>>> orgs = {"Streptomyces tsukubensis", "Bacillus smithii",
...         "Streptomyces fulvissimus"}
>>> edges_ = generator.get_subgraph(orgs, gs, metabos)
- property edges: Dict[str, Set[str]] | Dict[str, Edge]#
Query all edges in the database
- Returns:
All edges contained in the database by type.
Edges are represented either as strings of the form “source.nodeLabel -> target.nodeLabel” or as Edge (a NamedTuple with attributes source at position 0 and target at position 1).
If no edges are found, an empty dict will be returned.
- Return type:
Dict[str, Set[str]] | Dict[str, Edge]
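The two edge representations described above can be converted into each other. The sketch below parses the string form into a tuple form. `Edge` is again a local stand-in with source at position 0 and target at position 1, `parse_edge` is a hypothetical helper, and the example labels are only illustrative. It assumes node labels never contain the separator " -> ".

```python
from typing import NamedTuple


class Edge(NamedTuple):
    """Local stand-in: source at position 0, target at position 1."""
    source: str
    target: str


def parse_edge(text: str) -> Edge:
    """Hypothetical helper: turn 'source -> target' into an Edge."""
    source, separator, target = text.partition(" -> ")
    if not separator:
        raise ValueError(f"Not an edge string: {text!r}")
    return Edge(source.strip(), target.strip())


edge = parse_edge("h2o -> FDMO3")
```

Because `Edge` is a NamedTuple, `edge.source` and `edge[0]` refer to the same value.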
- property gene_ids#
Get all genes in the database by their ID and node label
- property metabolite_ids#
Get all metabolites in the database by their ID and node label
- property neo4j_nodes: Dict[str, Set[Node]]#
- Returns:
All nodes by node type, where the node types are the keys and the values are sets of neo4j.graph.Node. If no nodes are found, an empty dict will be returned.
- Return type:
Dict[str, Set[Node]]
- property organism_ids#
Get all organisms in the database by their ID and node label
Functions
- pymantra.database.reduce_reaction_nodes(graph: DiGraph)[source]#
Clean up a metabolite-reaction graph
This function merges reactions that have the same substrates and products (e.g. because they come from different organisms) and removes reaction nodes that have only one participant, i.e. transport reactions.
All modifications are done in place.
This utility function is only recommended if you did not use the pymantra database module to generate your network, as the module already applies this function internally.
- Parameters:
graph (nx.DiGraph) – Directed metabolite-reaction graph, optionally with multi-omics reaction connections
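The two clean-up steps described above can be sketched on plain dictionaries. This is not the pymantra implementation, which operates in place on an nx.DiGraph. Here each reaction is assumed to be a pair of substrate and product sets, "one participant" is interpreted as a single distinct metabolite across both sets, and all reaction and metabolite names are made up.

```python
from typing import Dict, FrozenSet, List, Tuple

Reaction = Tuple[FrozenSet[str], FrozenSet[str]]  # (substrates, products)


def reduce_reactions(reactions: Dict[str, Reaction]) -> Dict[str, Reaction]:
    """Dict-based illustration of the two clean-up steps."""
    merged: Dict[Reaction, List[str]] = {}
    for reaction_id, (substrates, products) in sorted(reactions.items()):
        # Step 1: drop reactions with a single distinct participant
        if len(substrates | products) < 2:
            continue
        # Step 2: group reactions sharing the same substrates and products
        merged.setdefault((substrates, products), []).append(reaction_id)
    # One entry per unique substrate/product pair, named after its members
    return {"|".join(ids): key for key, ids in merged.items()}


reactions = {
    "R1_orgA": (frozenset({"glc", "atp"}), frozenset({"g6p", "adp"})),
    "R1_orgB": (frozenset({"glc", "atp"}), frozenset({"g6p", "adp"})),
    "T1": (frozenset({"h2o"}), frozenset({"h2o"})),
}
reduced = reduce_reactions(reactions)
```

In this toy input the two `R1` variants collapse into a single entry and `T1` is removed. Unlike the documented function, the sketch returns a new dictionary rather than modifying its input.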