Name Mapping

When you use mantra’s Neo4j database to generate a metabolic network that matches your experimental data, you need to convert feature names or their database IDs to mantra’s internal IDs.

Microbial Organisms

Microbes are accessed by their species name, e.g. “Bacteroides uniformis”. Make sure each name starts with a capital letter and that its parts are separated by spaces, not by underscores or other characters.
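As an illustration of the naming rule above, the following minimal sketch cleans up species names before they are used in a query. The helper `normalize_species_name` is hypothetical and not part of pymantra; it only assumes that underscores and repeated whitespace should become single spaces and that the first letter should be capitalized.

```python
import re


def normalize_species_name(raw_name: str) -> str:
    """Return a species name with single spaces and a capital first letter.

    Hypothetical helper for illustration; it is not a pymantra function.
    """
    # split on any run of underscores and/or whitespace
    parts = re.split(r"[_\s]+", raw_name.strip())
    # drop empty fragments left over from leading/trailing separators
    name = " ".join(part for part in parts if part)
    # capitalize only the first letter, leave the rest untouched
    return name[:1].upper() + name[1:]


print(normalize_species_name("bacteroides_uniformis"))
```

Only the first letter is changed, so any capitalization later in the name (for example in strain designations) is preserved.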

Metabolites

The recommended database IDs are KEGG and HMDB, followed by Reactome and Virtual Metabolic Human. Some other databases, such as ChEBI or NCBI, are supported but are likely to yield fewer matches.
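If several ID types are available for a metabolite, the recommendation above can be applied as a simple priority lookup. The sketch below is only an illustration: `pick_preferred_id` and the lowercase source labels are hypothetical names, not pymantra identifiers, and placing KEGG ahead of HMDB is an assumption, since both are recommended equally.

```python
from typing import Dict, Optional, Tuple

# Preference order following the recommendation in the text. The labels
# are illustrative and may differ from the names mantra expects.
PREFERRED_SOURCES = ("kegg", "hmdb", "reactome", "vmh")


def pick_preferred_id(
    available_ids: Dict[str, str]
) -> Optional[Tuple[str, str]]:
    """Return (source, identifier) for the most preferred ID available."""
    # first try the recommended sources in order of preference
    for source in PREFERRED_SOURCES:
        identifier = available_ids.get(source)
        if identifier:
            return source, identifier
    # otherwise fall back to any other non-empty ID (e.g. ChEBI)
    for source, identifier in available_ids.items():
        if identifier:
            return source, identifier
    # no usable ID at all
    return None


# example IDs for D-glucose
glucose_ids = {"chebi": "CHEBI:17234", "hmdb": "HMDB0000122", "kegg": "C00031"}
print(pick_preferred_id(glucose_ids))
```

Falling back to a less recommended source keeps the metabolite in the analysis, at the cost of a lower chance that the ID can be matched.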

For mapping metabolite names and database IDs, we provide some functions. Their usage is shown in the following code snippets.

We start by importing the required packages and functions and loading a predefined set of metabolites, specified by their “common” names.

import json
import pathlib

from pymantra.namemapping import metaboanalyst_name_mapping, NameMapper


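# a plain list of metabolite names, you can find the file here:
# TODO: add github file link
metabolites_file = pathlib.Path(__file__).parent.absolute() / "metabolites.json"
# the context manager closes the file handle after reading
with open(metabolites_file, "r") as file:
    metabolites = json.load(file)
print(metabolites)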

Next, we query the database IDs for these common names. For this, mantra uses the MetaboAnalyst API. If you already have database IDs for your metabolites, you can skip this step.

# get database IDs from metabolite names
# can be skipped if at least one ID type per metabolite is known
# this might take a few seconds to run
name_map = metaboanalyst_name_mapping(metabolites)

Lastly, we use an internal database to convert (in this case) HMDB IDs to mantra IDs. For all available database source options, please see the documentation of NameMapper.

# we use HMDB IDs to map to mantra IDs
mapper = NameMapper()
mantra_ids = {
    hmdb_id: mapper.map_id(hmdb_id, "hmdb", "internal")
    for hmdb_id in name_map["HMDB"]
}
print(mantra_ids)
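Not every database ID will necessarily have a mantra counterpart, so it can help to track which IDs could not be converted. The following is a minimal sketch under stated assumptions: `map_with_report` is a hypothetical helper, and because this page does not say how `NameMapper.map_id` signals an unknown ID, the sketch treats both a `None` result and a `KeyError`/`ValueError` as "unmapped". A toy lookup table with made-up target values stands in for the real mapper so the example runs on its own.

```python
from typing import Callable, Dict, Iterable, List, Optional, Tuple


def map_with_report(
    ids: Iterable[str],
    map_fn: Callable[[str], Optional[str]],
) -> Tuple[Dict[str, str], List[str]]:
    """Apply map_fn to every ID and separate mapped from unmapped IDs."""
    mapped: Dict[str, str] = {}
    unmapped: List[str] = []
    for source_id in ids:
        try:
            target_id = map_fn(source_id)
        except (KeyError, ValueError):
            # assumption: a failed lookup may raise one of these errors
            target_id = None
        if target_id is None:
            unmapped.append(source_id)
        else:
            mapped[source_id] = target_id
    return mapped, unmapped


# toy stand-in for the real mapper; the target values are made up
toy_table = {"HMDB0000122": "toy-1", "HMDB0000190": "toy-2"}
mapped, unmapped = map_with_report(
    ["HMDB0000122", "HMDB9999999", "HMDB0000190"],
    lambda hmdb_id: toy_table[hmdb_id],
)
print(mapped)
print(unmapped)
```

With the real mapper, the callable would wrap the call shown above, e.g. `lambda hmdb_id: mapper.map_id(hmdb_id, "hmdb", "internal")`.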

Full Example Code

import json
import pathlib

from pymantra.namemapping import metaboanalyst_name_mapping, NameMapper


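# a plain list of metabolite names, you can find the file here:
# TODO: add github file link
metabolites_file = pathlib.Path(__file__).parent.absolute() / "metabolites.json"
# the context manager closes the file handle after reading
with open(metabolites_file, "r") as file:
    metabolites = json.load(file)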
print(metabolites)

# get database IDs from metabolite names
# can be skipped if at least one ID type per metabolite is known
# this might take a few seconds to run
name_map = metaboanalyst_name_mapping(metabolites)

# we use HMDB IDs to map to mantra IDs
mapper = NameMapper()
mantra_ids = {
    hmdb_id: mapper.map_id(hmdb_id, "hmdb", "internal")
    for hmdb_id in name_map["HMDB"]
}
print(mantra_ids)