NPdia Help & Documentation

Reference guide for understanding and using the database

Overview

NPdia is a manually curated database of Type I polyketide synthase (T1PKS) and nonribosomal peptide synthetase (NRPS) biosynthetic pathways from actinomycetes. Each entry represents a complete biosynthetic pathway reconstructed from the published literature, with every intermediate captured as a SMILES string.


Column Descriptions

MIBiG ID

The biosynthetic gene cluster (BGC) identifier from the MIBiG repository. Clicking the ID opens the corresponding MIBiG entry.

Compound

The name of the natural product(s) produced by the BGC, along with the producing organism.

Biosynthetic Class

The enzyme class responsible for biosynthesis: T1PKS, NRPS, or Hybrid (PKS-NRPS).

Steps

The total number of biosynthetic steps curated for the pathway, from starter unit loading to the final scaffold product.


Pathway Table Columns

Order

The sequential step number within the biosynthetic pathway. Steps are numbered starting from 1.

Enzyme

The gene or protein responsible for catalysing the reaction at this step, as annotated in the corresponding MIBiG GenBank file.

Module

The module number within the assembly line.

Nonlinearity

Describes any deviation from standard linear module activity. This field is left blank for standard elongation steps. Possible annotations include:

Annotation | Description
Inactive: [domain] | The specified domain (e.g. KR, DH, ER) is present in the gene sequence but non-functional at this step
Missing: [domain] | The specified domain is absent from the gene sequence but its chemical transformation is observed in the product, suggesting activity in trans or by an uncharacterised enzyme
transAT: [gene, substrate] | The AT domain is provided in trans by a separate enzyme rather than being part of the module itself
Iteration: [domains, substrate] | The module is used iteratively; the domains and substrate used in each iteration are listed
ModuleSkip | This module is skipped in the biosynthesis of this particular product
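
The annotations listed above can be split into a keyword and its details when processing the data programmatically. The following is a minimal Python sketch. It assumes one annotation per cell, written as the keyword, an optional colon, and comma-separated details that may be wrapped in square brackets. The function name parse_nonlinearity and the example values are hypothetical, and the exact spelling used in the data file may differ.

```python
# Annotation keywords as listed in this guide; their exact spelling and
# punctuation in the data file are assumptions.
KNOWN_KINDS = {"Inactive", "Missing", "transAT", "Iteration", "ModuleSkip"}


def parse_nonlinearity(cell):
    """Split a Nonlinearity cell into (kind, details).

    Returns None for a blank cell, which marks a standard elongation step.
    """
    text = (cell or "").strip()
    if not text:
        return None
    # Everything before the first colon is the keyword; the rest is detail.
    kind, _, rest = text.partition(":")
    kind = kind.strip()
    if kind not in KNOWN_KINDS:
        raise ValueError(f"unrecognised annotation: {text!r}")
    # Details may be wrapped in square brackets and separated by commas.
    rest = rest.strip().strip("[]")
    details = [part.strip() for part in rest.split(",") if part.strip()]
    return kind, details
```

For example, parse_nonlinearity("Inactive: KR") gives ("Inactive", ["KR"]), and a blank cell gives None.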

Substrate

The compound(s) required to produce the intermediate shown in the Product column of the same row.

Product (SMILES)

The SMILES string representing the biosynthetic intermediate produced at this step. See the SMILES Notation section below for details.

Product_ID

A unique identifier for the intermediate produced at this step, in the format [BGC_number]-[step_number] (e.g. 55-3). These IDs are used in the Substrate column of subsequent steps to indicate which intermediate is carried forward, enabling tracing of the full biosynthetic trajectory.
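
Tracing a trajectory through Product_ID references can be sketched in Python as follows. This is an illustration under stated assumptions, not the method used by NPdia: rows are taken to be dictionaries with Product_ID and Substrate keys, multiple substrates in one cell are assumed to be separated by semicolons, and at most one of them is assumed to be a Product_ID. The function names and the example rows (including the substrate names) are hypothetical.

```python
def parse_product_id(product_id):
    """Split an ID such as '55-3' into (bgc_number, step_number)."""
    bgc, step = product_id.strip().split("-")
    return int(bgc), int(step)


def trace_back(rows, product_id):
    """Follow Product_ID references in Substrate back to the first step.

    Returns the chain of Product_IDs, earliest first, ending at product_id.
    """
    by_id = {row["Product_ID"]: row for row in rows}
    chain = []
    current = product_id
    # Stop at an unknown ID, a step with no upstream ID, or a cycle.
    while current in by_id and current not in chain:
        chain.append(current)
        # Assumption: substrates in one cell are separated by ';'.
        substrates = [s.strip() for s in by_id[current]["Substrate"].split(";")]
        current = next((s for s in substrates if s in by_id), None)
    return chain[::-1]


# Hypothetical rows for illustration only; not taken from NPdia.
example_rows = [
    {"Product_ID": "55-1", "Substrate": "starter unit"},
    {"Product_ID": "55-2", "Substrate": "55-1; extender unit"},
    {"Product_ID": "55-3", "Substrate": "55-2; extender unit"},
]
```

With these rows, trace_back(example_rows, "55-3") returns the IDs 55-1, 55-2 and 55-3 in that order.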


SMILES Notation

All SMILES strings in NPdia were generated by manually tracing each biosynthetic step from the primary literature and drawing the corresponding intermediate structure in ChemDraw, followed by conversion to SMILES format.

Key conventions


Downloading Data

The full NPdia dataset is available for download from the Download page in Excel format (.xlsx).

The downloaded file contains all pathway entries with the following columns: MIBiG ID, Compound, Class, Order, Enzyme, Module, Nonlinearity, Substrate, Product (SMILES), Product_ID, and associated metadata.
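
Once the sheet has been read into memory, rows can be regrouped into pathways. The sketch below assumes the rows are already available as a list of dictionaries keyed by the column names listed above; reading the .xlsx file itself requires a spreadsheet library and is not shown. It also assumes that Order values can be converted to integers. The function names are hypothetical.

```python
from collections import defaultdict


def group_pathways(rows):
    """Group row dicts by 'MIBiG ID' and sort each pathway by 'Order'."""
    pathways = defaultdict(list)
    for row in rows:
        pathways[row["MIBiG ID"]].append(row)
    for steps in pathways.values():
        # Assumption: Order holds an integer or a string of digits.
        steps.sort(key=lambda r: int(r["Order"]))
    return dict(pathways)


def step_counts(pathways):
    """Count the curated steps in each grouped pathway."""
    return {bgc: len(steps) for bgc, steps in pathways.items()}
```

Grouping first and counting second keeps each pathway's ordered steps available for further analysis, such as following the Product_ID references from one step to the next.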

Potential use cases


Contact

For questions, error reports, or suggestions for new entries, please contact:

[Contact email or GitHub Issues link — to be filled in]