Modomics - A Database of RNA Modifications

Help Page

Overview of the MODOMICS Database

MODOMICS is the first comprehensive database system for the biology of RNA modification. It integrates information about the chemical structures of modified nucleosides, their localization in RNA sequences, the pathways of their biosynthesis, and the enzymes that carry out the respective reactions (together with their protein cofactors). Also included are protein sequences, structure data (if available), selected references from the scientific literature, and links to other databases, which together give comprehensive information about individual modified residues and the proteins involved in their biosynthesis.

The MODOMICS database contains the following types of items:

Modifications:
A collection of naturally occurring modified RNA nucleosides. For each modification the name, short name and one-letter abbreviation are provided.

Modifications can be browsed by the residue they originate from and by the chemical type of reaction. The originating choices include:
- the four standard nucleoside residues present in the transcribed RNA (A, G, C, or U)
- queuosine, which is first synthesized as a free modified base and only then attached to the ribose of RNA by a transglycosylation reaction, after which it may be further hypermodified
- the 5’ end of the nascent transcript, which represents the starting point for RNA capping.
Detailed information for each modification includes:
- Full and short (acronym) name
- RNA or HTML conventional abbreviation
- Kingdom in which it occurs (based on the list of organisms known to possess enzymes synthesizing it)
- RNA types in which it occurs (based on MODOMICS collection of modified sequences)
- 2D and 3D structures
- Sum formula
- Monoisotopic and average masses, and the [M+H]⁺ (protonated) mass
- Product ions: Protonated fragment ions generated from the precursor mass [M+H]⁺
- Normalized LC elution time and LC elution order/characteristics ^*
- SMILES code
- Modifying chemical groups, also indicated in red on the 2D nucleoside representation
- PDB ids for RNA structures that contain the given modification (if available)
Each modification is linked to the pathways section, to reactions in which it is a substrate or product, and the list of enzymes identified so far that catalyze its formation in various organisms.
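
As an illustration of how the sum formula, the monoisotopic mass and the [M+H]⁺ mass relate to each other, the following minimal Python sketch recomputes the latter two from a formula. It is not MODOMICS code: the function names are hypothetical, the element table covers only a few elements, and it assumes that the protonated mass is the monoisotopic mass plus the mass of one proton. Adenosine (C10H13N5O4) is used as the example.

```python
import re

# Monoisotopic masses (Da) of the most abundant isotopes of a few elements.
MONOISOTOPIC = {
    "C": 12.0,
    "H": 1.00782503,
    "N": 14.0030740,
    "O": 15.9949146,
    "P": 30.9737620,
    "S": 31.9720707,
}
PROTON_MASS = 1.00727647  # Da; assumption: [M+H]+ = M + one proton


def parse_formula(formula):
    """Turn a simple sum formula such as 'C10H13N5O4' into element counts."""
    counts = {}
    for element, number in re.findall(r"([A-Z][a-z]?)(\d*)", formula):
        counts[element] = counts.get(element, 0) + (int(number) if number else 1)
    return counts


def monoisotopic_mass(formula):
    """Sum the monoisotopic masses of all atoms in the formula."""
    return sum(MONOISOTOPIC[el] * n for el, n in parse_formula(formula).items())


def protonated_mass(formula):
    """Mass of the singly protonated species [M+H]+."""
    return monoisotopic_mass(formula) + PROTON_MASS


adenosine = "C10H13N5O4"
print(round(monoisotopic_mass(adenosine), 4))  # approximately 267.0968
print(round(protonated_mass(adenosine), 4))    # approximately 268.1040
```

The parser handles only flat formulas without brackets or charges, which is enough for a quick plausibility check of values shown on a modification page.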

^*HPLC retention times (reversed-phase chromatography (C18) with acetonitrile/ammonium acetate as the mobile phase) are normalized to guanosine to account for different LC systems, gradients, column sizes, flow rates, etc. For the elution order, cytidine, uridine, guanosine, adenosine and the late-eluting m6A were chosen as references. Absolute retention times in the referenced chromatogram are: C: 4.6 min, U: 6.3 min, G: 10.9 min, A: 15.6 min, m6A: 21.5 min. The product ions show the typical neutral loss(es) for the respective nucleoside.
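
The normalization described in this note can be sketched in a few lines of Python. The sketch assumes that "normalized to guanosine" means a simple ratio of retention times, which the text above does not spell out, and the helper names are hypothetical. The reference values are the absolute retention times quoted in the note.

```python
# Absolute retention times (min) of the reference nucleosides quoted above.
REFERENCE_RT_MIN = {"C": 4.6, "U": 6.3, "G": 10.9, "A": 15.6, "m6A": 21.5}


def normalize_to_guanosine(rt_min, guanosine_rt_min=REFERENCE_RT_MIN["G"]):
    """Assumed normalization: retention time divided by that of guanosine."""
    return rt_min / guanosine_rt_min


def elution_window(rt_min):
    """Return the reference nucleosides eluting just before and just after rt_min."""
    ordered = sorted(REFERENCE_RT_MIN.items(), key=lambda item: item[1])
    before = [name for name, rt in ordered if rt <= rt_min]
    after = [name for name, rt in ordered if rt > rt_min]
    return (before[-1] if before else None, after[0] if after else None)


print(normalize_to_guanosine(10.9))  # guanosine itself gives 1.0
print(elution_window(12.0))          # ('G', 'A'): elutes between G and A
```

To compare a retention time measured on a different LC system, pass that system's own guanosine retention time as `guanosine_rt_min`.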
Pathways:
Here, we present six pathway graphs showing how modifications emerge from the different unmodified residues in precursor RNA (as defined above for the modified nucleoside residues). Placing the mouse cursor over a modification’s short name displays its chemical structure. Arrows connecting two modifications are colored according to the chemical type of the reaction. Dashed arrows indicate putative reactions. All arrows are clickable and link to reaction-specific web pages. From these reaction pages, users can access information about specific modifications and enzymes in the corresponding sections of MODOMICS.

The graphs are interactive. It is possible to zoom and move the whole graph as well as to change the graph layout. Graphs may be downloaded as image, PDF or XML files.
Reactions:
A list of experimentally validated and predicted modification reactions. It can be filtered according to the originating base and the chemical type of reaction (see above under Modifications). Detailed information on each reaction comprises the enzymes that have been experimentally proven to catalyze it, the chemical structures of substrate(s) and product(s), information about cofactors, and other information in free-text format. Putative enzymes are not indicated. Reactions or pathways existing in a particular organism can also be accessed from the Protein entries in MODOMICS.
RNA sequences:
A collection of tRNA, rRNA, snRNA and snoRNA sequences that are known to be modified at multiple positions. For families of homologous RNAs, multiple sequence alignments adapted from RFAM, the Comparative RNA Website, and Transfer RNA databases are available. Modifications within sequences are indicated in blue and marked with one-letter abbreviations (for their meaning, consult the column RNA Mods abbrev. under the Modifications entry). Clicking a modified base within a sequence opens the corresponding page in the Modified nucleotides section of MODOMICS, from which the modification pathway is accessible. The gene encoding the enzyme responsible for forming the selected modified nucleotide may be unknown; the list of enzymes given in the table contains all known enzymes catalyzing the modification in RNA, but not necessarily the one acting at the position selected in the RNA sequence.

Uppercase and lowercase letters in rRNA sequences indicate regions of greater and lesser confidence in the alignment, respectively (according to the Comparative RNA Website database).

The sequences can be downloaded in text format (“Download as ASCII” option) or displayed in the Jalview applet.

The “Draw modification profile” option (available for rRNA and tRNA sequences) maps the modified positions onto secondary structure diagrams of RNA molecules. The mapping is based on the sequence alignments. For rRNAs, reference structures of E. coli SSU and LSU rRNAs are used, while for tRNAs a consensus secondary structure diagram is used. It is also possible to map information from a user-selected set of sequences available in MODOMICS onto the diagram. In that case, the percentage of modified ribonucleosides of any type at each alignment position is calculated and displayed. The resulting diagrams can be downloaded as image files.
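
The per-position percentage mentioned above can be illustrated with a short Python sketch. It is not the MODOMICS implementation. It assumes that unmodified residues are written as uppercase A, C, G or U, that "-" marks an alignment gap, that every other character is a one-letter modification code, and that gaps are left out of the denominator. The toy alignment is invented, and "D" and "P" merely stand in for modification codes.

```python
UNMODIFIED = set("ACGU")  # assumption: uppercase letters mark unmodified residues
GAP = "-"                 # assumption: gap character used in the alignment


def modification_profile(aligned_sequences):
    """Percentage of modified residues (of any type) in each alignment column."""
    if not aligned_sequences:
        return []
    length = len(aligned_sequences[0])
    if any(len(seq) != length for seq in aligned_sequences):
        raise ValueError("all aligned sequences must have the same length")
    profile = []
    for column in zip(*aligned_sequences):
        residues = [c for c in column if c != GAP]  # gaps are not counted
        modified = sum(1 for c in residues if c not in UNMODIFIED)
        profile.append(100.0 * modified / len(residues) if residues else 0.0)
    return profile


# Invented toy alignment of four sequences.
alignment = ["GCDA", "GCUA", "GPDA", "G-DA"]
print([round(p, 1) for p in modification_profile(alignment)])
# [0.0, 33.3, 75.0, 0.0]
```

Whether gaps should count toward the denominator is a design choice. Excluding them, as here, reports the fraction of residues actually present at a position that carry a modification.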
Proteins:
A collection of proteins involved in RNA modification processes. It contains both functional enzymes and protein cofactors necessary for multi-protein enzymatic activities. The Proteins table can be filtered by species and enzyme type (methyltransferase, pseudouridine synthase, etc.) or by the organism in which a given protein has been identified. Users can choose which table columns are displayed. The choices include:
- Traditional Name (most often used or recommended acronym)
- Full name
- Synonym (often an acronym used earlier)
- GI number
- ORF name
- COG
- UniProt ID
- Structures (PDB ID)
- Position/Modification type (refers to the final modification found in the given position; in rRNA sequences the source organism numbering is given first, E. coli numbering is given in brackets)
- Complex (if the protein works as part of a well-characterized complex)
- Enzyme type
- Organism
At the individual protein level, the following detailed information is given:
- Short text comments characterizing the protein
- The amino acid sequence (in order to facilitate bioinformatics analysis of given proteins there is a utility that sends the sequence from a MODOMICS entry to BLAST on the NCBI webserver)
- The list of catalyzed reactions (for experimentally characterized enzymes)
- A list of a few relevant corresponding publications (a wider list of publications retrieved automatically by a keyword search is available via the PubMed link, see below)
- Links to additional data sources e.g. Wikipedia, relevant PubMed search, Saccharomyces Genome Database (for yeast proteins), EcoCyc (for E. coli proteins)
Guide RNA:
A census of human and yeast snoRNAs involved in RNA-guided RNA modification by the C/D box and H/ACA box ribonucleoproteins, linked to the corresponding modification sites in human and yeast RNAs. The list of Guide RNAs can be browsed by organism and/or by the type of modification found at the target position. Information included for each snoRNA, if available:
- Name (linked to appropriate entries in HGNC database or The yeast snoRNA database for human and yeast snoRNAs, respectively)
- ORF/Alternative name
- Modification type
- Target RNA type
- Modified position
- Complex (H/ACA box or C/D box snoRNP)
- Organism

Publications

Building blocks:
A catalogue of “building blocks” for the chemical synthesis of naturally occurring modified nucleosides. The collected data includes chemical structures of precursors used for the synthesis and relevant literature references. Each building block is characterized by the IUPAC name and CAS number. A list of relevant publications is also presented.
Search:
There are four ways to search for information in MODOMICS:
- a keyword search
- a BLASTP search of protein sequences collected in MODOMICS
- a BLASTN search of nucleic acid sequences collected in MODOMICS
- a SMILES search of modified residue molecules based on their chemical structure and substructure
Hits and query-hit alignments from searches of the MODOMICS protein or nucleic acid sequence collections can be downloaded in FASTA format.
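
Downloaded hits in FASTA format can be read with a few lines of Python. The sketch below is a generic FASTA reader, not a MODOMICS tool. The header text and sequences in the example are invented, and the exact layout of real MODOMICS downloads should be checked before relying on it.

```python
def parse_fasta(text):
    """Return a list of (header, sequence) tuples from FASTA-formatted text."""
    records = []
    header, chunks = None, []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        if line.startswith(">"):
            if header is not None:
                records.append((header, "".join(chunks)))
            header, chunks = line[1:], []
        elif header is not None:
            chunks.append(line)  # sequence may span several lines
    if header is not None:
        records.append((header, "".join(chunks)))
    return records


# Invented example input; real downloads will have different headers.
example = """>hit_1 hypothetical description
MSTNPKPQRK
TKRNTNRRPQ
>hit_2
GCGGAUUUAG
"""
for name, sequence in parse_fasta(example):
    print(name, len(sequence))
```

Sequence lines are concatenated unchanged, so any one-letter modification codes in nucleic acid hits are preserved.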

Links: A collection of links to thematically related resources.