It is best to gather as much information as possible before using SimRNAweb to do a simulation. SimRNAweb can do de novo folding and can often solve a short hairpin sequence quite successfully. However, most of the interesting RNA structures are complex. Secondary structure information alone may not be adequate to ensure that a correct prediction is obtained. Therefore, it may be necessary to obtain information on tertiary structure.
In Figures 1a-g, we show various examples of using restraints with sequences and PDB files under various circumstances for PDB id 1L2X.
(1. Sequence alone) When only the sequence is submitted to SimRNAweb, the RNA sequence should be pasted into Box (3):
GGCGCGGCACCGUCCGCGGAACAAACGG
Figure 1a. Submitting sequence alone to SimRNAweb.
(2. Sequence and secondary structure) When the sequence is submitted along with the structure of a pseudoknot (perhaps obtained from RFAM or from SHAPE data), Boxes (3) and (4) are used. The sequence is submitted in Box (3), as above,
GGCGCGGCACCGUCCGCGGAACAAACGG
and the pseudoknot structure is introduced by submitting two lines of secondary structure, corresponding to the two overlapping helices, in the secondary structure field (Box (4)).
Note that a pseudoknot restraint is expressed by placing the entangled helix on an additional line. As many lines as needed can be added to specify even very deeply entangled pseudoknots. Unlike the usual notation, this notation allows clear specification of more than one pseudoknot.
Figure 1b. Submitting sequence along with secondary structure restraints to SimRNAweb.
(3. Refining a 3D structure) When starting with a structure from a PDB file (e.g., 1l2x_rna_clust0.pdb), one should upload the PDB file in Box (5) of the submission form. In principle, the secondary structure restraints listed in Box (4) are not absolutely required. However, unless one wishes to continue (or restart) a simulation from some intermediate point without restraints, there is generally little reason to submit a correct PDB structure and run a simulation on it. There may be situations where this is necessary, but in general the purpose would be refinement.
The secondary structure information is entered in Box (4) as before,
and the input PDB structure file: 1l2x_rna_clust0.pdb is entered in Box (5).
Figure 1c. Submitting a PDB file along with secondary structure restraints to SimRNAweb.
(4. Including all types of restraints with a PDB file) It is also possible to submit a PDB file (Box (5)) with secondary structure restraints (Box (4)),
include the PDB structure file (1l2x_rna_clust0.pdb) for refinement in Box (5), specify frozen residues in the PDB file in Box (6),
and upload any additional distance restraints into Box (7) using a text file with the appropriate syntax (e.g., 1l2x_restraints.txt)
SLOPE A/14/C4' A/7/P 3.0 4.0 1.0
WELL A/14/C4' A/7/P 3.0 4.0 1.0
NOTE: Please see the last Section of this help document to find more information on formatting and combining distance restraints (including some applied examples).
Figure 1d. Submitting a PDB file to SimRNAweb along with secondary structure restraints, distance restraints, and frozen parts of the structure (based on the PDB file).
(5. Including all types of restraints with sequence files) Similar to example (4), when submitting sequence alone with all types of restraints, first paste the RNA sequence
GGCGCGGCACCGUCCGCGGAACAAACGG
into Box (3), paste the secondary structure restraints (with pseudoknot) into Box (4)
and, as with the PDB file, the distance restraints should be submitted using the text file (1l2x_restraints.txt), where, in this example, the restraints are
SLOPE A/14/C4' A/7/P 3.0 4.0 1.0
WELL A/14/C4' A/7/P 3.0 4.0 1.0
in Box (7).
Figure 1e. Submitting a sequence to SimRNAweb along with secondary structure restraints and distance restraints.
(6. When several missing residues must be inserted) ADVANCED!
SimRNAweb can be used to add missing sequence in such structures as RNase P (PDB id: 3DHS), Figures 2a and 2b. Here, the PDB file presents a molecule that has an incomplete sequence due to unresolved fragments.
The PDB file 3DHS presents a single chain (A) with atomic coordinates for residues in the range 1-81, 250-263, 271-351, and 376-414. The PDB file indicates that the following residues are missing: 231-249, 264-270, 352-375, and 415-417, and the missing residue names are indicated. It also indicates missing P atoms for residues 250, 271, 376.
First, a full length sequence of the molecule to be modeled must be reconstructed. This task can be achieved e.g. with a text editor by combining the sequence extracted from the coordinates, with the missing parts inserted or appended terminally in appropriate places. This sequence should be split into two chains corresponding effectively to residue ranges 1-81 and 231-417. As a result, all residues numbered 231-417, originally in chain A, shall become a new chain (B), and be renumbered 1-187 (original number minus 230), as sequence input requires the residue numbering to restart from 1 for every chain.
Second, the PDB file should be prepared. This task can be achieved e.g. with a macromolecular structure viewer/editor such as PyMOL, UCSF Chimera, Swiss PDB Viewer, Rasmol etc. Residues with missing atoms (250, 271, 376) should be removed from the PDB file, and all remaining residues in the range 231-417 (251-263, 272-351, and 377-414) should be renumbered by subtracting 230 (to number them as 21-33, 42-121, and 147-184, respectively) and their chain identifier should be modified from A to B. Only then, the sequence in the PDB file will correspond in numbering to that in the sequence input file.
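The renumbering described above can also be scripted. The following is a minimal Python sketch, not part of SimRNAweb, that assumes a standard fixed-column PDB file (chain identifier in column 22, residue number in columns 23-26). The function name and its parameters are hypothetical; the default values are simply the 3DHS numbers quoted above.

```python
def renumber_for_simrna(pdb_lines, split_at=231, offset=230,
                        drop=(250, 271, 376), old_chain="A", new_chain="B"):
    """Drop incomplete residues and move residues >= split_at to a new chain.

    Hypothetical helper for illustration only. Residues listed in `drop`
    are removed; residues numbered `split_at` or higher in `old_chain`
    are renumbered (original number minus `offset`) and assigned to
    `new_chain`. All other lines are passed through unchanged.
    """
    out = []
    for line in pdb_lines:
        if not line.startswith(("ATOM", "HETATM")):
            out.append(line)  # keep TER, REMARK, END, etc. untouched
            continue
        chain = line[21]            # PDB column 22
        resseq = int(line[22:26])   # PDB columns 23-26
        if chain == old_chain and resseq in drop:
            continue                # residue with missing atoms: remove
        if chain == old_chain and resseq >= split_at:
            # Replace chain ID and residue number in place (fixed width).
            line = line[:21] + new_chain + f"{resseq - offset:4d}" + line[26:]
        out.append(line)
    return out
```

With these defaults, residue 251 of chain A becomes residue 21 of chain B, matching the ranges listed above. The result should still be inspected in a structure viewer before submission.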
The modeling of missing sequences with SimRNAweb should be executed with the residues from the PDB input frozen, and the remaining residues (added from sequence input) free to fold.
Figure 2a. Original PDB structure (PDB id: 3DHS) with the missing sequence. Red residues indicate points where a part of the sequence is missing.
Figure 2b. Example of the result of inserting the missing sequence into the original PDB structure (PDB id: 3DHS).
The input data is written in a single line in a basic text file with the sequence of RNA (both upper and lower cases are acceptable). For example:
There should be nothing but the desired sequence or sequences contained in the file because the program will read everything in the file as though it were part of a sequence. There should be no additional spaces either.
To provide more than one RNA chain as input, the user must separate the different chains by whitespace. For example:
If the user wants to provide secondary structure restraints, the contents of the restraint file should be in the following one-line format:
Or, when pseudoknots are involved, several lines can be provided depending on the characteristics of the pseudoknot. For example, if there is only one pseudoknot, then the following is sufficient:
If more than one pseudoknot is involved, each one should be written on a separate line. For example, the following structure would involve two pseudoknots:
((((.......))))...............((((.........))))..
......((((.............))))......................
..................((((..............)))).........
Comments:
(((......)))
can also be written as follows
(..........)
.(........).
..(......)..
It is important to emphasize that the data for secondary structure and pseudoknots must be listed on subsequent lines regardless of how the data is specified. Otherwise, SimRNA will assume that there is a second chain, and it is likely that an error will occur.
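To see that the one-line and multi-line notations describe the same base pairs, the restraint lines can be converted into a set of paired positions. The sketch below is an illustration in Python, not the parser used by SimRNA. It assumes a single chain, 1-based positions, and only the characters "(", ")" and "."; the function name is hypothetical.

```python
def pairs_from_lines(lines):
    """Collect (i, j) base pairs from one or more dot-bracket lines.

    Each line is parsed independently, so helices that cross each other
    (pseudoknots) can be placed on separate lines.
    """
    pairs = set()
    for line in lines:
        stack = []
        for pos, symbol in enumerate(line, start=1):
            if symbol == "(":
                stack.append(pos)
            elif symbol == ")":
                if not stack:
                    raise ValueError(f"unmatched ')' at position {pos}")
                pairs.add((stack.pop(), pos))
        if stack:
            raise ValueError(f"unmatched '(' at position {stack[-1]}")
    return pairs


one_line = pairs_from_lines(["(((......)))"])
three_lines = pairs_from_lines(["(..........)", ".(........).", "..(......).."])
print(one_line == three_lines)  # both describe pairs 1-12, 2-11, 3-10
```

A check of this kind also catches unbalanced brackets before a job is submitted.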
Constraints can also be applied to multichain problems. For example, in the case of two chains, restraints can be written as follows:
AAAACCCCUUUU AAAACCCCUUUU
((((....(((( ))))....))))
The above example will form a double-stranded RNA (dsRNA) stem with an interior loop containing poly(C). When the sequence is written as above in the restraint file, the first line is ignored.
If more than two chains are involved, this can also be accommodated by the above scheme. For example, constraints on four chains could be written as follows
AAAACCCCUUUU AAAACCCCUUUU AAAAGGGGAAAA AAAAGGGGAAAA
((((....(((( ))))....)))) ............ ............
............ ....((((.... ....)))).... ............
....((((.... ............ ............ ....))))....
Figure 3. A possible 3D structure of the four sequences shown above, folded under the proposed restraints using SimRNA.
In general, there is a variety of ways that this file can be written. However, clarity is probably the most important thing to remember.
Finally, please note that SimRNAweb doesn’t allow the user to exclude the possibility of base pairing using the secondary structure constraints file. Thus, even the dot “.” does not prohibit the formation of base pairing.
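Because multichain restraint files are easy to get wrong, it may help to check them before submission. The following Python sketch is a hypothetical helper, not part of SimRNAweb. It assumes that chains are separated by whitespace, that the first line holds the sequences, and that every following line holds one layer of restraints. It verifies that each restraint line has the same chain lengths as the sequence line and that its brackets are balanced across chains.

```python
def check_restraint_lines(sequence_line, structure_lines):
    """Return the chain lengths if every restraint line is consistent."""
    chain_lengths = [len(chain) for chain in sequence_line.split()]
    for number, line in enumerate(structure_lines, start=1):
        lengths = [len(part) for part in line.split()]
        if lengths != chain_lengths:
            raise ValueError(
                f"restraint line {number}: chain lengths {lengths} "
                f"do not match the sequence ({chain_lengths})")
        depth = 0
        for symbol in "".join(line.split()):  # brackets may pair across chains
            if symbol == "(":
                depth += 1
            elif symbol == ")":
                depth -= 1
            if depth < 0:
                raise ValueError(f"restraint line {number}: unmatched ')'")
        if depth != 0:
            raise ValueError(f"restraint line {number}: unmatched '('")
    return chain_lengths


four_chains = "AAAACCCCUUUU AAAACCCCUUUU AAAAGGGGAAAA AAAAGGGGAAAA"
restraints = [
    "((((....(((( ))))....)))) ............ ............",
    "............ ....((((.... ....)))).... ............",
    "....((((.... ............ ............ ....))))....",
]
print(check_restraint_lines(four_chains, restraints))
```

The check says nothing about whether the paired bases are complementary, and, as noted above, a dot does not prohibit pairing.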
The user can request that the calculation start from a user-provided structure in PDB format.
Essentially, a PDB file is cleaned (water, ions, and non-standard residues are removed) with `rna-pdb-tools` to produce a "SimRNA ready" file. However, these tools simply follow orders (all muscle and no brain). Therefore, it is wise for the user to prepare the PDB files carefully, to avoid any inadvertent misinterpretation of the desired structural information by the automated tools.
Hence, although a fair amount of flexibility is built into SimRNAweb, there are some important caveats to keep in mind.
In SimRNA, distance restraints can be defined for any pair of atoms used in the SimRNA representation, as well as for the central point of the nucleic acid base: P, C4′, N (N1 for pyrimidines, N9 for purines), C2, and C4, plus C6 for purines, as well as the middle of the base (label MB).
The restraints come in two basic forms: WELL and SLOPE. A typical WELL restraint effectively "captures" the two beads when they come within a prescribed distance from each other, as in Figure 4a, where the penalty is zero when the distance between the orange and red bead is outside of the well, and some negative number when inside the well. This means nothing happens until the bead finds the hole. A SLOPE restraint penalizes any region outside the zone where the penalty is zero, as in Figure 4b.
Figure 4. A cartoon describing the way restraints affect the interaction between two beads: here orange is the reference bead and the red bead is shown moving relative to the orange bead. a) WELL restraint, where outside the hole the red bead moves freely and inside the hole it is trapped within a specific range. b) SLOPE restraint, where the penalty is zero when the distance between the beads is inside the zone, and outside this zone the beads are either pulled together or pushed apart.
A WELL restraint is expressed by a single line in the restraints file as follows:
WELL atom_1_id atom_2_id min_dist max_dist weight
A SLOPE restraint is expressed similarly as follows:
SLOPE atom_1_id atom_2_id min_dist max_dist weight
or (alternative for SLOPE)
DISTANCE atom_1_id atom_2_id min_dist max_dist weight
Example lines in a restraints file:
SLOPE A/23/C4' C/45/P 5.5 8.5 1.0
WELL A/23/C4' C/45/P 6.5 7.5 1.0
A/23/C4' means atom C4' of nucleotide 23 in chain A
C/45/P means atom P of nucleotide 45 in chain C
5.5 [Å]: minimal distance where the penalty is zero (for SLOPE)
6.5 [Å]: minimal distance where the value is -1 (for WELL)
7.5 [Å]: maximal distance where the value is -1 (for WELL)
8.5 [Å]: maximal distance where the penalty is zero (for SLOPE)
1.0: weight of the restraint, for both SLOPE and WELL
For WELL, the value is -1 between 6.5 and 7.5 [Å]; for SLOPE, the value increases for distances greater than 8.5 or less than 5.5 [Å].
This is an example of a multifunctional restraint that resembles Figure 5e.
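The combined restraint from the example lines can be written out as a small function. This Python sketch is an illustration of the behaviour described in this section, not SimRNA's actual implementation. It assumes that SLOPE grows by the weight per Ångström outside the zone, that WELL contributes the negative of the weight inside the zone, and that the zone boundaries are inclusive. All function names are hypothetical.

```python
def well(distance, min_dist, max_dist, weight):
    """WELL: -weight inside [min_dist, max_dist], zero elsewhere (assumed form)."""
    return -weight if min_dist <= distance <= max_dist else 0.0


def slope(distance, min_dist, max_dist, weight):
    """SLOPE: zero inside the zone, linear in the violation outside (assumed form)."""
    if distance < min_dist:
        return weight * (min_dist - distance)
    if distance > max_dist:
        return weight * (distance - max_dist)
    return 0.0


def combined(distance):
    """Sum of the two example lines for the pair A/23/C4' and C/45/P."""
    return slope(distance, 5.5, 8.5, 1.0) + well(distance, 6.5, 7.5, 1.0)


for d in (4.5, 6.0, 7.0, 10.0):
    print(d, combined(d))
```

Under these assumptions the sum is -1 between 6.5 and 7.5 Å, zero in the rest of the 5.5 to 8.5 Å zone, and increasingly positive outside it.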
Figure 5. Examples of types of restraints. a) An elementary function of type SLOPE that depends on the following three parameters: the minimum distance, the maximum distance and the slope. b) An elementary function of type WELL that depends on three parameters: the minimum and maximum distance and the depth of the well. c) A single SLOPE function with negative weight. d) A single WELL function with negative weight. e) A combination of one SLOPE function and one WELL function. f) A combination of SLOPE and two WELL functions, both with negative weights.
A restraint can be thought of as a flexible tether that drives the selected atoms towards a certain distance by applying a penalty for distances that deviate from that range. It can also provide a reward when a desired distance is achieved. The penalty and reward are positive and negative contributions to the total energy of the simulated system, respectively.
There are two types of distance restraints: WELL and SLOPE (the keyword DISTANCE can also be used in the place of SLOPE). Both restraints describe an inner zone between the specified minimum and maximum distances (which can be the same). For SLOPE, the penalty within the inner zone is zero, and outside this zone it changes linearly with distance, scaled by the weight. When the weight is positive, SLOPE forms a "V" shape or a "\_/" shape depending on the size of the inner zone (Figure 5a). If the weight is negative, the shape will be inverted, but the penalty in the inner zone is always zero (for SLOPE). In the case of WELL, a positive weight produces a divot "|_|" where outside the zone, the value is zero and inside the well, the energy is negative (Figure 5b). Changing the sign of the weight for WELL means we create a wall or a stumbling block (a positive penalty) in the inner zone.
The basic forms for SLOPE and WELL, when the weight is positive, are shown in Figures 5a and 5b.
In the case of a SLOPE-type restraint, the two atoms are tethered towards the region by applying a linear penalty that corresponds to the degree of violation of the distance from the desired region. When the distance between the atoms becomes equal to the desired value, the value of the function reaches zero. The shape of the function resembles a V, with a bottom that can correspond to a single point or to a “flat” region (Figure 5a). In the case of a WELL-type restraint, the function is flat and equals zero for any distance outside the desired range, while the distances within the desired range correspond to the negative value of the weight (Figure 5b).
Both functions can be “reversed” when a negative weight is applied (cf. Figures 5c and d). For example, a SLOPE-type function with a negative weight can be used as a repelling function (Figure 5c). This function can be useful in simulations in which the user wishes to study stretching of a molecule between its terminal residues, for example. On the other hand, a WELL-type function can be applied with a negative weight when the user wishes to define a distance range that the atoms should avoid. When the atoms are in the distance range defined by WELL, an additional penalty is applied.
Any number of these two types of functions can be also combined in order to define complex restraints. The resulting function can adopt various shapes (Figures 5e and f). Therefore, the relative distance of two atoms under consideration can be described by a dedicated function or a linear combination of functions used as a part of the total scoring of the energy.
For both types of restraints, the user must specify the restraints on subsequent lines of the restraint file, one restraint per line: the keyword, the two atom identifiers (each of the form chain/residue/atom), min_dist, max_dist, and the weight. Examples are provided below. The chain fields (e.g., ChY) specify the chain index (the chain name can be the same or different for the two atoms), the residue fields (e.g., j) refer to the index of the particular residue on the respective chain, and the atom fields (e.g., atom_j) refer to the atom on which the restraint is applied. Restraints of type WELL are 0 except for the range between min_dist and max_dist, where the value is the negative of the weight. Restraints of type SLOPE are 0 within the range between min_dist and max_dist. Outside that range, a linearly increasing positive penalty is added (the pair of atoms is attracted toward this range because the penalty decreases as they approach the region between min_dist and max_dist). The value of the penalty grows in proportion to the weight.
Restraints for a given pair of atoms can be combined (added). It requires two (or more) lines to specify subsequent contributions.
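When several lines refer to the same pair of atoms, it can be useful to read the restraint file and group the lines by pair before submitting it. The Python sketch below is a hypothetical pre-submission check, not the parser used by SimRNAweb. It assumes whitespace-separated fields in the order shown earlier, atom identifiers of the form chain/residue/atom, and the keywords SLOPE, WELL and DISTANCE (treated as SLOPE).

```python
from collections import defaultdict
from typing import NamedTuple


class Restraint(NamedTuple):
    kind: str
    atom_1: str
    atom_2: str
    min_dist: float
    max_dist: float
    weight: float


def parse_restraints(text):
    """Group restraint lines by atom pair; raise ValueError on malformed lines."""
    grouped = defaultdict(list)
    for number, raw in enumerate(text.splitlines(), start=1):
        fields = raw.split()
        if not fields:
            continue  # skip empty lines
        if len(fields) != 6:
            raise ValueError(f"line {number}: expected 6 fields, got {len(fields)}")
        kind = "SLOPE" if fields[0] == "DISTANCE" else fields[0]
        if kind not in ("SLOPE", "WELL"):
            raise ValueError(f"line {number}: unknown keyword {fields[0]!r}")
        for atom_id in fields[1:3]:
            if len(atom_id.split("/")) != 3:
                raise ValueError(f"line {number}: bad atom identifier {atom_id!r}")
        min_dist, max_dist, weight = (float(value) for value in fields[3:])
        if min_dist > max_dist:
            raise ValueError(f"line {number}: min_dist exceeds max_dist")
        grouped[(fields[1], fields[2])].append(
            Restraint(kind, fields[1], fields[2], min_dist, max_dist, weight))
    return dict(grouped)


example = "SLOPE A/23/C4' C/45/P 5.5 8.5 1.0\nWELL A/23/C4' C/45/P 6.5 7.5 1.0\n"
for pair, items in parse_restraints(example).items():
    print(pair, [item.kind for item in items])
```

Here both example lines end up under the same pair, which is the combined case described above.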
For an applied example of using the restraints described above, Figure 6 shows two examples where the secondary structure restraints and distance restraints (SLOPE) were used to improve the fit of the 3D structure for two tRNA structures. The necessary restraint files and sequence for folding 3L0U are also provided as an example on the submission page of SimRNAweb, where the best structure using this combination of restraints was 6.1 Angstroms RMSD (Figure 6a). Figure 6b provides the requisite files for PDB structure 1EVV, where SimRNAweb produced one cluster with 6.5 Angstroms RMSD.
Figure 6a. Results of folding the sequence for PDB id 3L0U with secondary structure restraints and distance restraints using SLOPE.
Figure 6b. Results of folding the sequence for PDB id 1EVV with secondary structure restraints and distance restraints using SLOPE.
Job title – letters, digits, and +_-.(), the first 20 characters of the title will make up the job id, so pick something short, sweet, and to the point.
E-mail address – the email address will be used to send results, but is not required of the user. However, for long sequences, the simulation can take many hours, so if the page is lost, there is no way to find it again.
RNA sequence – see above
Secondary structure – see above
This is a more advanced demo of how SimRNAweb can be used, to give you a feel for the usability of the server.
To model an RNA sequence (CGCGCAAGCG) using a tetraloop motif taken from the PDB database, we first have to prepare the PDB file.
   XXXX    <- use the tetraloop from 1zih to model your RNA
1234567890 # numbering of your RNA sequence
CGCGCAAGCG # your RNA sequence
(((....))) # secondary structure of your RNA sequence
Start PyMOL (https://www.pymol.org/) and fetch the structure 1zih.
Remember, it is absolutely essential to examine the PDB file (e.g. with PyMOL), check for missing parts, gaps (or unusual numbering), ions, water, or anything else that looks suspicious.
Note that this PDB structure has hydrogen atoms; please remove them (Action -> hydrogen -> remove). Select the tetraloop ('GCAA'), extract it to a new object, and rename this object to tetraloop with `set_name 1zih, tetraloop`. Since the numbering of the tetraloop is slightly different from that in the RNA sequence, the numbering must be shifted by 1 (the 5th residue becomes the 4th, etc.) using `alter (sele), resv-=1`. Note that this PDB file has 10 states, and we need only one to use with SimRNAweb. The easiest way to get rid of the additional states is to save the molecule to a new PDB file. Now you can use this file to model your sequence.
seq: CGCGCAAGCG
secondary structure: (((....)))
freeze: A:4-7
Upload the file tetraloop.pdb and click Submit!
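The frozen range in this demo is simply the position of the tetraloop within the target sequence. As a quick check, the short Python sketch below locates a motif and prints the range in the chain:first-last form used above. The function name is hypothetical, and the sketch assumes a single chain named A and a motif that occurs exactly once in the sequence.

```python
def freeze_range(sequence, motif, chain="A"):
    """Return 'chain:first-last' (1-based) for a motif that occurs exactly once."""
    start = sequence.find(motif)
    if start == -1:
        raise ValueError(f"motif {motif!r} not found in sequence")
    if sequence.find(motif, start + 1) != -1:
        raise ValueError(f"motif {motif!r} occurs more than once")
    first = start + 1          # convert 0-based index to residue number
    last = start + len(motif)  # last residue of the motif
    return f"{chain}:{first}-{last}"


print(freeze_range("CGCGCAAGCG", "GCAA"))  # A:4-7
```

The residue numbers in the uploaded PDB file must match this range, which is why the tetraloop extracted from 1zih is shifted by one.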