RNA secondary structures used for benchmarking RNA 2D prediction methods, and predictions generated by the methods tested by CompaRNA

The dataset of RNA structures extracted by CompaRNA from the PDB database consists of previously unknown RNAs. To remove redundant RNA sequences, cd-hit-est was used. The filtering compared all aligned sequence pairs using a 90% sequence identity cutoff and required the alignment to cover at least 70% of the longer sequence.

RNAstrand stores experimentally solved RNA secondary structures, not all of which were extracted from 3D structures. The entire RNAstrand dataset, containing 4666 RNA sequences and secondary structures, was downloaded. It was filtered with exactly the same procedure as the PDB dataset. The only difference is that no reference 3D structures were used, so only one base pair definition was used. The final RNAstrand dataset consists of 1987 RNAs.
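The redundancy filter described above can be sketched as a cd-hit-est command line. This is a minimal illustration, not the exact invocation used by CompaRNA: it assumes the 90% identity cutoff maps to the `-c` option and that the coverage requirement means the alignment must span at least 70% of the longer sequence (the `-aL` option). The helper name `build_cd_hit_est_command` and both FASTA file names are hypothetical, and any other options (word size, memory, threads) are left at their defaults.

```python
import shlex


def build_cd_hit_est_command(fasta_in, fasta_out,
                             identity=0.90, longer_coverage=0.70):
    """Return the argument list for a cd-hit-est redundancy-filtering run.

    The thresholds default to the values quoted in the text; the mapping
    to -c and -aL is an assumption about how the filter was configured.
    """
    return [
        "cd-hit-est",
        "-i", fasta_in,                   # input FASTA (hypothetical name)
        "-o", fasta_out,                  # output FASTA of cluster representatives
        "-c", f"{identity:.2f}",          # sequence identity threshold
        "-aL", f"{longer_coverage:.2f}",  # min. alignment coverage of the longer sequence
    ]


command = build_cd_hit_est_command("rnastrand_all.fasta", "rnastrand_nr.fasta")
print(shlex.join(command))
# prints: cd-hit-est -i rnastrand_all.fasta -o rnastrand_nr.fasta -c 0.90 -aL 0.70
```

The sketch only builds and prints the command. With cd-hit-est installed, the list could be passed to `subprocess.run(command, check=True)` to perform the actual filtering.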