Overview

The GeneSilico MetaDisorder web service provides access to four metapredictors that predict disorder in proteins using existing bioinformatics tools and machine learning. The output consists of metapredictions based on optimized results from the primary methods. The methods were carefully benchmarked and validated on our own dataset and in the CASP8 and CASP9 blind tests.

MetaDisorder

This method takes 15 predictions from primary disorder predictors (for a complete list see the table below) and then weights their output according to each method's accuracy (see Figure 1). The weight of each method is 1+Sw, where Sw is the method's score on the first dataset. Although the approach is very simple, CASP showed that it can be very successful: it was the best method in 2008, with an AUC above 0.91 for the CASP8 targets. On the other hand, those targets were very easy to predict; the AUC for our own, more challenging dataset was ~0.83, which seems more realistic.
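The weighting described above can be sketched in Python. This is only an illustration: the function name, the assumption that every predictor returns per-residue scores in the range 0 to 1, and the normalization by the sum of weights are ours and are not taken from the service's code.

```python
def weighted_consensus(predictions, sw_scores):
    """Combine per-residue disorder scores from several predictors.

    predictions: dict, method name -> list of per-residue scores (assumed 0..1)
    sw_scores:   dict, method name -> Sw score of that method
    Each method is weighted by 1 + Sw; dividing by the sum of weights
    (an assumption) keeps the consensus in the same 0..1 range.
    """
    methods = list(predictions)
    length = len(predictions[methods[0]])
    if any(len(predictions[m]) != length for m in methods):
        raise ValueError("all predictions must cover the same residues")

    weights = {m: 1.0 + sw_scores[m] for m in methods}
    total_weight = sum(weights.values())

    consensus = []
    for i in range(length):
        weighted_sum = sum(weights[m] * predictions[m][i] for m in methods)
        consensus.append(weighted_sum / total_weight)
    return consensus


# Toy example with two predictors and made-up scores for three residues.
example = weighted_consensus(
    {"poodle-l": [0.9, 0.4, 0.1], "DisEMBL": [0.7, 0.6, 0.2]},
    {"poodle-l": 0.453, "DisEMBL": 0.328},
)
```

Because the weights are 1+Sw rather than Sw alone, even a weak predictor keeps a substantial vote, so the consensus stays close to a plain average.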

MetaDisorder3D

Uses alignments from 8 fold recognition (FR) methods (for a complete list see the table below). The idea is that gaps in alignments can serve as an indicator of disorder (but compare the easy case in Example 1 with the more blurred one in Example 2). The resulting predictor detected disorder with an AUC of 0.83 (compared with 0.91 for MetaDisorder on CASP8 targets). It must be stressed that each method can have a different utility for disorder prediction and, additionally, that alignments produced by FR methods differ in accuracy (information from "good" alignments should count for more than information from worse ones). To address both issues, we optimized the weights for each method, taking alignment quality into account, using a genetic algorithm implemented in PyEvolve. CASP8 targets were used for training and benchmarking. The method also applies a smoothing filter (to avoid mispredictions one or two residues long) and a correction for terminal amino acids (which tend to be more disordered).
MetaDisorder3D takes at most the 10 best alignments produced by an FR method and divides them into "good", "medium" and "poor" alignments using each method's native scores.
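The gap-based idea can be illustrated with a short Python sketch. Everything specific in it is an assumption made for the example: the quality weights, the input format (one template row per alignment, with one character per query residue and '-' where no template residue is aligned) and the moving-average window. The real weights were optimized with a genetic algorithm, and the terminal correction is not shown.

```python
# Hypothetical weights; the service's optimized values are not given here.
QUALITY_WEIGHT = {"good": 3.0, "medium": 2.0, "poor": 1.0}


def gap_profile(alignments):
    """Weighted fraction of alignments that have a gap at each query residue.

    alignments: list of (template_row, quality) pairs, where template_row
    has one character per query residue and '-' marks a gap.
    """
    length = len(alignments[0][0])
    if any(len(row) != length for row, _ in alignments):
        raise ValueError("all template rows must have the query length")

    total = sum(QUALITY_WEIGHT[quality] for _, quality in alignments)
    profile = [0.0] * length
    for row, quality in alignments:
        weight = QUALITY_WEIGHT[quality]
        for i, char in enumerate(row):
            if char == "-":
                profile[i] += weight
    return [value / total for value in profile]


def moving_average(scores, window=5):
    """Smooth a score profile; the window shrinks at the sequence ends."""
    half = window // 2
    smoothed = []
    for i in range(len(scores)):
        lo = max(0, i - half)
        hi = min(len(scores), i + half + 1)
        smoothed.append(sum(scores[lo:hi]) / (hi - lo))
    return smoothed


# Two toy alignments over a four-residue query.
profile = gap_profile([("A--A", "good"), ("AA-A", "poor")])
smoothed = moving_average(profile, window=3)
```

A moving average is only one way to suppress isolated one- or two-residue calls; the filter used by the service may differ.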

MetadisorderMD

Merges the methods described above into one using a genetic algorithm. Trained on CASP8 targets and validated during CASP9.
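To show what optimizing weights with a genetic algorithm involves, here is a minimal plain-Python sketch. It does not use PyEvolve, and the operators (truncation selection, uniform crossover, Gaussian mutation, elitism), the parameter values and the toy fitness function are all assumptions chosen for brevity. In the real setting the fitness would be a score such as Sw computed on the training targets.

```python
import random


def evolve_weights(fitness, n_weights, pop_size=20, generations=30,
                   sigma=0.1, seed=0):
    """Search for a weight vector that maximizes fitness(weights).

    Returns (best_weights, history), where history holds the best fitness
    seen at the start of each generation.
    """
    rng = random.Random(seed)
    population = [[rng.random() for _ in range(n_weights)]
                  for _ in range(pop_size)]
    history = []
    for _ in range(generations):
        population.sort(key=fitness, reverse=True)
        history.append(fitness(population[0]))
        parents = population[: pop_size // 2]   # truncation selection
        children = [population[0][:]]           # elitism: keep the best
        while len(children) < pop_size:
            a, b = rng.sample(parents, 2)
            # Uniform crossover followed by Gaussian mutation;
            # weights are clipped so that they stay non-negative.
            child = [rng.choice(pair) for pair in zip(a, b)]
            child = [max(0.0, w + rng.gauss(0.0, sigma)) for w in child]
            children.append(child)
        population = children
    return max(population, key=fitness), history


def toy_fitness(weights):
    """Stand-in objective: highest when every weight equals 0.5."""
    return -sum((w - 0.5) ** 2 for w in weights)


best_weights, best_history = evolve_weights(toy_fitness, n_weights=3)
```

Because the best individual is always copied into the next generation, the best fitness can never decrease from one generation to the next.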

MetadisorderMD2

Similar to the previous method, but in the fitness function of the GA we use the so-called Sww score instead of the Sw score. Sww tries to capture the best features of both AUC and Sw: it is calculated exactly like the classical Sw, but the disorder/order threshold is varied, as is done when calculating AUC.
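One possible reading of this description is sketched below in Python. The text does not say how the per-threshold values are aggregated, so averaging Sw over an evenly spaced grid of thresholds is an assumption, as are the class weights (each class weighted by the frequency of the other class) and the function names.

```python
def sw_at_threshold(scores, truth, threshold):
    """Sw for one threshold; truth is a list of booleans (True = disordered)."""
    tp = fn = tn = fp = 0
    for score, is_disordered in zip(scores, truth):
        predicted = score >= threshold
        if is_disordered and predicted:
            tp += 1
        elif is_disordered:
            fn += 1
        elif predicted:
            fp += 1
        else:
            tn += 1
    n_dis, n_ord = tp + fn, tn + fp
    total = n_dis + n_ord
    # Assumed weights: each class is weighted by the frequency of the other.
    w_dis, w_ord = n_ord / total, n_dis / total
    numerator = w_dis * tp - w_ord * fp + w_ord * tn - w_dis * fn
    denominator = w_dis * n_dis + w_ord * n_ord
    return numerator / denominator


def sww_score(scores, truth, steps=101):
    """Average Sw over `steps` thresholds spread evenly over 0..1 (assumed)."""
    thresholds = [i / (steps - 1) for i in range(steps)]
    values = [sw_at_threshold(scores, truth, t) for t in thresholds]
    return sum(values) / len(values)


# Toy example: two disordered and two ordered residues.
sww_example = sww_score([0.9, 0.6, 0.4, 0.1], [True, True, False, False])
```

Both functions need at least one ordered and one disordered residue in `truth`; otherwise the denominator is zero.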

Read more:


Poster of MetaDisorder by Kozlowski (PDF version)

Dataset used to train and benchmark methods

We used 3 separate datasets, which were cross-validated and/or bootstrapped when used.
1) 1147 proteins from the PDB database, filtered by resolution < 2 Å, R-factor < 0.2, length 50-1000 amino acids and sequence identity < 20%
2) 566 proteins compiled from DisProt (version 3.6) and CASP7 targets
3) 122 targets from CASP8
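The per-entry criteria of the first dataset can be written as a simple filter. The Python sketch below uses hypothetical field names (`resolution`, `r_factor`, `length`) and does not cover the sequence identity cutoff, which is a property of pairs of sequences and requires a separate redundancy-removal step.

```python
def passes_pdb_filters(entry):
    """Check the per-entry criteria for dataset 1.

    entry: dict with hypothetical keys 'resolution' (in angstroms),
    'r_factor' and 'length' (number of amino acids).
    """
    return (
        entry["resolution"] < 2.0
        and entry["r_factor"] < 0.2
        and 50 <= entry["length"] <= 1000
    )


# Made-up entries, for illustration only.
entries = [
    {"id": "a", "resolution": 1.8, "r_factor": 0.18, "length": 240},
    {"id": "b", "resolution": 2.4, "r_factor": 0.19, "length": 310},
    {"id": "c", "resolution": 1.5, "r_factor": 0.15, "length": 35},
]
selected = [e["id"] for e in entries if passes_pdb_filters(e)]
```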

How is disorder defined and calculated?

Experimental  -----------------------DDD-DDDDDDDDDDDDDDD------D--
Prediction    DDDD----------------------------------DDDDDDD---DDD

    TP - both disorder                             DDD
    FN - predicted as ordered, but is disordered   D-D
    TN - both ordered                              ---
    FP - predicted as disordered, but is ordered   -D-
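The four categories above can be counted with a few lines of Python. The sketch assumes that both strings have the same length and use 'D' for disordered residues and any other character for ordered ones; the function name is ours.

```python
def confusion_counts(experimental, prediction):
    """Count TP, FN, TN and FP per residue ('D' = disordered)."""
    if len(experimental) != len(prediction):
        raise ValueError("both strings must have the same length")

    counts = {"TP": 0, "FN": 0, "TN": 0, "FP": 0}
    for observed, predicted in zip(experimental, prediction):
        if observed == "D" and predicted == "D":
            counts["TP"] += 1   # both disordered
        elif observed == "D":
            counts["FN"] += 1   # predicted as ordered, but is disordered
        elif predicted == "D":
            counts["FP"] += 1   # predicted as disordered, but is ordered
        else:
            counts["TN"] += 1   # both ordered
    return counts


counts = confusion_counts("--DDD-", "D-DD--")
# counts == {"TP": 2, "FN": 1, "TN": 2, "FP": 1}
```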


The accuracy of methods was calculated using Sw score and AUC (area under curve):
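As a hedged Python sketch of both measures, we assume here that Sw has the form commonly used in CASP disorder assessments, Sw = (Wdis*TP - Word*FP + Word*TN - Wdis*FN) / (Wdis*Ndis + Word*Nord), where Ndis and Nord are the numbers of disordered and ordered residues. The default weights below (each class weighted by the frequency of the other class) are an assumption, and AUC is computed with the simple pairwise definition.

```python
def sw_score(tp, fn, tn, fp, w_dis=None, w_ord=None):
    """Weighted score Sw from per-residue counts.

    If no weights are given, each class is weighted by the frequency of
    the other class (an assumption), which compensates for ordered
    residues being far more common than disordered ones.
    """
    n_dis, n_ord = tp + fn, tn + fp
    total = n_dis + n_ord
    if w_dis is None:
        w_dis = n_ord / total
    if w_ord is None:
        w_ord = n_dis / total
    numerator = w_dis * tp - w_ord * fp + w_ord * tn - w_dis * fn
    denominator = w_dis * n_dis + w_ord * n_ord
    return numerator / denominator


def auc(scores, labels):
    """Area under the ROC curve, computed over all disorder/order pairs.

    labels: 1 for disordered, 0 for ordered. A pair counts 1 if the
    disordered residue has the higher score and 0.5 if the scores tie.
    """
    disordered = [s for s, y in zip(scores, labels) if y == 1]
    ordered = [s for s, y in zip(scores, labels) if y == 0]
    wins = 0.0
    for d in disordered:
        for o in ordered:
            if d > o:
                wins += 1.0
            elif d == o:
                wins += 0.5
    return wins / (len(disordered) * len(ordered))


sw_example = sw_score(tp=8, fn=2, tn=80, fp=10)
auc_example = auc([0.9, 0.8, 0.3, 0.1], [1, 0, 1, 0])
```

With the default weights, Sw reduces to sensitivity + specificity - 1, so a perfect prediction scores 1 and calling every residue ordered (or every residue disordered) scores 0.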

Method                      Sw      AUC (ROC)
poodle-l                    0.453   0.792
ipda                        0.439   0.779
iupred-l                    0.417   0.759
dispro                      0.415   0.761
poodle-s                    0.414   0.763
iupred-s                    0.412   0.757
Spritz-l                    0.399   -
prdos                       0.391   0.745
ronn                        0.380   -
DISOPRED2                   0.367   -
DisEMBL                     0.328   -
Spritz-s                    0.312   -

MetaDisorder (binary)       0.464   0.801
MetaDisorder (continuous)   0.469   0.821


Primary methods for disorder prediction

Method | Reference | Installation
DisEMBL | Linding R, Jensen L, Diella F, Bork P, Gibson T, Russell R. Protein disorder prediction: implications for structural proteomics. Structure 2003; 11: 1453-9. | local
DISPROT (VSL2) | Obradovic Z, Peng K, Vucetic S, Radivojac P, Dunker AK. Exploiting heterogeneous sequence properties improves prediction of protein disorder. Proteins 2005; 61: 176-82. | local
iPDA | Su CT, Chen CY, Hsu CM. iPDA: integrated protein disorder analyzer. Nucleic Acids Res. 2007 Jul;35(Web Server issue):W465-72. | remote (no longer available)*
DISpro | Cheng J, Sweredoski M, Baldi P. "Accurate Prediction of Protein Disordered Regions by Mining Protein Structure Data" Data Mining and Knowledge Discovery. 2005; 11(3):213-222. | local
GlobPlot | Linding R, Russell R, Neduva V, Gibson T. GlobPlot: Exploring protein sequences for globularity and disorder. Nucleic Acids Res. 2003; 31: 3701-8. | local
IUPred | Dosztanyi Z, Csizmok V, Tompa P, Simon I. IUPred: web server for the prediction of intrinsically unstructured regions of proteins based on estimated energy content. Bioinformatics 2005; 21: 3433-3434. | remote
PDISORDER | Softberry, Inc. | remote
POODLE-S | Shimizu K, Hirose S, Noguchi T. POODLE-S: web application for predicting protein disorder by using physicochemical features and reduced amino acid set of a position-specific scoring matrix. Bioinformatics 2007; 23: 2337-8. | remote
POODLE-L | Hirose S, Shimizu K, Kanai S, Kuroda Y, Noguchi T. POODLE-L: a two-level SVM prediction system for reliably predicting long disordered regions. Bioinformatics 2007; 23: 2046-53. | remote
PrDOS | Ishida T, Kinoshita K. PrDOS: prediction of disordered protein regions from amino acid sequence. Nucleic Acids Res. 2007; 35: W460-4. | remote
Spritz | Vullo A, Bortolami O, Pollastri G, Tosatto S. "Spritz: a server for the prediction of intrinsically disordered regions in protein sequences using kernel machines" Nucleic Acids Res. 2006; 34(Webserver Issue): W164-W168 | remote
RONN | Yang Z, Thomson R, McNeil P, Esnouf R. RONN: the bio-basis function neural network technique applied to the detection of natively disordered regions in proteins. Bioinformatics 2005; 21: 3369-76. | local
* Since November 2015, the iPDA method has been replaced by SPINE-D in the pipeline.

Fold recognition methods for disorder prediction

Method | Reference | Installation
PSI-BLAST* | Altschul SF, Madden TL, Schaffer AA, Zhang J, Zhang Z, Miller W, Lipman DJ. Gapped BLAST and PSI-BLAST: a new generation of protein database search programs. Nucleic Acids Res. 1997; 25(17): 3389-402. | local
HHsearch* | Soding J. (2005) Protein homology detection by HMM-HMM comparison. Bioinformatics 21, 951-960. | local
FFAS | Jaroszewski L, Rychlewski L, Li Z, Li W, Godzik A. FFAS03: a server for profile--profile sequence alignments. Nucleic Acids Res. 2005; 33, W284-8. | local
MGenThreader | Lobley A, Sadowski MI, Jones DT. pGenTHREADER and pDomTHREADER: new methods for improved protein fold recognition and superfamily discrimination. Bioinformatics 2009; 25(14): 1761-7 | local
PCONS5 | Wallner B, Elofsson A. Pcons5: combining consensus, structural evaluation and fold recognition scores. Bioinformatics. 2005; 21(23): 4248-54. | local
PHYRE | Kelley LA, Sternberg MJ. Protein structure prediction on the Web: a case study using the Phyre server. Nat Protoc. 2009; 4(3): 363-71. | remote (800 aa limit)
* For BLAST and HHsearch you should expect two results each: BLAST is run with and without the -F option (called here pdbblast and blastp), and HHsearch is run against the pdb70 and cdd databases.
