Welcome to SpliProt3D!

SpliProt3D is an online database of representative experimentally determined and computationally predicted structures of proteins of the human spliceosome.

Introduction

The spliceosome is a nucleoprotein complex that excises introns (non-coding sequences) from eukaryotic pre-mRNAs. The human spliceosome contains 45 proteins that form small nuclear ribonucleoprotein (snRNP) complexes, as well as up to 200 non-snRNP proteins, of which some are essential for splicing, and others connect splicing to other cellular processes.

The structure of the entire spliceosome has not yet been determined at atomic resolution. However, multiple high-resolution structures of splicing proteins have been solved and a considerable amount of information about the structure of spliceosomal components has been published.

We carried out a systematic, comprehensive structural bioinformatics analysis of the human spliceosomal proteome. We predicted ordered and disordered regions, established a non-redundant set of experimental models of spliceosomal proteins, and constructed theoretical models for regions predicted to be ordered but lacking an experimentally determined structure.

In this database, each protein is considered on its own, i.e. we do not (yet) provide any information about complexes, interactions, and possible conformational changes connected with different functional stages. This is an area of active research, and we hope that the next edition of the database will contain additional layers of information.

There are currently 252 proteins in the database, 43 X-ray models, 61 NMR models, 254 comparative models, 43 de novo models and >500 pro forma constructs (see below).

Entries in SpliProt3D have been linked to the Spliceosome Database (http://spliceosomedb.ucsc.edu), recently developed by the Jurica lab.

Please do not hesitate to contact us if you have any questions, suggestions, or any new data we could include in the database. In particular, we will gladly update the database with experimentally determined structures to replace current theoretical models. Obsolete models are stored in the Archive.

Features of proteins listed in the SpliProt3D database

For each protein in the SpliProt3D database, the following features are listed:

  • subset - proteins in the SpliProt3D database are classified into subsets based on the experimental work by Agafonov et al., aided by information from the following references: Behzadnia et al., Bessonov et al., Deckert et al., Jurica et al.;
  • the protein's name and gene name;
  • the protein's GenPept identifier in the NCBI Protein database;
  • list of regions in the protein - proteins are divided into regions based on domain mapping, disorder prediction, and other structural analyses. Protein region names in the table are hyperlinked to their individual sub-pages (see below).

Below these features, a button allows the user to display an alignment of homologs of the protein. A structural model for the entire protein chain is displayed at the bottom, together with a link for downloading the model.

Features of protein regions in the SpliProt3D database

For each protein region in the SpliProt3D database, the following features are listed:

  • start & end - starting and ending position of the region in the sequence of the relevant protein;
  • SCOP & PFAM domain - if any are assigned;
  • (intrinsic) order/disorder status;
  • order/disorder type - this can be either "confirmed domain", "suspected domain" or one of the various types of disorder:
    • alpha, beta, alpha-beta, alpha/coiled-coil - intrinsic disorder containing a high amount of predicted preformed secondary structured elements;
    • RS-like disorder - disorder compositionally biased towards arginines and serines, which can be phosphorylated on the serines;
    • polyproline-rich / polyglutamine-rich disorder - disorder rich in poly(P) or poly(Q) peptides which can form a type of structure called polyproline(glutamine) helices;
    • G-rich - disorder rich in glycines, for example the RGG triplets of hnRNP proteins;
    • noncharged - other disorder compositionally biased mostly towards noncharged residues;
    • charged - other disorder compositionally biased to a large extent towards charged residues;
  • compositional bias - defined only for intrinsically disordered regions with a clear compositional bias, this shows the residues that are frequent (biased towards) in a given region;
  • model properties - properties of structural model of the region in the SpliProt3D database. These listed properties of models differ depending on the type of the model (see below).
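
The start & end and compositional bias features above can be illustrated with a short sketch. It assumes that region positions are 1-based and inclusive, and it simply reports residue fractions; the function name, the example sequence, and this way of measuring bias are illustrative assumptions, not the procedure used to build the database.

```python
from collections import Counter


def region_composition(sequence, start, end):
    """Return residue fractions for the region start..end (assumed 1-based, inclusive)."""
    region = sequence[start - 1:end]
    if not region:
        raise ValueError("empty region")
    counts = Counter(region)
    # most_common() orders residues from most to least frequent
    return {aa: n / len(region) for aa, n in counts.most_common()}


# Hypothetical sequence with an RS-like stretch; not taken from the database.
seq = "MKTAYRSRSRSRSPRRSRSG"
comp = region_composition(seq, 6, 19)
for aa, fraction in comp.items():
    print(aa, round(fraction, 2))
```

In this made-up example, arginine and serine dominate the region, which is the kind of pattern the "RS-like disorder" type describes.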

The structural model for the region is displayed at the bottom of the entry, together with a link which lets the user download the model.

Features of models in the SpliProt3D database

Structural models found in the SpliProt3D database can be experimental (X-ray or NMR), theoretical (comparative or de novo), or pro forma constructs, in which only the primary and secondary structure of the peptide chain are represented reliably, while the tertiary structure is arbitrary.

For models of protein regions, the features of these models are listed in the entry for the relevant region. Models of protein chains are usually composites of different types of models, so no features of these models are provided.

For the model of a protein region, the following features are listed:

  • model type (X-ray, NMR, comparative, de novo or pro forma construct). Some (experimental) models have been isolated from (nucleo)protein complexes containing other molecules. If so, this is also indicated;
  • PDB link of the original model (for X-ray or NMR models) or PDB link of the template structure for a model and the template chain (for comparative models). For X-ray models, resolution is also provided;
  • for experimental and theoretical models (but not constructs or composite models), model quality assessment scores provided by MetaMQAPII (Pawlowski et al.) and QMEAN (Benkert et al.) are indicated:
    • Predicted deviation of the model from the true structure, provided by MetaMQAPII and expressed as (predicted) RMSD and GDT_TS values. For theoretical models, the true structure is unknown. Experimental models may be considered instances of "true" structures, so any deviation from ideal scores reflects the imperfections of the scoring function or the inaccuracy of the experimental structure (usually mostly the former and only slightly the latter).
    • QMEAN contributing scores, the QMEAN total score and the QMEAN Z-score. Explanation of QMEAN scores: QMEAN server help.
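
For readers unfamiliar with the two deviation measures named above, the sketch below shows how RMSD and GDT_TS are conventionally computed when a reference structure is available. It is a simplified illustration: it takes per-residue C-alpha distances (in Å) from a single superposition as given, whereas a full GDT_TS calculation searches for the best superposition at each cutoff. MetaMQAPII predicts these values without a reference structure; this code does not reproduce that prediction. The function names, the example distances, and the 0-100 scale are assumptions made for the example.

```python
import math


def rmsd(distances):
    """Root-mean-square deviation over per-residue distances (in angstroms)."""
    return math.sqrt(sum(d * d for d in distances) / len(distances))


def gdt_ts(distances, cutoffs=(1.0, 2.0, 4.0, 8.0)):
    """Average percentage of residues within 1, 2, 4 and 8 angstroms (0-100 scale)."""
    n = len(distances)
    fractions = [sum(d <= c for d in distances) / n for c in cutoffs]
    return 100.0 * sum(fractions) / len(cutoffs)


# Hypothetical C-alpha distances between a model and a reference structure.
example = [0.5, 1.5, 3.0, 6.0, 10.0]
print(round(rmsd(example), 2), round(gdt_ts(example), 1))
```

A low RMSD and a high GDT_TS both indicate a model close to the reference; GDT_TS is less sensitive to a few badly placed residues, because each residue only counts as inside or outside each cutoff.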

If the region is represented by a pro forma construct, as most disordered regions and some ordered interdomain linkers would be, no scores are given, as constructs are to be treated as inherently unreliable. Models of entire protein chains also are not scored.

Model visualization

All models displayed on this website via the Jmol applet are colored by default according to the secondary structure.

All models apart from constructs and full-protein models also contain indicators of accuracy of individual residues generated by MetaMQAPII and supplied as B-factor values. The user can visualize them in the Jmol plugin using the Color>Structures>Cartoon>By scheme>Temperature (relative/fixed) command, or after download with an equivalent command in another molecular visualization software. Coloring the models according to the predicted accuracy in the B-factor field should show reliably modeled residues in deep blue, and indicate the predicted inaccuracy by a spectrum of colors from green (probably slightly inaccurate), to yellow, to red (probably very inaccurate).
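
After download, the per-residue accuracy indicators can also be read directly from the B-factor column of the model file. The sketch below assumes a standard fixed-column PDB file and reads the value stored on each C-alpha atom. It also assumes, as the coloring described above suggests, that lower values mark more reliably modeled residues. The helper that builds synthetic records, the example values, and the threshold are hypothetical.

```python
def residue_scores(pdb_lines):
    """Map (chain ID, residue number) to the B-factor field of each CA atom."""
    scores = {}
    for line in pdb_lines:
        # Fixed PDB columns: atom name 13-16, chain 22, residue number 23-26,
        # temperature factor (B-factor) 61-66.
        if line.startswith("ATOM") and line[12:16].strip() == "CA":
            scores[(line[21], int(line[22:26]))] = float(line[60:66])
    return scores


def atom_line(serial, resname, chain, resnum, bfactor):
    """Build a synthetic CA record for the demonstration (hypothetical data)."""
    return "ATOM  %5d  CA  %3s %1s%4d    %8.3f%8.3f%8.3f%6.2f%6.2f" % (
        serial, resname, chain, resnum, 0.0, 0.0, 0.0, 1.0, bfactor)


lines = [
    atom_line(1, "MET", "A", 1, 0.85),
    atom_line(2, "LYS", "A", 2, 4.20),
    atom_line(3, "GLY", "A", 3, 9.75),
]
scores = residue_scores(lines)

THRESHOLD = 5.0  # hypothetical cutoff chosen only for this example
unreliable = sorted(key for key, value in scores.items() if value > THRESHOLD)
print(unreliable)
```

For a downloaded model, `pdb_lines` would be the lines of the file, for example `open(path).read().splitlines()`.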

Alignment visualization

Sequence alignments are annotated and visualized in a Jalview applet.
At the top of a sequence alignment is the alignment itself. All proteins apart from the human protein are labeled with their GenPept identifier in the NCBI Protein database and a six-letter abbreviation of their species name. The human protein is also labeled with the name used for it in the SpliProt3D database. The alignment is not colored by default, but the Jalview applet is fully functional and allows the alignment to be colored in many ways. By default, features of the human sequence (e.g. domains, suspected domains, disordered regions with a known function) are indicated on the alignment. This can be switched off with the View>Sequence Features command. Jalview documentation is available via the Help command in the applet.
Underneath the alignment are the alignment annotations. By default, the following annotations are visible:
  • intrinsic disorder - predictions of intrinsic order vs. disorder from the GeneSilico MetaDisorder server; 'D' = residue predicted to be disordered;
  • secondary structure - predictions of secondary structure from PSIPRED (the link goes to the PSIPRED server, we used the standalone version); ovals = predicted alpha-helices; arrows = predicted beta-strands;
  • binding disorder - predictions of protein-binding disorder from the ANCHOR server; 'B' = residue predicted to be binding and disordered;
  • solvent accessibility - predictions of residues accessible to solvent from SABLE (the link goes to the SABLE server, we used the standalone version); 'B' = residue predicted to be buried (inaccessible to solvent);
  • coiled coils - predictions of coiled coils from the EMBOSS program PEPCOIL (the link goes to the EMBOSS web interface); 'C' = residue predicted to be in a coiled-coil;
  • posttranslational modifications - sites of posttranslational modifications from UniProt: 'A' = acetylation, 'P' = phosphorylation, 'M' = methylation (any type); 'C' = probable cysteine methyl ester site;
  • alignment conservation, quality and consensus – supplied by Jalview.
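
The single-letter annotations above mark individual residues. When working with such a track outside the applet, it is often convenient to convert it into residue ranges. The sketch below assumes the track is available as a plain string with one character per alignment column and a placeholder character for unannotated positions; this representation, the function name, and the example track are assumptions for illustration, not an export format documented here.

```python
from itertools import groupby


def annotated_runs(track, symbol):
    """Return 1-based, inclusive (start, end) ranges where track equals symbol."""
    runs = []
    position = 1
    for char, group in groupby(track):
        length = len(list(group))
        if char == symbol:
            runs.append((position, position + length - 1))
        position += length
    return runs


# Hypothetical disorder track: 'D' = predicted disordered, '-' = no annotation.
track = "--DDDD---DD-"
print(annotated_runs(track, "D"))
```

The same function works for the other tracks by changing the symbol, for example 'B' for buried residues or 'C' for coiled coils.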

Downloading models

All models are available for download in the "Download" section. Individual models can also be downloaded from the relevant protein and protein region pages.

SpliProt3D can be searched according to various criteria. You can search for
  • proteins. You can limit the search to select protein groups and/or search with a particular keyword;
  • protein regions. You can limit the search to select protein groups and/or search with a particular keyword. In this case, you will obtain both a list of regions and a list of proteins where these regions are found;
  • protein structures and/or models, supplying the structure/model type and/or various parameters of the structure/model (X-ray resolution, predicted RMSD, GDT_TS and QMEAN score). In this case, you will obtain a list of regions with a structure/model that fulfils the supplied parameters.
If you search for proteins, you will be able to download only the model for the entire protein. If you want to obtain all the models for a given protein, please check the "Search for: region" box.

How to cite the SpliProt3D database?

Please cite the SpliProt3D database as:

Korneta I, Magnus M, Bujnicki JM
Structural bioinformatics of the human spliceosomal proteome.
Nucleic Acids Res. 2012;40:7046-7065.

http://nar.oxfordjournals.org/content/40/15/7046.abstract

Literature

Adamczak R, Porollo A, Meller J (2005) Combining prediction of secondary structure and solvent accessibility in proteins. Proteins: Structure, Function and Bioinformatics 59: 467-75

Agafonov DE, Deckert J, Wolf E, Odenwalder P, Bessonov S, Will CL, Urlaub H, Luhrmann R (2011) Semi-quantitative proteomic analysis of the human spliceosome via a novel two-dimensional gel electrophoresis method. Mol Cell Biol 31: 2667-2682.

Behzadnia N, Golas MM, Hartmuth K, Sander B, Kastner B, Deckert J, Dube P, Will CL, Urlaub H, Stark H, Luhrmann R (2007) Composition and three-dimensional EM structure of double affinity-purified, human prespliceosomal A complexes. EMBO J 26: 1737-1748

Benkert P, Biasini M, Schwede T (2011) Toward the estimation of the absolute quality of individual protein structure models. Bioinformatics 27: 343-50

Bessonov S, Anokhina M, Will CL, Urlaub H, Luhrmann R (2008) Isolation of an active step I spliceosome and composition of its RNP core. Nature 452: 846-850

Cvitkovic I, Jurica MS (2012) Spliceosome Database: a tool for tracking components of the spliceosome. Nucleic Acids Res doi: 10.1093/nar/gks999

Deckert J, Hartmuth K, Boehringer D, Behzadnia N, Will CL, Kastner B, Stark H, Urlaub H, Luhrmann R (2006) Protein composition and electron microscopy structure of affinity-purified human spliceosomal B complexes isolated under physiological conditions. Mol Cell Biol 26: 5528-5543

Dosztányi Z, Mészáros B and Simon I (2009) ANCHOR: web server for predicting protein binding regions in disordered proteins Bioinformatics 25(20): 2745-2746.

Galej W, Oubridge C, Newman AJ, Nagai K (2013) Crystal structure of Prp8 reveals active site cavity of the spliceosome. Nature doi: 10.1038/nature11843

Jmol: an open-source Java viewer for chemical structures in 3D. http://www.jmol.org/

Jones DT (1999) Protein secondary structure prediction based on position-specific scoring matrices. J. Mol. Biol. 292: 195-202.

Jurica MS, Moore MJ (2003) Pre-mRNA splicing: awash in a sea of proteins. Mol Cell 12: 5-14

Korneta I, Bujnicki JM (2011) Intrinsic disorder in the human spliceosomal proteome (submitted)

Korneta I, Magnus M, Bujnicki JM (2012) Structural bioinformatics of the human spliceosomal proteome. Nucleic Acids Res 40: 7046-7065

Magrane, M. and Consortium, U. (2011) UniProt Knowledgebase: a hub of integrated protein data. Database (Oxford), 2011, bar009.

Pawlowski M, Gajda MJ, Matlak R, Bujnicki JM. (2008) MetaMQAP: a meta-server for the quality assessment of protein models BMC Bioinformatics 9:403

Rice P, Longden I and Bleasby A (2000) EMBOSS: The European Molecular Biology Open Software Suite Trends in Genetics 16(6): 276-277

Waterhouse AM, Procter JB, Martin DMA, Clamp M and Barton GJ (2009) Jalview Version 2 - a multiple sequence alignment editor and analysis workbench Bioinformatics 25 (9) 1189-1191