Ensemble evaluation
The server is designed to analyze structural ensembles generated to represent the internal dynamics of proteins (see e.g. Ángyán & Gáspári 2013 and references therein). Such ensembles are expected to reflect NMR-derived experimental parameters in an ensemble-averaged manner, i.e. the ensemble as a whole corresponds to the parameters better than any of its constituent conformers, whereas the diversity of the conformations is related to the internal motions observed at a given time scale. For example, S2 general order parameters derived from heteronuclear relaxation measurements can be used to restrain MD simulations to generate ensembles reflecting internal dynamics at the ps-ns time scale (Best & Vendruscolo 2004, Lindorff-Larsen et al. 2005, Richter et al. 2007). The aim of the CoNSEnsX approach is to provide a convenient and standardized way to evaluate such ensembles. The server reports the correspondence of the ensemble to each NMR parameter separately and does not yield a single quality measure. The justification for this is that the availability and reliability of different types of parameters vary from case to case, and the relevance and usability of an ensemble should be judged on this information as well as on the purpose for which it was generated (i.e. the biological phenomenon it was intended to address).
The server is designed to analyze as many types of parameters as possible from those supplied with the BMRB-format NMR parameter file (parsing is performed using a Python3 port of the NMRPyStar package). Currently the server supports the following parameters:
-
Chemical shifts - these are back-calculated with the original SHIFTX method (Neal et al. 2003) due to its speed. Chemical shift types currently supported by the server interface are: CA, CB, HA, amide H, and amide N.
-
J-couplings - supported 3J couplings are those that can be calculated from the phi dihedral angle and for which Karplus parameters are available (3JHNHα, 3JHαC, 3JHNCβ, 3JHNC). Three published Karplus parameter sets can be chosen (Wang & Bax 1996, Hu & Bax 1997, Habeck, Rieping & Nilges 2005).
-
Residual dipolar couplings - these are back-calculated using steric PALES (Zweckstetter & Bax 2000). Multiple RDC sets in the BMRB file are evaluated separately. By default, each conformer is separately oriented by SVD to obtain the best correspondence to experimental data. This approach takes into account that the orientation of the protein can be conformer-dependent (e.g. Louhivuori et al. 2006, Salvatella, Richter & Vendruscolo 2008, Montalvao, De Simone & Vendruscolo 2012). It is possible to turn off the SVD fit, in which case a steric alignment will be estimated for each conformer, but it is not yet possible to use separate media for distinct RDC experiments.
-
S2 order parameters - backbone and side-chain order parameters are supported. The server can perform the superposition of the supplied conformers for the range of residues specified using the corresponding module of ProDy (Bakan, Meireles & Bahar 2011). By default, no superposition is done (i.e. the server implicitly assumes that conformers are already superimposed).
-
NOE distance restraints - an ensemble-based check of NOE distances is performed using r^-6 averaging for all possible 1H-1H distances for restraints of any type of ambiguity and optionally r^-6 or r^-3 averaging between conformers. In addition, a PRIDE-NMR (Ángyán et al. 2008) analysis is done on all conformers. Please note that the result of the distance analysis might differ from that obtained with other methods because of the different averaging.
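As an illustration of the J-coupling item in the list above, the following minimal sketch back-calculates a 3J coupling from the phi angle with the general Karplus relation J = A cos²θ + B cos θ + C (θ = phi + offset) and averages it over the conformers. The function names and the coefficient values are hypothetical placeholders chosen for this example; they are not the published parameter sets and not the server's own code.

```python
import math


def karplus_3j(phi_deg, a, b, c, offset_deg):
    """Coupling (Hz) from the Karplus relation
    J = A*cos^2(theta) + B*cos(theta) + C, with theta = phi + offset."""
    theta = math.radians(phi_deg + offset_deg)
    return a * math.cos(theta) ** 2 + b * math.cos(theta) + c


def ensemble_averaged_3j(phi_per_conformer, a, b, c, offset_deg):
    """Average the back-calculated coupling of one residue over all conformers."""
    values = [karplus_3j(phi, a, b, c, offset_deg) for phi in phi_per_conformer]
    return sum(values) / len(values)


# Placeholder coefficients for illustration only (NOT a published parameter set).
A, B, C, OFFSET = 7.0, -1.4, 1.7, -60.0

# Phi angles (degrees) of the same residue in three hypothetical conformers.
phis = [-60.0, -120.0, -90.0]
averaged = ensemble_averaged_3j(phis, A, B, C, OFFSET)
print(f"ensemble-averaged 3J: {averaged:.2f} Hz")
```

The averaged value, rather than the per-conformer values, is what would be compared with the experimental coupling in an ensemble-based evaluation.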
The server reports the correlation and RMSD for each parameter as well as the Q-factor for RDCs. In addition, it provides a correlation plot, a per-residue correspondence plot and a plot comparing the correlation of the ensemble as a whole with those of the individual conformers and with the average of the per-conformer correlations.
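The reported measures can be sketched as follows for a single parameter type. This is a minimal illustration, not the server's implementation: the function names are hypothetical, the toy numbers are invented for the example, and the Q-factor uses one common definition (RMS deviation divided by the RMS of the experimental values), which may differ from the normalization used by the server.

```python
import math


def ensemble_average(per_conformer_values):
    """Average back-calculated values over conformers, position by position."""
    n_conf = len(per_conformer_values)
    return [sum(column) / n_conf for column in zip(*per_conformer_values)]


def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    mx, my = sum(x) / len(x), sum(y) / len(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = math.sqrt(sum((a - mx) ** 2 for a in x))
    sy = math.sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)


def rmsd(calc, exp):
    """Root-mean-square deviation between calculated and experimental values."""
    return math.sqrt(sum((c - e) ** 2 for c, e in zip(calc, exp)) / len(exp))


def q_factor(calc, exp):
    """Q = sqrt(sum (calc - exp)^2 / sum exp^2); one common RDC definition."""
    num = sum((c - e) ** 2 for c, e in zip(calc, exp))
    return math.sqrt(num / sum(e ** 2 for e in exp))


# Toy data: experimental values and back-calculated values for two conformers.
experimental = [1.0, 2.0, 3.0]
conformers = [[0.5, 2.5, 2.5], [1.5, 1.5, 3.5]]

averaged = ensemble_average(conformers)
print("ensemble RMSD:", rmsd(averaged, experimental))
for i, values in enumerate(conformers):
    print(f"conformer {i} RMSD:", rmsd(values, experimental))
```

In this toy case the ensemble average matches the experimental values exactly although neither conformer does, which is the behaviour expected from an ensemble-averaged description.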
Sub-ensemble selection
The CoNSEnsX server is capable of selecting a sub-ensemble with the best match to selected experimental data. Selection can be initiated after a round of evaluation by specifying the parameters to be included in the selection process along with their weights (on a scale of 0 to 10). This kind of selection is admittedly subjective but allows the user to discriminate between parameters on the basis of their reliability and importance for the actual task.
The selection algorithm currently implemented is a version of a greedy approach, starting from the single conformer best corresponding to the included parameters and gradually adding conformers, requiring better correspondence whenever possible. A so-called overdrive parameter can be set to allow individual addition steps to yield worse correspondence than the previous ones, since extending the ensemble by further members might yield an overall better correspondence even in such cases. The other main adjustable parameters are the minimum and maximum size of the final sub-ensemble to be returned as well as the measure by which the sub-ensembles will be evaluated (Pearson correlation, Q-factor or RMSD). The output is the correspondence of the parameters to the full ensemble and the selected sub-ensemble as well as the sub-ensemble as a downloadable PDB file.
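The greedy idea can be sketched as below. This is a simplified illustration under stated assumptions, not the server's algorithm: it uses a single parameter set without weights, RMSD as the only measure, and interprets overdrive as the number of consecutive non-improving additions that are tolerated. All function and variable names, and the toy data, are hypothetical.

```python
import math


def ensemble_rmsd(members, calc, exp):
    """RMSD between the member-averaged calculated values and experiment."""
    n = len(members)
    avg = [sum(calc[m][i] for m in members) / n for i in range(len(exp))]
    return math.sqrt(sum((a - e) ** 2 for a, e in zip(avg, exp)) / len(exp))


def greedy_select(calc, exp, min_size=1, max_size=None, overdrive=0):
    """Grow a sub-ensemble one conformer at a time, keeping the best one seen.

    overdrive: assumed here to be the number of consecutive additions that
    may fail to improve on the best score before the search stops.
    """
    if max_size is None:
        max_size = len(calc)
    remaining = list(range(len(calc)))
    selected = []
    best_members, best_score = [], math.inf
    worse_steps = 0
    while remaining and len(selected) < max_size:
        # Pick the conformer whose addition gives the lowest ensemble RMSD.
        cand = min(remaining,
                   key=lambda c: ensemble_rmsd(selected + [c], calc, exp))
        selected.append(cand)
        remaining.remove(cand)
        score = ensemble_rmsd(selected, calc, exp)
        if len(selected) < min_size:
            continue  # keep growing until the minimum size is reached
        if score < best_score:
            best_members, best_score = list(selected), score
            worse_steps = 0
        else:
            worse_steps += 1
            if worse_steps > overdrive:
                break
    return best_members, best_score


# Toy data: experimental values and back-calculated values for four conformers.
experimental = [1.0, 2.0, 3.0]
calculated = [[0.5, 2.5, 2.5], [1.5, 1.5, 3.5], [5.0, 5.0, 5.0], [1.2, 2.2, 3.2]]

print(greedy_select(calculated, experimental, overdrive=0))
print(greedy_select(calculated, experimental, overdrive=1))
```

With this toy data, the search without overdrive stops at the best single conformer, whereas tolerating one worse step lets it reach a three-member sub-ensemble with a lower RMSD.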
Test data set
The test data set provided for the server is based on an ensemble calculated for the N-terminal SH3 domain of the DRK protein (PDB entry 2A36, Bezsonova et al. 2005) using standard molecular dynamics simulations. The test ensemble is a subset of the ‘reference ensemble’ described in the CoNSEnsX+ paper (specifically, it contains 12 conformers in addition to the ‘constricted ensemble’ to allow testing of the selection feature). The NMR parameter set provided contains backbone S2 order parameters and chemical shifts, together with five RDC sets calculated from the ensemble. More details can be found in the CoNSEnsX+ paper.
Limitations
The server currently supports ensembles up to 1000 conformers.
Versions & availability
The current CoNSEnsX version is a complete redesign of the original one. The source code is free and available at GitHub.
The first version of the server is still available at conensx.chem.elte.hu and was described in: Ángyán et al. BMC Struct. Biol. 2010, 10:39.
Tools & tips
Several scripts for conversion between different formats are provided here in the hope that they might be useful. They are simple scripts and might not cover all issues arising during format conversion. They can be freely used and modified but come with absolutely no warranty. The 'disre' format is introduced as a format highly similar to the distance restraint format in GROMACS topology files, with all atoms explicitly named and atom pairs corresponding to the same restraint forming a group.
We recommend that before uploading data to the CoNSEnsX server, the user goes through the following steps:
-
Check that the atom naming convention is the same for all three files. To get a glimpse of possible mismatches, you can have a look at https://www.bmrb.wisc.edu/ref_info/atom_nom.tbl or invoke ‘atomconverter_pdb.pl -h’, which lists the same table with several additional formats added.
-
Perform the necessary conversions. We recommend converting the PDB ensemble file (‘atomconverter_pdb.pl’), as this is the most likely to deviate from the nomenclature used by NMR processing software and is also the most robust to convert (its strict format makes recognition of atom names straightforward, making it easy to spot unsuccessful conversions).
-
We provide simple tools to convert DIANA/DYANA .upl and X-PLOR/CNS .tbl distance restraint files into an intermediate format, ‘disre’. By default, the converters ‘upl2disre.pl’ and ‘tbl2disre.pl’ will dereference all pseudoatoms into existing ones with the aid of the (already name-converted) PDB file supplied (e.g. ALA HB# to HB1, HB2, HB3 etc.). This intermediate format can then be converted to NMR-STAR format (v2 mr file format on the PDB web site) by ‘disre2str.pl’.
- All scripts below are provided on an ‘as is’ basis; they might not be error-free but are provided in the belief that they might be useful. Note that the scripts are not officially supported by BMRB. They can be freely modified for research purposes (usage information is invoked with -h):
References
- Ángyán AF, Gáspári Z: Ensemble-based interpretations of NMR structural data to describe protein internal dynamics. Molecules 2013, 18:10548-10567.
- Ángyán AF, Szappanos B, Perczel A, Gáspári Z: CoNSEnsX: an ensemble view of protein structures and NMR-derived experimental data. BMC Struct. Biol. 2010, 10:39.
- Ángyán AF, Perczel A, Pongor S, Gáspári Z: Fast protein fold estimation from NMR-derived distance restraints. Bioinformatics 2008, 24:272-275.
- Bakan A, Meireles LM, Bahar I: ProDy: Protein dynamics inferred from theory and experiments. Bioinformatics 2011, 27:1575-1577.
- Batta G, Barna T, Gáspári Z, Sándor S, Kövér KE, Binder U, Sarg B, Kaiserer L, Chhillar AK, Eigentler A, Leiter É, Hegedüs N, Pócsi I, Lindner H, Marx F: Functional aspects of the solution structure and dynamics of PAF - a highly-stable antifungal protein from Penicillium chrysogenum. FEBS J 2009, 276:2875-2890.
- Best RB, Vendruscolo M: Relation between native ensembles and experimental structures of proteins. Proc Natl Acad Sci USA 2006, 103:10901-10906.
- Bezsonova I, Singer A, Choy WY, Tollinger M, Forman-Kay JD: Structural comparison of the unstable drkN SH3 domain and a stable mutant. Biochemistry 2005, 44:15550-15560.
- Habeck M, Rieping W, Nilges M: Bayesian estimation of Karplus parameters and torsion angles from three-bond scalar coupling constants. J Magn Reson 2005, 177:160-165.
- Hu J-S, Bax A: Determination of φ and χ1 angles in proteins from 13C-13C three-bond J couplings measured by three-dimensional heteronuclear NMR. How planar is the peptide bond? J Am Chem Soc 1997, 119:6360-6368.
- Lindorff-Larsen K, Best RB, DePristo MA, Vendruscolo M: Simultaneous determination of protein structure and dynamics. Nature 2005, 433:128-132.
- Louhivuori M, Otten R, Lindorff-Larsen K, Annila A: Conformational fluctuations affect protein alignment in dilute liquid crystal media. J Am Chem Soc 2006, 128:4371-4376.
- Montalvao RW, De Simone A, Vendruscolo M: Determination of structural fluctuations of proteins from structure-based calculations of residual dipolar couplings. J Biomol NMR 2012, 53:281-292.
- Neal S, Nip AM, Zhang N, Wishart DS: Rapid and accurate calculation of protein 1H, 13C and 15N chemical shifts. J Biomol NMR 2003, 26:215-240.
- Richter B, Gsponer J, Várnai P, Salvatella X, Vendruscolo M: The MUMO (minimal under-restraining minimal over-restraining) method for the determination of native state ensembles of proteins. J Biomol NMR 2007, 37:117-135.
- Salvatella X, Richter B, Vendruscolo M: Influence of the fluctuations of the alignment tensor on the analysis of the structure and dynamics of proteins using residual dipolar couplings. J Biomol NMR 2008, 40:71-81.
- Wang AC, Bax A: Determination of the backbone dihedral angles φ in human ubiquitin from reparametrized Karplus equations. J Am Chem Soc 1996, 118:2483-2494.
- Zweckstetter M, Bax A: Prediction of sterically induced alignment in a dilute liquid crystalline phase: aid to protein structure determination by NMR. J Am Chem Soc 2000, 122:3791-3792.