A Deep-learning based method for the prediction of Residue solvent Exposure starting from protein sequence

About this method

DeepREx-WS is a method for performing a binary classification of residues based on their solvent accessibility, starting only from the sequence of the target protein.
The input sequence is first aligned with the HHblits program to produce a Multiple Sequence Alignment (MSA). This step is needed to encode each residue, which is then represented by 71 features: a one-hot encoding of the residue type (20); a sequence profile computed directly from the MSA (21); and the sequence profile, emission and transition probabilities, and sequence diversity features (Neff) extracted directly from the HHblits output (30).
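As a rough illustration of how the 71 per-residue features could be assembled, the sketch below concatenates the three groups described above. It is not the DeepREx-WS code. The function names are hypothetical, the 21-column MSA profile is assumed to cover the 20 residue types plus a gap symbol, and the 30 HHblits-derived values are treated as an already parsed array (here a zero placeholder).

```python
import numpy as np

AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"  # the 20 standard residue types


def one_hot(sequence):
    """Return an (L, 20) one-hot matrix; non-standard residues stay all-zero (assumption)."""
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    encoded = np.zeros((len(sequence), 20), dtype=np.float32)
    for pos, aa in enumerate(sequence):
        if aa in index:
            encoded[pos, index[aa]] = 1.0
    return encoded


def msa_profile(msa):
    """Return an (L, 21) matrix of column frequencies over 20 residues plus gap.

    Assumption: the 21st column is the gap frequency, and any symbol outside
    the 20 standard residues is counted as a gap.
    """
    index = {aa: i for i, aa in enumerate(AMINO_ACIDS)}
    counts = np.zeros((len(msa[0]), 21), dtype=np.float32)
    for row in msa:
        for pos, symbol in enumerate(row):
            counts[pos, index.get(symbol, 20)] += 1.0
    return counts / len(msa)


def encode(sequence, msa, hhblits_features):
    """Concatenate the three feature groups into an (L, 71) matrix."""
    features = np.concatenate(
        [one_hot(sequence), msa_profile(msa), hhblits_features], axis=1
    )
    assert features.shape == (len(sequence), 71)
    return features


# Toy example: a 3-residue target with a 2-row alignment.
sequence = "ACD"
msa = ["ACD", "A-D"]
hhblits_placeholder = np.zeros((len(sequence), 30), dtype=np.float32)
features = encode(sequence, msa, hhblits_placeholder)
```

Each row of `features` describes one residue, so the matrix can be passed to a sequence model one protein at a time.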
The architecture of our model is composed of three Bidirectional Long Short-Term Memory (BLSTM) layers followed by a dense time-distributed layer. The first part processes the input protein, finding patterns and connections between residues, even those far apart in the sequence. The second part then produces an output for each residue: a real number between 0 and 1 that we use to perform the final classification. If the output is less than 0.5, we predict the residue to be Buried, meaning that its relative solvent accessibility (RSA) is below 20%; otherwise we predict it to be Exposed. We can also use the same output as an index of the reliability of the prediction, since values closer to 0.5 are less reliable than those closer to 0 or 1.
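The decision rule described above can be sketched in a few lines. The 0.5 threshold and the Buried/Exposed labels come from the description. The reliability formula (distance from 0.5, rescaled to the range 0 to 1) is only an illustrative assumption, not necessarily the index reported by the server, and the function name is hypothetical.

```python
def classify_residues(scores, threshold=0.5):
    """Turn per-residue network outputs in [0, 1] into labels and a reliability value.

    Reliability here is 2 * |score - threshold|, an assumed rescaling so that
    0 means "on the decision boundary" and 1 means "as far from it as possible".
    """
    labels = ["Buried" if score < threshold else "Exposed" for score in scores]
    reliability = [2 * abs(score - threshold) for score in scores]
    return labels, reliability


# Toy outputs for three residues.
labels, reliability = classify_residues([0.1, 0.5, 0.9])
```

A score of exactly 0.5 falls on the Exposed side, because only outputs strictly below the threshold are labelled Buried.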
After training with 10-fold cross-validation to select the hyper-parameters of the best model, we tested its performance on a blind test set of 200 proteins that were left out of the training. The results below show that DeepREx performs at the state of the art, and the fact that its performance does not drop on the blind test set is a good indication of its robustness on new data.

Score                                     Cross-Validation    Blind Test
Accuracy (Q2)                             0.81                0.82
F1 Score                                  0.81                0.82
Matthews Correlation Coefficient (MCC)    0.63                0.63
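For reference, the three scores in the table can be computed from per-residue true and predicted labels as in the sketch below. This uses the standard definitions of accuracy, F1 and MCC. Treating Exposed as the positive class for F1 is an assumption, the function name is hypothetical, and the toy labels are not data from the benchmark.

```python
import math


def binary_metrics(true_labels, predicted_labels, positive="Exposed"):
    """Compute Q2 accuracy, F1 and MCC for a two-class labelling."""
    tp = fp = tn = fn = 0
    for truth, predicted in zip(true_labels, predicted_labels):
        if predicted == positive:
            if truth == positive:
                tp += 1
            else:
                fp += 1
        else:
            if truth == positive:
                fn += 1
            else:
                tn += 1

    accuracy = (tp + tn) / (tp + tn + fp + fn)
    f1_denominator = 2 * tp + fp + fn
    f1 = 2 * tp / f1_denominator if f1_denominator else 0.0
    mcc_denominator = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    mcc = (tp * tn - fp * fn) / mcc_denominator if mcc_denominator else 0.0
    return {"Q2": accuracy, "F1": f1, "MCC": mcc}


# Toy example with four residues (illustrative labels only).
true_labels = ["Exposed", "Exposed", "Buried", "Buried"]
predicted_labels = ["Exposed", "Buried", "Buried", "Buried"]
metrics = binary_metrics(true_labels, predicted_labels)
```

Unlike accuracy, MCC stays informative when the two classes are unbalanced, which is one reason it is commonly reported alongside Q2.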

How to cite

Manfredi, M., Savojardo, C., Martelli, P.L., Casadio, R. (2021) DeepREx-WS: A web server for characterising protein–solvent interaction starting from sequence. Computational and Structural Biotechnology Journal, Volume 19, Pages 5791 - 5799.
doi:10.1016/j.csbj.2021.10.016 PMID: 34765094