The human structural proteome
We collected 7,185 PDB chains and 23,532 homology models from ModBase that cover about 90% of the human proteome. All models are gzipped, and the PDB files have been renumbered through SIFTS so that their residue numbers match the corresponding UniProt residue numbers. The model index file and a zipped archive containing all structural models can be downloaded below.
Index file (3.7 MB): all_models_index.txt
Zipped models (1.34 GB): all_models.zip
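Because each model is gzipped inside the zip archive, a single model can be read without unpacking everything. The following is a minimal sketch, assuming the archive members are gzip-compressed PDB-format text files; the member name in the usage comment is hypothetical, and real names should be taken from all_models_index.txt or from the archive listing.

```python
import gzip
import zipfile


def read_gzipped_member(zip_source, member_name):
    """Return the decompressed text lines of one gzipped member of a zip archive.

    zip_source may be a path or a binary file-like object.
    """
    with zipfile.ZipFile(zip_source) as archive:
        with archive.open(member_name) as raw:
            with gzip.open(raw, mode="rt") as handle:
                return handle.read().splitlines()


def ca_residue_numbers(pdb_lines, chain_id):
    """Collect residue numbers of C-alpha atoms for one chain.

    Uses the fixed-column PDB ATOM layout: atom name in columns 13-16,
    chain ID in column 22, residue number in columns 23-26.
    """
    numbers = []
    for line in pdb_lines:
        if len(line) < 26 or not line.startswith("ATOM"):
            continue
        if line[12:16].strip() == "CA" and line[21] == chain_id:
            numbers.append(int(line[22:26]))
    return numbers


# Usage (member name is a placeholder, not a real file in the archive):
# lines = read_gzipped_member("all_models.zip", "some_model.pdb.gz")
# print(ca_residue_numbers(lines, "A")[:10])
```

Since the PDB files are numbered to match UniProt, the residue numbers returned for a PDB chain should correspond directly to UniProt positions.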
PIVOTAL predictions for all VUS
We generated PIVOTAL predictions for all 143,293 VUS mutations. Along with the PIVOTAL prediction scores, we provide predictions from existing pathogenicity predictors, including PolyPhen-2, SIFT, PROVEAN, and CADD. The file also includes features, including G scores, JS divergence, and maximum SCA correlation. The file containing all predictions can be downloaded below.
Prediction file (25.9 MB): all_vus_predictions.txt
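The layout of the prediction file is not described here, so the following is only a sketch of one way to inspect it. It assumes a delimited text table with a header row and a tab delimiter by default; both are assumptions and may need adjusting once the actual file is opened.

```python
import csv


def read_predictions(handle, delimiter="\t"):
    """Read a delimited table with a header row from an open text handle.

    Returns (column_names, rows), where each row is a dict keyed by column name.
    The tab delimiter is an assumption about the file, not a documented fact.
    """
    reader = csv.DictReader(handle, delimiter=delimiter)
    rows = list(reader)
    return reader.fieldnames, rows


# Usage:
# with open("all_vus_predictions.txt", newline="") as fh:
#     columns, rows = read_predictions(fh)
# print(columns)       # inspect the real column names before relying on any
# print(len(rows))
```

Printing the column names first avoids guessing how the PIVOTAL, PolyPhen-2, SIFT, PROVEAN, and CADD scores are labeled.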
Pre-calculated G scores
We calculated G scores from known disease mutations in HGMD and ClinVar, from evolutionary conservation (JS divergence), and from co-evolution (maximum SCA correlation). In addition, we calculated disease-specific G scores for diseases with more than 50 known mutations. For each model, we also calculated a p-value cutoff at 5% FDR and annotated the statistical significance of each residue. These data can be downloaded below as pickle files.
From disease mutations (123 MB): disease_mut_g_all_model_fdr_0.05.pkl
From evolutionary conservation (460 MB): js_g_all_model_fdr_0.05.pkl
From co-evolution (460 MB): sca_g_all_model_fdr_0.05.pkl
Disease-specific G scores (1.88 GB): disease_specific_g_all_model_fdr_0.05.pkl
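The pickle files can be opened with Python's standard pickle module. The sketch below loads one file and, separately, illustrates how a p-value cutoff at a given FDR can be derived with the Benjamini-Hochberg step-up procedure. Benjamini-Hochberg is used here only as a common example; the page does not state which FDR procedure was applied, and the internal structure of the pickled objects is not documented, so it should be inspected after loading. Only unpickle files from a source you trust.

```python
import pickle


def load_g_scores(path):
    """Load one pickled G-score file and return whatever object it contains."""
    with open(path, "rb") as handle:
        return pickle.load(handle)


def bh_pvalue_cutoff(pvalues, fdr=0.05):
    """Benjamini-Hochberg cutoff: the largest p-value p_(k) with p_(k) <= fdr * k / m.

    Returns None when no p-value passes. This is an illustrative procedure,
    not necessarily the one used to build the downloadable files.
    """
    ranked = sorted(pvalues)
    m = len(ranked)
    cutoff = None
    for rank, p in enumerate(ranked, start=1):
        if p <= fdr * rank / m:
            cutoff = p
    return cutoff


# Usage:
# scores = load_g_scores("js_g_all_model_fdr_0.05.pkl")
# print(type(scores))  # inspect the structure before indexing into it
```

Residues with p-values at or below the returned cutoff would be called significant at the chosen FDR under this procedure.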