Data for "Towards a structurally resolved human protein interaction network"

Abstract

All cellular functions are governed by complex molecular machines that assemble through protein-protein interactions. Their atomic details are critical to the study of their molecular mechanisms, but fewer than 5% of the hundreds of thousands of human interactions have been structurally characterized. Here we test the potential and limitations of recent progress in deep-learning methods, using AlphaFold2 to predict structures for 65,484 human interactions. Higher-confidence models are enriched in interactions supported by affinity-based methods and can be orthogonally confirmed by spatial constraints defined by cross-link data. We selected 3,137 high-confidence models, from which we identify interface residues harbouring disease mutations, suggesting potential mechanisms for pathogenic variants. We find groups of interface phosphorylation sites that show patterns of co-regulation across conditions, suggestive of coordinated tuning of multiple interactions as signalling responses. Finally, we provide examples of how the predicted binary complexes can be used to build larger assemblies. Accurate prediction of protein complexes promises to greatly expand our understanding of the atomic details of human cell biology in health and disease.

Data and resources

.
└── Preprint on bioRxiv
└── FAIR-compliant data repository on figshare
└── Summary data for both datasets
└── Predicted dimers from Hu.Map dataset
└── CSV file with structural properties for the Hu.Map dataset
└── Predicted dimers from HuRI dataset
└── Predicted dimers from HuRI dataset (Google Drive link)
└── CSV file with structural properties for the HuRI dataset
└── Predicted monomers from Hu.Map dataset
└── Predicted monomers from HuRI dataset
└── Predicted dimers from Random dataset
└── CSV file with structural properties for the Random dataset
└── Source code for this project
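The structural-properties CSV files listed above can be filtered down to high-confidence models with a few lines of standard-library Python. The sketch below is not the authors' pipeline. The column name `confidence_score`, the cutoff `0.5`, and the file name in the usage line are hypothetical placeholders: check the actual CSV header and the paper for the real score column and threshold.

```python
import csv
from typing import Iterable, Iterator

# HYPOTHETICAL column name and cutoff: replace them with the score column and
# threshold actually used in the CSV files and in the paper.
SCORE_COLUMN = "confidence_score"
CUTOFF = 0.5


def high_confidence_rows(lines: Iterable[str], column: str = SCORE_COLUMN,
                         cutoff: float = CUTOFF) -> Iterator[dict]:
    """Yield CSV rows whose value in `column` is strictly above `cutoff`."""
    reader = csv.DictReader(lines)
    # fieldnames is read from the header line; it is None for an empty file.
    if reader.fieldnames is None or column not in reader.fieldnames:
        raise KeyError(f"column {column!r} not found in CSV header")
    for row in reader:
        value = row[column].strip()
        # Skip rows with a missing score instead of failing on float("").
        if value and float(value) > cutoff:
            yield row


def load_high_confidence(path: str, column: str = SCORE_COLUMN,
                         cutoff: float = CUTOFF) -> list[dict]:
    """Read a CSV file from disk and return only the high-confidence rows."""
    with open(path, newline="") as handle:
        return list(high_confidence_rows(handle, column, cutoff))
```

Usage, with a hypothetical local file name: `rows = load_high_confidence("huri_structural_properties.csv")`. The row filter takes any iterable of lines rather than a path, so it can also run on an open file handle or an in-memory buffer.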

Additional data and resources

.
└── Source code for FoldDock project (used to generate models)
└── The "Marks dataset"
└── All MSAs for the "Marks dataset"