Bioinformatics for protein sequence, structure and function

Homology modeling of protein structures

Homology modeling takes a sequence whose structure is unknown (the target) and maps it onto the known structure of one or several similar (homologous) proteins (the templates). Two proteins of similar origin and function are expected to have reasonable sequence similarity. One can then infer that a conserved region of sequence that forms, for example, an alpha-helix in the known protein probably adopts the same conformation in the unknown protein.
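The inference described above can be sketched as a simple label transfer along an alignment. This is a minimal illustration, not a real prediction method: the function name, the one-letter labels (H = helix, E = strand, C = coil) and the example sequences are all made up for this sketch.

```python
def transfer_secondary_structure(aligned_target, aligned_template, template_ss):
    """Copy per-residue secondary-structure labels from template to target.

    aligned_target, aligned_template: equal-length aligned sequences ('-' = gap).
    template_ss: one label per template residue (ungapped), e.g. 'H', 'E', 'C'.
    Returns one label per target residue; '?' marks target residues that face
    a gap in the template and therefore get no label from it.
    """
    if len(aligned_target) != len(aligned_template):
        raise ValueError("aligned sequences must have equal length")
    n_template = sum(1 for c in aligned_template if c != "-")
    if n_template != len(template_ss):
        raise ValueError("template_ss must have one label per template residue")

    predicted = []
    t_idx = 0  # index into the ungapped template
    for tgt, tpl in zip(aligned_target, aligned_template):
        label = None
        if tpl != "-":
            label = template_ss[t_idx]
            t_idx += 1
        if tgt != "-":
            predicted.append(label if label is not None else "?")
    return "".join(predicted)


# Hypothetical example: a 10-residue template with a helix in the middle.
aligned_template = "MKT-AYIAKQR"
aligned_target = "MRTGAYLA-QR"
template_ss = "CCHHHHHHCC"
print(transfer_secondary_structure(aligned_target, aligned_template, template_ss))
# prints CCH?HHHHCC
```

The '?' shows where the approach has nothing to say: an insertion in the target has no template residue to copy from and must be modeled by other means.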

Basic procedures used in most homology modeling:

Obtaining the sequence(s) and checking them with an alignment program, such as BLAST or FASTA, to ensure relatedness and homology
Finding PDB data for known structures (NMR data can also help, but a 3-D coordinate map is necessary)
Multiple sequence alignment (if more than one known structure is involved, the knowns are sometimes aligned together first, and the unknown sequence is then aligned with the group; this helps ensure better domain conservation)
Analysis of alignments: gap deletions and additions; secondary structure weighting
Structure calculation
Model refinement
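The alignment and relatedness check in the first steps can be illustrated with a toy global alignment. The sketch below is a plain Needleman-Wunsch implementation with a made-up scoring scheme (match +1, mismatch -1, linear gap -2). It is not how BLAST or FASTA work, since both are heuristic local search tools, and real work would use a substitution matrix and affine gap penalties. The sequences are hypothetical.

```python
def needleman_wunsch(a, b, match=1, mismatch=-1, gap=-2):
    """Global alignment of a and b with a simple linear gap penalty."""
    n, m = len(a), len(b)
    score = [[0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        score[i][0] = i * gap
    for j in range(1, m + 1):
        score[0][j] = j * gap
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            sub = match if a[i - 1] == b[j - 1] else mismatch
            score[i][j] = max(score[i - 1][j - 1] + sub,
                              score[i - 1][j] + gap,
                              score[i][j - 1] + gap)

    # Trace back from the bottom-right corner to recover one optimal alignment.
    out_a, out_b = [], []
    i, j = n, m
    while i > 0 or j > 0:
        sub = match if i > 0 and j > 0 and a[i - 1] == b[j - 1] else mismatch
        if i > 0 and j > 0 and score[i][j] == score[i - 1][j - 1] + sub:
            out_a.append(a[i - 1])
            out_b.append(b[j - 1])
            i -= 1
            j -= 1
        elif i > 0 and score[i][j] == score[i - 1][j] + gap:
            out_a.append(a[i - 1])
            out_b.append("-")
            i -= 1
        else:
            out_a.append("-")
            out_b.append(b[j - 1])
            j -= 1
    return "".join(reversed(out_a)), "".join(reversed(out_b))


def percent_identity(aligned_a, aligned_b):
    """Identical columns as a percentage of columns without a gap.

    Other definitions of identity exist (e.g. dividing by alignment length).
    """
    pairs = [(x, y) for x, y in zip(aligned_a, aligned_b)
             if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(1 for x, y in pairs if x == y) / len(pairs)


# Hypothetical sequences: the template lacks one residue of the target.
target = "MKTAYIAKQR"
template = "MKTAYAKQR"
aln_target, aln_template = needleman_wunsch(target, template)
print(aln_target)
print(aln_template)
print(percent_identity(aln_target, aln_template))
```

A high identity over the aligned region suggests the template is usable; where exactly the gaps fall is what the alignment analysis step then inspects and adjusts.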

The main difference between the different comparative modeling methods is in how the 3D model is calculated from a given alignment. The original and still the most widely used method is modeling by rigid body assembly. The method constructs the model from a few rigid bodies that include core regions, loops and side-chains, all of which are obtained from dissecting related structures. The assembly of the model involves calculating a framework, which is defined as the average of the template atoms in the conserved regions of the fold, and then fitting the rigid bodies on the framework.

Another family of methods, modeling by segment matching, relies on approximate positions of conserved atoms from the templates to calculate the coordinates of other atoms. This is achieved by the use of a database of short segments of protein structure, energy or geometry rules, or some combination of these criteria.

The third group of methods, modeling by satisfaction of spatial restraints, uses either distance geometry or optimization techniques to satisfy spatial restraints obtained from the alignment of the target sequence with similar templates of known structure.

Some available software packages for comparative modeling are listed in Table 1. In addition to the methods for modeling the whole fold, numerous other techniques for predicting loops and side-chains on a given backbone have also been described. These methods can often be used in combination with each other and with comparative modeling techniques.
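The framework used in rigid body assembly, described above as the average of the template atoms in the conserved regions, can be sketched as follows. This is a minimal sketch under strong assumptions: the templates are taken to be already superposed in a common coordinate frame (the superposition itself is not shown), each template is reduced to one coordinate per alignment position, and the function name, data layout and coordinates are hypothetical.

```python
def average_framework(templates, conserved_positions):
    """Average template coordinates at conserved alignment positions.

    templates: list of dicts mapping alignment position -> (x, y, z),
               e.g. C-alpha coordinates, assumed already superposed.
    conserved_positions: alignment positions belonging to the conserved core.
    Returns a dict mapping position -> mean (x, y, z) over the templates
    that have a residue at that position.
    """
    framework = {}
    for pos in conserved_positions:
        coords = [t[pos] for t in templates if pos in t]
        if not coords:
            continue  # no template covers this position
        n = len(coords)
        framework[pos] = tuple(sum(c[k] for c in coords) / n for k in range(3))
    return framework


# Hypothetical coordinates for two superposed templates.
template_a = {1: (0.0, 0.0, 0.0), 2: (3.8, 0.0, 0.0), 3: (7.6, 0.0, 0.0)}
template_b = {1: (2.0, 2.0, 2.0), 2: (3.8, 1.0, 0.0)}
framework = average_framework([template_a, template_b], [1, 2, 3])
print(framework)
```

Rigid bodies taken from the individual templates would then be fitted onto these averaged positions; that fitting step is outside this sketch.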

The MODELLER approach

Please read this paper

The ICM approach

ICM is more of a classical molecular modeling program than MODELLER. What separates it from other modeling approaches is (1) the energy function and (2) the representation of the proteins. The energy function contains terms that rapidly calculate a good approximation of entropy and hydrophobic effects. The ICM homology modeling procedure is outlined below.

check and adjust your sequence-template alignment with the molecular graphics
build initial model by threading your sequence onto the template and patching insertions and deletions with results of the built-in database fragment search
find the lowest-energy loop conformations by global BPMC (biased probability Monte Carlo) loop optimization in an all-atom representation, in a soft environment and with solvation energy and entropy included.
predict conformations of chain ends missing in the template.
predict side-chain conformations by fast continuous BPMC free energy optimization.
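The Monte Carlo optimization steps above can be illustrated with a generic Metropolis search over side-chain torsion angles. This is a sketch only and not ICM's procedure: it uses unbiased random moves rather than ICM's biased sampling, and the energy function is a made-up three-fold torsion term with minima at 60, 180 and 300 degrees instead of a real force field with solvation and entropy terms. All names and parameter values are hypothetical.

```python
import math
import random


def toy_energy(chis):
    """Hypothetical energy: three-fold torsion term per angle (degrees).

    Each term is 0 at the staggered angles 60, 180, 300 and 2 at 0, 120, 240.
    """
    return sum(1.0 + math.cos(math.radians(3.0 * chi)) for chi in chis)


def metropolis_optimize(chis, energy_fn, steps=5000, max_step=30.0,
                        temperature=0.3, seed=0):
    """Plain Metropolis Monte Carlo over a list of torsion angles.

    Returns the lowest-energy angles seen and their energy.
    """
    rng = random.Random(seed)
    current = list(chis)
    e_current = energy_fn(current)
    best, e_best = list(current), e_current
    for _ in range(steps):
        # Perturb one randomly chosen angle by a random amount.
        trial = list(current)
        k = rng.randrange(len(trial))
        trial[k] = (trial[k] + rng.uniform(-max_step, max_step)) % 360.0
        e_trial = energy_fn(trial)
        delta = e_trial - e_current
        # Accept downhill moves always, uphill moves with Boltzmann probability.
        if delta <= 0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = list(current), e_current
    return best, e_best


# Start from the worst case: every angle sits on an energy maximum.
start = [0.0, 120.0, 240.0]
best, e_best = metropolis_optimize(start, toy_energy)
print(toy_energy(start), e_best)
print([round(chi, 1) for chi in best])
```

Accepting some uphill moves lets the search escape local minima, which is the reason Monte Carlo methods are used for loops and side-chains instead of pure downhill minimization.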

Arne Elofsson

Last modified: Wed Oct 27 15:47:19 CEST 1999

Arne Elofsson, Stockholm Bioinformatics Center, Department of Biochemistry, Arrheniuslaboratoriet, Stockholms Universitet, 10691 Stockholm, Sweden. Tel: +46-(0)8/161553 Fax: +46-(0)8/158057 Home: +46-(0)8/6413158 Email: arne@sbc.su.se WWW: /~arne/