
Bioinformatics for protein sequence, structure and function


Homology modeling of protein structures

Homology modeling takes a sequence whose structure is unknown (the target) and maps it onto the known structure of one or several similar (homologous) proteins (the templates). Two proteins of similar origin and function are expected to have reasonable sequence similarity. One can then infer that a conserved region of sequence that forms, for example, an alpha-helix in the known protein probably adopts the same conformation in the unknown protein.
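The idea of carrying a structural feature over from the known protein to the unknown one can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the two aligned sequences and the secondary structure string are made-up toy data, the function name transfer_secondary_structure is hypothetical, and the labels H, E and C (helix, strand, coil) are a common convention rather than anything prescribed here.

```python
def transfer_secondary_structure(target_aln, template_aln, template_ss):
    """Copy per-residue secondary structure labels from template to target.

    target_aln, template_aln: gapped alignment rows of equal length
        ('-' marks a gap).
    template_ss: one label per ungapped template residue (H, E or C).
    Returns one label per ungapped target residue; '?' marks target
    residues that face a gap in the template (no information to copy).
    """
    if len(target_aln) != len(template_aln):
        raise ValueError("alignment rows must have equal length")
    if len(template_ss) != len(template_aln.replace("-", "")):
        raise ValueError("need exactly one label per template residue")

    predicted = []
    t_pos = 0  # index of the current residue in template_ss
    for tgt, tpl in zip(target_aln, template_aln):
        if tpl != "-":
            label = template_ss[t_pos]
            t_pos += 1
        else:
            label = "?"  # insertion in the target: nothing to inherit
        if tgt != "-":
            predicted.append(label)
    return "".join(predicted)


# Toy data (hypothetical, not from a real protein)
target_aln = "MKT-AYIAKQRG"
template_aln = "MKSLAYI-KQRG"
template_ss = "CCHHHHHHHHC"

print(transfer_secondary_structure(target_aln, template_aln, template_ss))
# prints CCHHHH?HHHC
```

The '?' in the output marks the one target residue that is inserted relative to the template; such positions are the ones that later have to be built as loops.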

Basic procedures used in most homology modeling:

  1. Obtaining the sequence(s) and checking them with an alignment program, such as BLAST or FASTA, to ensure relatedness and homology
  2. Finding PDB data for the known structures (NMR data can also help, but a 3-D coordinate map is necessary)
  3. Multiple sequence alignments (if more than one known structure is involved, the knowns are sometimes aligned together first and the unknown sequence is then aligned with the group; this helps ensure better domain conservation), followed by analysis of the alignments: gap deletions and additions, secondary structure weighting
  4. Structure calculation
  5. Model refinement
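As a small illustration of the first and third steps, the sketch below computes the percent identity and the number of gap columns for a pairwise alignment that has already been produced by some alignment program. It is a minimal example, not part of BLAST or FASTA: the function names are hypothetical, the sequences are toy data, and identity is taken here over columns where both sequences have a residue, which is only one of several conventions in use.

```python
def percent_identity(aln_a, aln_b):
    """Percent of aligned (gap-free) columns that hold identical residues."""
    if len(aln_a) != len(aln_b):
        raise ValueError("alignment rows must have equal length")
    aligned = 0
    identical = 0
    for a, b in zip(aln_a, aln_b):
        if a == "-" or b == "-":
            continue  # skip columns with a gap in either sequence
        aligned += 1
        if a == b:
            identical += 1
    if aligned == 0:
        return 0.0
    return 100.0 * identical / aligned


def gap_columns(aln_a, aln_b):
    """Number of alignment columns with a gap in either sequence."""
    return sum(1 for a, b in zip(aln_a, aln_b) if a == "-" or b == "-")


# Toy alignment (hypothetical sequences)
unknown = "MKT-AYIAKQRG"
known = "MKSLAYI-KQRG"

print(percent_identity(unknown, known))  # prints 90.0
print(gap_columns(unknown, known))       # prints 2
```

A low identity or a large number of gap columns is a signal to inspect the alignment by hand before going on to structure calculation.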

The main difference between the different comparative modeling methods is in how the 3D model is calculated from a given alignment. The original and still the most widely used method is modeling by rigid body assembly. The method constructs the model from a few rigid bodies that include core regions, loops and side-chains, all of which are obtained from dissecting related structures. The assembly of the model involves calculating a framework, which is defined as the average of the template atoms in the conserved regions of the fold, and then fitting the rigid bodies on the framework.

Another family of methods, modeling by segment matching, relies on approximate positions of conserved atoms from the templates to calculate the coordinates of other atoms. This is achieved by the use of a database of short segments of protein structure, energy or geometry rules, or some combination of these criteria.

The third group of methods, modeling by satisfaction of spatial restraints, uses either distance geometry or optimization techniques to satisfy spatial restraints obtained from the alignment of the target sequence with similar templates of known structure.

Some available software packages for comparative modeling are listed in Table 1. In addition to the methods for modeling the whole fold, numerous other techniques for predicting loops and side-chains on a given backbone have also been described. These methods can often be used in combination with each other and with comparative modeling techniques.
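The framework used in rigid body assembly, described above as the average of the template atoms in the conserved regions, can be sketched as follows. This is a simplified illustration, not the procedure of any particular package: it assumes the templates have already been superimposed, that each template is reduced to one coordinate per aligned position (for example the C-alpha atom), and that the list of conserved positions is given. The function name and the coordinates are hypothetical.

```python
def average_framework(templates, conserved):
    """Average equivalent atom positions over superimposed templates.

    templates: list of structures, each a list of (x, y, z) tuples with
        one entry per aligned position; all structures are assumed to be
        superimposed in a common coordinate frame already.
    conserved: indices of the aligned positions in the conserved core.
    Returns a dict mapping each conserved position to its mean (x, y, z).
    """
    if not templates:
        raise ValueError("need at least one template")
    n = len(templates)
    framework = {}
    for pos in conserved:
        atoms = [t[pos] for t in templates]
        # zip(*atoms) groups all x values, all y values, all z values
        framework[pos] = tuple(sum(axis) / n for axis in zip(*atoms))
    return framework


# Two toy templates with three aligned positions each (hypothetical data)
template_a = [(0.0, 0.0, 0.0), (3.8, 0.0, 0.0), (7.6, 0.0, 0.0)]
template_b = [(0.2, 0.0, 0.0), (3.8, 0.4, 0.0), (9.0, 1.0, 0.0)]

# Only positions 0 and 1 are treated as conserved; position 2 diverges
# and would be handled as a loop rather than as part of the framework.
framework = average_framework([template_a, template_b], conserved=[0, 1])
for pos in sorted(framework):
    print(pos, framework[pos])
```

Fitting the rigid bodies onto this framework, and building the non-conserved positions, are separate steps that this sketch does not attempt.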

The modeller approach

Please read this paper

The ICM approach

ICM is more of a classical molecular modeling program than modeller. What separates it from other modeling approaches is (1) the energy function and (2) the representation of the proteins. The energy function contains terms that rapidly calculate a (good) approximation of entropy and hydrophobic effects. The ICM homology modeling procedure is outlined below.

  1. Check and adjust your sequence-template alignment using the molecular graphics.
  2. Build an initial model by threading your sequence onto the template and patching insertions and deletions with the results of the built-in database fragment search.
  3. Find the lowest-energy loop conformations by global BPMC (biased probability Monte Carlo) loop optimization in all-atom representation, in the soft environment and with inclusion of solvation energy and entropy.
  4. Predict the conformations of chain ends missing in the template.
  5. Predict side-chain conformations by fast continuous BPMC free energy optimization.
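To give a feeling for the Monte Carlo optimization named in steps 3 and 5, here is a minimal sketch of a plain Metropolis Monte Carlo search over a single side-chain torsion angle. It is not ICM's algorithm: ICM biases its random moves and uses a full all-atom energy with solvation and entropy terms, whereas this sketch draws uniform random steps and uses a made-up three-well energy function. The names toy_energy and metropolis_minimize and all parameter values are assumptions chosen for illustration.

```python
import math
import random


def toy_energy(chi_deg):
    """Hypothetical torsion energy (arbitrary units).

    Three wells near -60, 60 and 180 degrees; the added bias term makes
    the well at 180 degrees the deepest one.
    """
    chi = math.radians(chi_deg)
    three_fold = 1.0 + math.cos(3.0 * chi)
    bias = 0.5 * (1.0 - math.cos(chi - math.pi))
    return three_fold + bias


def metropolis_minimize(energy, start, steps=5000, max_step=30.0,
                        temperature=0.3, seed=0):
    """Plain Metropolis search; returns the best angle and energy seen."""
    rng = random.Random(seed)  # fixed seed so the run is reproducible
    current = start
    e_current = energy(current)
    best, e_best = current, e_current
    for _ in range(steps):
        # Propose a uniform random move and wrap it back into range
        trial = current + rng.uniform(-max_step, max_step)
        trial = (trial + 180.0) % 360.0 - 180.0
        e_trial = energy(trial)
        delta = e_trial - e_current
        # Accept downhill moves always, uphill moves with Boltzmann odds
        if delta <= 0.0 or rng.random() < math.exp(-delta / temperature):
            current, e_current = trial, e_trial
            if e_current < e_best:
                best, e_best = current, e_current
    return best, e_best


best_chi, best_e = metropolis_minimize(toy_energy, start=60.0)
print("best angle (degrees):", round(best_chi, 1))
print("best energy:", round(best_e, 3))
```

Accepting some uphill moves is what lets the search leave the starting well; a biased scheme differs mainly in how the trial moves are proposed, not in this acceptance rule.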

Arne Elofsson
Last modified: Wed Oct 27 15:47:19 CEST 1999
Stockholm Bioinformatics Center,
Department of Biochemistry,
Arrheniuslaboratoriet
Stockholms Universitet
10691 Stockholm, Sweden
Tel: +46-(0)8/161553
Fax: +46-(0)8/158057
Home: +46-(0)8/6413158
Email: arne@sbc.su.se
WWW: /~arne/