
REMO

A MOLECULAR REPLACEMENT PROGRAM

 

Reference:        Caliandro, R., Carrozzini, B., Cascarano, G.L., De Caro, L., Giacovazzo, C., Mazzone, A.M. & Siliqi, D. (2006). J. Appl. Cryst. 39, 185-193.

Contact:           Rocco.Caliandro@ic.cnr.it

 

INTRODUCTION

REMO is a highly automated, user-friendly program for molecular replacement (MR). In the rotation step, the orientation of the model molecule is found by rotating the weighted reciprocal lattice of the protein with respect to the calculated transform of the model structure, so that the fit is searched for in reciprocal space. The space group of the model structure is assumed to be the symmorphic variant of the protein space group. In the translation step, the oriented model molecule is located by means of a correlation function coupled with a translation function computed by fast Fourier transforms (FFT).

 

THE ROTATION STEP

REMO adopts the approach called the Molecular Fourier Transform (MFT) method, recently revisited by Rabinovich et al. (1998): the structure factors of the molecular model are calculated only once, and the fit is achieved by rotating the observed reciprocal lattice with respect to the model lattice. Because the process involves only the indices, its cost is independent of the number of atoms in the model and in the protein structure. We have developed an algebraic approach for the efficient use of the MFT method, leading to a novel target function for the rotation search, RFOM, which is maximised at the correct orientation (Caliandro et al., 2009).
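The core MFT idea, computing the model transform once and then moving only the observed indices, can be sketched as follows. This is a minimal illustration under stated assumptions, not REMO's implementation: the model transform is assumed to be stored as a dense NumPy array of intensities on the integer model lattice, each rotated (real-valued) index is mapped to its nearest lattice node, and a plain Pearson correlation stands in for RFOM, whose actual expression is given in Caliandro et al. (2009). The function names are hypothetical.

```python
import numpy as np

def rotation_matrix_z(angle_deg):
    """Rotation about the z axis (used here only to build test rotations)."""
    a = np.radians(angle_deg)
    c, s = np.cos(a), np.sin(a)
    return np.array([[c, -s, 0.0], [s, c, 0.0], [0.0, 0.0, 1.0]])

def mft_score(model_grid, obs_indices, obs_intensities, rot):
    """Correlate observed intensities with the stored model transform after
    rotating the observed indices (placeholder score, not REMO's RFOM).

    model_grid:      (n, n, n) array of model values, stored FFT-style so that
                     negative indices wrap to the end of each axis.
    obs_indices:     (m, 3) integer indices of the observed reflections,
                     already expressed in the model (enlarged) cell.
    obs_intensities: (m,) observed values associated with those indices.
    rot:             3x3 rotation matrix for the trial orientation.
    """
    n = model_grid.shape[0]
    # Rotate the observed indices; the results are real-valued.
    rotated = obs_indices @ rot.T
    # Approximate each rotated index by the nearest model-lattice node.
    nodes = np.rint(rotated).astype(int) % n
    model_vals = model_grid[nodes[:, 0], nodes[:, 1], nodes[:, 2]]
    # Only indices were transformed: the cost does not depend on atom count.
    return np.corrcoef(obs_intensities, model_vals)[0, 1]
```

Because the model grid is never recomputed, each trial orientation costs one matrix product and one table lookup per observed reflection, which is what makes a fine orientation grid affordable.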

 

THE TRANSLATION STEP

REMO uses the classical T2 function of Crowther & Blow (1967) as corrected by Harada et al. (1981). The translation function can be written as:

which can be presented as a Fourier series:

with coefficients

where summation is over those h for which .

We have modified the TF expression by replacing the term  by , as suggested by Navaza (1994), and by excluding from the summation the term with s1 = s2. In practice the summation may be limited to the couples for which  with s1 > s2, since one is interested only in one sense of the  vector. Since eq. (1) can be written in direct space as the convolution of two Patterson functions, the use of the modified term makes it possible to: 1) subtract the contribution of false peaks corresponding to overlapping symmetry-related models; 2) remove the origin peak from the Patterson function of the protein structure and from the shifted Patterson function of the model structure.
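Once the translation function is written as a Fourier series, the whole map can be evaluated with a single inverse FFT. The sketch below assumes the coefficients c(h) have already been computed according to the TF expression of this section and are supplied as a dictionary keyed by integer (h, k, l); it also assumes the sign convention T(t) = sum_h c(h) exp(2 pi i h.t). The function names and grid size are hypothetical.

```python
import numpy as np

def tf_map(coeffs, grid):
    """Evaluate T(t) = sum_h c(h) exp(2*pi*i*h.t) on a grid by inverse FFT.

    coeffs: dict mapping integer (h, k, l) tuples to complex coefficients.
    grid:   (nx, ny, nz) number of sampling points along each cell axis.
    """
    arr = np.zeros(grid, dtype=complex)
    for (h, k, l), c in coeffs.items():
        # Negative indices wrap to the end of the array, as numpy's FFT expects.
        arr[h % grid[0], k % grid[1], l % grid[2]] += c
    # ifftn includes a 1/N factor; multiply it back to obtain the plain sum.
    return np.fft.ifftn(arr) * arr.size

def top_peak(tmap):
    """Return the fractional coordinates and height of the highest real value."""
    t = tmap.real
    i = np.unravel_index(np.argmax(t), t.shape)
    return tuple(ix / n for ix, n in zip(i, t.shape)), t[i]
```

With coefficients of the form exp(-2 pi i h.t0), the map peaks exactly at t0, which is a convenient self-check of the sign convention and of the grid indexing.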

 

Only the peak positions falling inside the unit cell of the Cheshire group corresponding to the protein space group are considered as potential translation vectors. A theoretical analysis based on the joint probability approach showed that maximising the criterion TFOM is effective in finding the correct solution.
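The peak selection just described can be sketched as below. The sketch makes several simplifying assumptions: the Cheshire cell is taken to be a box anchored at the origin whose fractional edges are supplied by the caller (the real edges, and any additional operators of the Cheshire group, depend on the space group and are not handled here), peaks are local maxima of a periodic 3D map, and retained peaks are ranked by height as a placeholder, whereas REMO ranks them by TFOM. The function names and the cheshire_edges parameter are hypothetical.

```python
import numpy as np
from itertools import product

def local_maxima(tmap):
    """Boolean mask of grid points higher than all 26 periodic neighbours."""
    mask = np.ones(tmap.shape, dtype=bool)
    for shift in product((-1, 0, 1), repeat=3):
        if shift != (0, 0, 0):
            mask &= tmap > np.roll(tmap, shift, axis=(0, 1, 2))
    return mask

def cheshire_peaks(tmap, cheshire_edges, n_top=5):
    """Peaks lying inside a box-shaped Cheshire cell, highest first.

    cheshire_edges: hypothetical fractional edges of the Cheshire cell
                    along a, b, c, e.g. (0.5, 0.5, 0.5).
    """
    shape = np.array(tmap.shape)
    edges = np.asarray(cheshire_edges)
    peaks = []
    for idx in zip(*np.nonzero(local_maxima(tmap))):
        frac = np.array(idx) / shape
        if np.all(frac < edges):
            peaks.append((tuple(frac), float(tmap[idx])))
    # Placeholder ranking by peak height; REMO ranks by TFOM instead.
    peaks.sort(key=lambda p: p[1], reverse=True)
    return peaks[:n_top]
```

Peaks outside the Cheshire cell are discarded because they describe the same solution referred to an equivalent origin.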

 

THE ALGORITHM

The algorithm used in REMO is described below, step by step:

  1.     the coordinates of the model molecule are orthonormalised and the maximum molecular dimension is calculated;
  2.     an orthogonal reciprocal lattice grid is generated whose direct space dimensions are chosen to be four times the maximum molecular dimension. The high-resolution limit of the lattice may be chosen by the user or automatically determined by the program so that the number of grid points belonging to a hemisphere of the reciprocal lattice is approximately equal to 5% of the number of independent observed reflections. The same resolution limit is applied to select the observed reflections to be used for the MR search;
  3.     the model structure factors are computed and stored for each grid point hmod. The calculation is performed by FFT of the electron density of the model structure, generated from its coordinates in the enlarged cubic cell (Agarwal, 1978). The isotropic temperature factors of the model atoms are rescaled to be compatible with the overall isotropic temperature factor of the protein structure, estimated by a Wilson plot (Wilson, 1942);
  4.     the role of low-resolution reflections, dominated by the bulk solvent contribution, is reduced by weighting each reflection having resolution res by the factor , where resmax is the maximum reflection resolution.
  5.     the orientation space is sampled in terms of Lattman angles (Lattman, 1972) with an angular step given by

     

     

     

    where resmin is the minimum resolution adopted for the calculations and <cell> is the average length of the unit-cell basis vectors of the protein structure. The orientation space to be explored is limited to the asymmetric region of the rotation group (Hirshfeld, 1968).

  6.     The indices hprot of the protein structure, once transformed from the original protein unit cell to the enlarged cubic cell defined for the model structure, are systematically rotated via the matrix Mprot corresponding to each sampling point of the orientation space. The observed moduli , suitably modified according to points 4) and 5), remain associated with the new indices . Meanwhile, the previously stored values of  are combined to form the quantity . For each sampled rotation the following target function is calculated:

    RFOM =   = max   (17)

     

     

     

    where Mui is the multiplicity of the reflection,  and . Note that the rotated indices  are real (non-integer) quantities, whereas the  are calculated only at the nodes of the reciprocal lattice of the model structure. The resulting approximation is negligible provided a sufficiently fine lattice is used; this justifies our choice of a model cell edge equal to four times the maximum molecular dimension.

  7.     The orientations with the highest RFOM values are refined by a finer rotational search: an angular step of  is used within a region of , where ki depends on the angular interval spanned by the Euler angle  during the rotation search. Subsequent orientations falling in the same region are rejected as members of the same cluster of rotational solutions.
  8.     The remaining orientations are further selected according to their locally optimised RFOM values, and the top ones are submitted to the translation search. The calculation of the TF requires a reciprocal sphere (of reflections H) that includes that of the observed reflections h. To this end we use the reciprocal lattice defined for the model structure in the rotation step. This trick allows better management of memory and CPU resources and further supports our choice of such a large cell for the model structure. Only the TF peaks falling inside the unit cell of the Cheshire group corresponding to the protein space group (Hirshfeld, 1968) are considered. Instead of using the peak height, we found it more effective to rank and select the translation solutions according to the criterion TFOM, which is maximised for the correct solution.
  9.     The top translation solutions undergo rigid-body local optimisation in a six-dimensional parameter space. This is achieved by a subspace-searching simplex method for unconstrained optimisation (Rowan, 1990), which generalises the downhill simplex method (Nelder & Mead, 1965). It has the advantage of requiring only function evaluations, not derivatives; the number of function evaluations required for convergence typically increases only linearly with the problem size.
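The automatic choice of the high-resolution limit in step 2 can be illustrated with a short calculation. Assuming a cubic model cell of edge a = 4 Dmax, the number of reciprocal-lattice nodes within resolution d is roughly the volume of a sphere of radius 1/d divided by the reciprocal-cell volume 1/a³, so one hemisphere holds about (2π/3)(a/d)³ nodes. Setting this equal to 5% of the independent observed reflections and solving for d gives the sketch below. The continuum approximation and the function names are assumptions; REMO may count grid points explicitly, as the second (hypothetical) helper does for comparison.

```python
import math

def auto_resolution_limit(d_max, n_obs, fraction=0.05, cell_factor=4.0):
    """Estimate the high-resolution limit d (same length units as d_max).

    d_max: maximum molecular dimension of the model.
    n_obs: number of independent observed reflections.
    Uses the continuum estimate N_hemisphere ~ (2*pi/3) * (a/d)**3
    for a cubic cell of edge a = cell_factor * d_max.
    """
    a = cell_factor * d_max
    n_target = fraction * n_obs
    return a / (3.0 * n_target / (2.0 * math.pi)) ** (1.0 / 3.0)

def hemisphere_nodes(a, d):
    """Exact number of nonzero integer nodes (h, k, l) of a cubic lattice of
    edge a within resolution d, halved to count one hemisphere."""
    r2 = (a / d) ** 2  # squared sphere radius in index units
    m = int(math.sqrt(r2))
    full = sum(1 for h in range(-m, m + 1) for k in range(-m, m + 1)
               for l in range(-m, m + 1) if h * h + k * k + l * l <= r2)
    return (full - 1) // 2  # drop the origin, keep one of each +/- pair
```

The exact count stays close to the target because the error of the continuum estimate grows more slowly than the number of nodes.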
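The rigid-body optimisation of step 9 needs only function evaluations. The sketch below uses SciPy's Nelder-Mead minimiser as a stand-in for the subplex method of Rowan (1990), which SciPy does not provide. The six parameters are assumed to be three rotation angles (degrees) and three fractional translations, and score_fn is a hypothetical callable returning the figure of merit to be maximised.

```python
import numpy as np
from scipy.optimize import minimize

def refine_rigid_body(score_fn, start, step=(2.0, 2.0, 2.0, 0.01, 0.01, 0.01)):
    """Locally maximise score_fn over (alpha, beta, gamma, tx, ty, tz).

    score_fn: hypothetical callable taking a length-6 array and returning a
              figure of merit (higher is better).
    start:    starting orientation (degrees) and translation (fractional).
    step:     initial simplex size along each parameter.
    """
    x0 = np.asarray(start, dtype=float)
    # Initial simplex: x0 plus one vertex displaced along each parameter, so
    # that angles and translations are explored on comparable scales.
    simplex = np.vstack([x0] + [x0 + np.eye(6)[i] * step[i] for i in range(6)])
    result = minimize(lambda p: -score_fn(p), x0, method="Nelder-Mead",
                      options={"initial_simplex": simplex, "xatol": 1e-4,
                               "fatol": 1e-8, "maxiter": 5000})
    return result.x, -result.fun
```

Setting the initial simplex explicitly matters here because angles and fractional translations differ in scale by roughly two orders of magnitude.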

THE NCS STEP

If the protein structure is expected to contain n > 1 monomers in the asymmetric unit, a special procedure is followed to locate the n model monomers one after the other. For the first monomer all the previous steps are executed, though with looser thresholds for the selection of candidate solutions. Then, the locally optimised candidate solution is combined with those selected after the rotational search to form candidate couples of solutions. The first monomer of the couple is kept fixed and the TF for two independent models is applied to find the position of the second monomer. To save computational time, for each couple only the peak in the TF map with the highest NCSFOM value is considered. The couple with the highest NCSFOM value is assumed to be the best solution. If more than one locally optimised solution is retained for the first monomer, the above process is repeated for each of them. If n > 2, the procedure is iterated to form multiplets of monomers: the positions and orientations of the previously located monomers are kept fixed, and the multiplet with the highest NCSFOM is assumed to be the best solution. When a statistical analysis of the observed reflections reveals significant pseudotranslation effects, the program searches for the positions of two monomers having the same orientation.
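The control flow of the sequential NCS search can be summarised as a greedy loop over multiplets. In the sketch below, candidates are opaque objects, and place_next and ncs_fom are hypothetical callables standing in for REMO's multi-model TF search and its NCSFOM score; the pseudotranslation branch is not modelled. Only the search logic described above is illustrated, not the crystallographic calculations.

```python
from typing import Any, Callable, List, Sequence, Tuple

def ncs_search(first_monomers: Sequence[Any], rotations: Sequence[Any], n: int,
               place_next: Callable[[List[Any], Any], Any],
               ncs_fom: Callable[[List[Any]], float]) -> Tuple[List[Any], float]:
    """Greedy n-monomer search with previously placed monomers kept fixed.

    first_monomers: locally optimised solutions for the first monomer.
    rotations:      orientations retained after the rotation search.
    place_next:     hypothetical stand-in for the multi-model TF search; returns
                    the new monomer located with the current multiplet fixed.
    ncs_fom:        hypothetical stand-in for the NCSFOM score (higher is better).
    """
    best: List[Any] = []
    best_fom = float("-inf")
    for first in first_monomers:  # repeat for each retained first monomer
        multiplet = [first]
        for _ in range(n - 1):
            # Try each rotation candidate as the next monomer; keep the best.
            trials = [multiplet + [place_next(multiplet, r)] for r in rotations]
            multiplet = max(trials, key=ncs_fom)
        fom = ncs_fom(multiplet)
        if fom > best_fom:
            best, best_fom = multiplet, fom
    return best, best_fom
```

Keeping earlier monomers fixed makes the cost grow linearly with n rather than combinatorially, at the price of never revisiting an early choice.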

     

SELECTION CRITERIA

The best candidate solutions at the different stages of the MR procedure are selected by means of the normalized variable ,

 

where FOM is the current target function (RFOM in the rotation step and TFOM in the translation step). Threshold values for the r variable have been determined to optimise the efficiency of the program and are used as defaults. At the end of the MR run the relevant information about the selected solutions is written to output files: the coordinates of the positioned model in PDB format and the reflection phases in MTZ format.
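Selection by a normalized variable can be sketched as below. The sketch assumes that r is the standard score of each FOM value over all trial solutions of a stage, r = (FOM - <FOM>)/sigma(FOM); this form is an assumption for illustration, and the default threshold is a hypothetical value, not REMO's.

```python
import statistics

def select_by_r(foms, threshold=3.0):
    """Return (index, r) pairs whose standardised FOM exceeds threshold,
    best first.

    Assumes r = (FOM - mean) / std over all trial solutions; threshold is a
    hypothetical value, not REMO's default.
    """
    mean = statistics.fmean(foms)
    sd = statistics.pstdev(foms)
    if sd == 0:
        return []  # all solutions score the same: nothing stands out
    kept = [(i, (f - mean) / sd) for i, f in enumerate(foms)]
    kept = [(i, r) for i, r in kept if r > threshold]
    return sorted(kept, key=lambda p: p[1], reverse=True)
```

A normalized variable of this kind lets one threshold serve both RFOM and TFOM, whose raw values lie on different scales.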

     

LIMITATIONS

REMO's main drawbacks are the computational time spent in the NCS procedure and in model refinement, and a certain flatness of the statistical criteria (e.g. the TFOM coefficient) used to recognise the correct solutions. Both weaknesses could be overcome by further developments of the REMO approach, which are currently under investigation.


     

REFERENCES

Agarwal, R.C. (1978). Acta Cryst. A34, 791-809.

Caliandro, R., Carrozzini, B., Cascarano, G.L., De Caro, L., Giacovazzo, C., Mazzone, A.M. & Siliqi, D. (2006). J. Appl. Cryst. 39, 185-193.

Caliandro, R., Carrozzini, B., Cascarano, G.L., De Caro, L., Giacovazzo, C., Mazzone, A.M. & Siliqi, D. (2009). Acta Cryst. submitted.

Crowther, R.A. & Blow, D.M. (1967). Acta Cryst. 23, 544-548.

Harada, Y., Lifchitz, A., Berthou, J. & Jolles, P. (1981). Acta Cryst. A37, 398-406.

Hirshfeld, F.L. (1968). Acta Cryst. A24, 301-311.

Lattman, E.E. (1972). Acta Cryst. B28, 1065-1068.

Navaza, J. (1994). Acta Cryst. A50, 157-163.

Nelder, J.A. & Mead, R. (1965). Comput. J. 7, 308-313.

Rabinovich, D., Rozenberg, H. & Shakked, Z. (1998). Acta Cryst. D54, 1336-1342.

Rowan, T. (1990). Functional Stability Analysis of Numerical Algorithms. Ph.D. thesis, Department of Computer Sciences, University of Texas at Austin.

Wilson, A.J.C. (1942). Nature 150, 151.

     
