REMO
A MOLECULAR REPLACEMENT PROGRAM
Reference: Caliandro, R., Carrozzini, B., Cascarano, G.L., De Caro, L., Giacovazzo, C., Mazzone, A.M. & Siliqi, D. (2006). J. Appl. Cryst. 39, 185-193.
Contact: Rocco.Caliandro@ic.cnr.it
INTRODUCTION
REMO is a highly automated and user-friendly program for molecular replacement. In the rotation step the orientation of the model molecule is found by rotating the weighted reciprocal lattice of the protein with respect to the calculated transform of the model structure, so the fit is searched for in reciprocal space. The space group of the model structure is assumed to be the symmorphic variant of the protein space group. The oriented model molecule is then located by using the correlation function coupled with a translation function calculated by fast Fourier transform (FFT).
THE ROTATION STEP
REMO adopts the Molecular Fourier Transform (MFT) method, which was
revisited by Rabinovich et al. (1998): the structure factors of the
molecular model are calculated only once, and the fitting is achieved by
rotating the observed reciprocal lattice with respect to the model lattice.
The process involves only the indices, and is therefore independent of the
number of atoms in the model and in the protein structure. We have developed
an algebraic approach for the efficient use of the MFT method, leading to a
novel target function, RFOM, which is maximised in the rotation search
(Caliandro et al., 2009).
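The index rotation described above can be illustrated with a minimal sketch. It is not REMO's code: the ZYZ Euler convention, the orthogonal protein cell, the cubic model cell of edge `model_edge`, and the function names are all assumptions made for the example. The sketch rotates one observed reflection and expresses it in model-cell indices, which are in general non-integer, together with the nearest node of the model lattice.

```python
import math

def euler_zyz_matrix(alpha, beta, gamma):
    """Rotation matrix Rz(alpha) * Ry(beta) * Rz(gamma), angles in radians.
    The ZYZ convention is an assumption of this sketch."""
    ca, sa = math.cos(alpha), math.sin(alpha)
    cb, sb = math.cos(beta), math.sin(beta)
    cg, sg = math.cos(gamma), math.sin(gamma)
    return [
        [ca * cb * cg - sa * sg, -ca * cb * sg - sa * cg, ca * sb],
        [sa * cb * cg + ca * sg, -sa * cb * sg + ca * cg, sa * sb],
        [-sb * cg, sb * sg, cb],
    ]

def rotate_indices(hkl, cell_edges, rotation, model_edge):
    """Rotate one observed reflection into the model lattice.

    hkl        : integer indices of the observed reflection
    cell_edges : (a, b, c) of an orthogonal protein cell (assumed), in Angstrom
    rotation   : 3x3 rotation matrix
    model_edge : edge of the cubic model cell (assumed), in Angstrom
    Returns the real-valued model indices and the nearest lattice node.
    """
    # Reciprocal-space vector of the reflection, in 1/Angstrom.
    s = [h / a for h, a in zip(hkl, cell_edges)]
    # Only the indices are rotated; no atomic coordinates are involved.
    s_rot = [sum(rotation[i][j] * s[j] for j in range(3)) for i in range(3)]
    real_idx = [model_edge * x for x in s_rot]
    nearest = [round(x) for x in real_idx]
    return real_idx, nearest

identity = euler_zyz_matrix(0.0, 0.0, 0.0)
real_idx, node = rotate_indices((1, 2, 3), (50.0, 60.0, 70.0), identity, 200.0)
print(real_idx, node)
```

The cost of the loop over reflections depends on the number of reflections only, which is the point made in the text about the method being independent of the number of atoms.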
THE TRANSLATION STEP
REMO uses the classical T2 function of Crowther & Blow (1967) as corrected by Harada et al. (1981). The translation function can be written as:
which can be presented as a Fourier series:
with coefficients
where summation is over those h
for which .
We have modified the TF expression
by replacing the term by
, as suggested by Navaza (1994), and by excluding from the
summation the terms with s1=s2. In
practice the summation may be limited to the couples with s1>s2,
since one is interested in only one sense of the
vector. Since eq. (1) can be written in direct space as the
convolution between two Patterson functions, the use of the term
makes it possible to: 1) subtract the contribution of false peaks
corresponding to overlapping symmetry-related models; 2) remove the origin peak
from the Patterson function of the protein structure and from the shifted
Patterson function of the model structure.
Only the peak positions falling inside
the Cheshire group corresponding to the protein space group are considered as potential translation vectors.
A theoretical approach based on joint probability distributions showed that
maximising the criterion TFOM is effective for finding the correct solution.
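The reason a translation function written as a Fourier series can be evaluated for all translations at once is the correlation theorem. The toy example below is a one-dimensional illustration only, not the T2 function used by REMO: the sequences, the function names and the naive O(n^2) transform (used here to keep the example free of external libraries; a real program would use an FFT) are all assumptions. It shows that the correlation of a "target" with a "model" computed from Fourier coefficients agrees with the direct sum, and that its maximum falls at the applied shift.

```python
import cmath

def dft(values):
    """Naive discrete Fourier transform (stand-in for an FFT)."""
    n = len(values)
    return [
        sum(v * cmath.exp(-2j * cmath.pi * k * x / n) for x, v in enumerate(values))
        for k in range(n)
    ]

def inverse_dft(coeffs):
    """Naive inverse discrete Fourier transform."""
    n = len(coeffs)
    return [
        sum(c * cmath.exp(2j * cmath.pi * k * x / n) for k, c in enumerate(coeffs)) / n
        for x in range(n)
    ]

def correlation_direct(target, model):
    """c(t) = sum_x target(x + t) * model(x), with periodic boundaries."""
    n = len(target)
    return [sum(target[(x + t) % n] * model[x] for x in range(n)) for t in range(n)]

def correlation_fourier(target, model):
    """Same correlation, from the coefficients F_target(k) * conj(F_model(k))."""
    ft, fm = dft(target), dft(model)
    coeffs = [a * b.conjugate() for a, b in zip(ft, fm)]
    return [c.real for c in inverse_dft(coeffs)]

# Hypothetical 1D "model" and a "target" equal to the model shifted by 3 points.
model = [0.0, 3.0, 1.0, 0.0, 0.0, 2.0, 0.0, 0.0]
shift = 3
target = [0.0] * len(model)
for x, value in enumerate(model):
    target[(x + shift) % len(model)] = value

direct = correlation_direct(target, model)
fourier = correlation_fourier(target, model)
best_shift = max(range(len(fourier)), key=lambda t: fourier[t])
print(best_shift)
```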
THE ALGORITHM
The algorithm used in REMO is described step by step below:
where resmin is the minimum resolution
adopted for calculations and <cell> is the average length of the
unit-cell basis vectors of the protein structure. The extent of the
orientation space to be explored is limited to the asymmetric region of the
rotation group (Hirshfeld, 1968).
RFOM =
where Mui
is the multiplicity of the reflection,
are computed and stored for each
grid point hmod. The calculation is performed by FFT of the
electron density of the model structure, generated from its coordinates in the
enlarged cubic cell (Agarwal, 1978). The isotropic temperature factors of the
model structure atoms are rescaled so as to be compatible with the overall
isotropic temperature factor of the protein structure, estimated by the Wilson
plot (Wilson, 1942);
, where resmax is the maximum reflection resolution.
, suitably modified according to points 4) and 5), remain associated with the new indices
. On the other hand, the previously stored values of
are combined to form the quantity
. For each sampled rotation the following target function is calculated:
= max
(17)
and
. It is worth noting that the rotated indices
are real quantities, whereas the
are calculated at the nodes of the reciprocal lattice of the
model structure. The error introduced by this approximation is negligible
provided an extremely fine lattice is used: this justifies our choice to take
the cell length of the model molecule as four times its maximum molecular dimension.
is used within a region of
, where ki depends on the angular interval spanned by the Euler angle
during the rotation search. Subsequent orientations falling
in the same region are rejected as members of the same cluster of rotational solutions.
= max
, where
and
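The rejection of orientations that fall in the region of an already accepted solution, mentioned in the steps above, can be sketched as a greedy selection. This is an assumption-laden illustration rather than REMO's procedure: the window half-widths (standing in for the region whose size depends on ki), the 360-degree wrap-around applied to every angle, and the function names are hypothetical.

```python
def angular_difference(a, b, period=360.0):
    """Smallest absolute difference between two angles, in degrees."""
    d = abs(a - b) % period
    return min(d, period - d)

def select_cluster_representatives(candidates, half_widths):
    """Keep the best-scoring orientation of each cluster.

    candidates  : list of (fom, (alpha, beta, gamma)) tuples, angles in degrees
    half_widths : hypothetical half-width of the rejection region per angle
    """
    accepted = []
    # Visit candidates from the highest figure of merit downwards.
    for fom, angles in sorted(candidates, key=lambda c: c[0], reverse=True):
        in_cluster = any(
            all(
                angular_difference(x, y) <= w
                for x, y, w in zip(angles, kept, half_widths)
            )
            for _, kept in accepted
        )
        # An orientation inside an accepted region belongs to that cluster.
        if not in_cluster:
            accepted.append((fom, angles))
    return accepted

candidates = [
    (0.8, (2.0, 20.0, 30.0)),
    (0.7, (4.0, 21.0, 29.0)),     # same cluster as the first
    (0.6, (100.0, 50.0, 200.0)),  # a distinct orientation
    (0.5, (359.0, 21.0, 30.0)),   # same cluster as the first, across 0/360
]
selected = select_cluster_representatives(candidates, (5.0, 5.0, 5.0))
print(selected)
```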
THE NCS STEP
If the protein structure is expected to contain n>1 monomers in the asymmetric unit, a special procedure is followed to locate the n model monomers one after the other. For the first monomer all the previous steps are executed, though with looser thresholds for the selection of candidate solutions. Then, the locally optimised candidate solution is combined with those selected after the rotational search to form candidate couples of solutions. The first monomer of the couple is kept fixed and the TF function for two independent models is applied to provide the position of the second monomer. To save computational time, for each couple only the peak in the TF map with the highest NCSFOM value is considered. The couple with the highest NCSFOM value is assumed to be the best solution. If more than one locally optimised solution is available for the first monomer, the above process is repeated for each of them. If n>2, the procedure is iterated to form multiplets of monomers: the positions and orientations of the previously located monomers are kept fixed and the multiplet with the highest NCSFOM is assumed to be the best solution. When a statistical analysis of the observed reflections reveals relevant pseudotranslation effects, the program searches for the positions of two monomers having the same orientation.
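The search for the second monomer can be summarised by the following sketch. It only mirrors the control flow described above. The function `best_tf_peak`, which would return the highest NCSFOM value in the TF map and the corresponding position for a given couple, is a hypothetical placeholder supplied by the caller, and the toy scoring function exists purely so that the example runs.

```python
def best_couple(first_monomer, orientations, best_tf_peak):
    """Pick the couple (fixed first monomer, second orientation) with highest NCSFOM.

    first_monomer : the fixed, locally optimised solution for the first monomer
    orientations  : candidate orientations selected after the rotational search
    best_tf_peak  : hypothetical callable returning (ncsfom, position) for the
                    single best peak of the two-model TF map
    """
    best = None
    for orientation in orientations:
        # Only the top TF peak is kept for each couple, to save time.
        ncsfom, position = best_tf_peak(first_monomer, orientation)
        if best is None or ncsfom > best[0]:
            best = (ncsfom, orientation, position)
    return best

def toy_tf_peak(first_monomer, orientation):
    """Toy stand-in for the translation search; not a real figure of merit."""
    score = 1.0 / (1.0 + abs(orientation[0] - 40.0))
    return score, (0.25, 0.5, 0.0)

fixed = {"angles": (10.0, 20.0, 30.0), "position": (0.1, 0.2, 0.3)}
orientations = [(10.0, 20.0, 30.0), (42.0, 80.0, 15.0), (120.0, 5.0, 60.0)]
result = best_couple(fixed, orientations, toy_tf_peak)
print(result)
```

For n>2 the same loop would be repeated with all previously located monomers kept fixed.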
SELECTION CRITERIA
The best
candidate solutions at the different stages of the MR procedure are selected by
means of the normalized variable r,
where FOM is the current target function (RFOM in the rotation step and TFOM in the translation step). Threshold values for the r variable have been determined to optimise the efficiency of the program and are used as default values. At the end of the MR run the relevant information about the selected solutions is written to output files: the coordinates are written in PDB format, the reflection phases in MTZ format.
LIMITATIONS
The major drawbacks of REMO are the computational time spent in the NCS procedure and in model refinement, and some flatness in the statistical criteria (e.g. the TFOM coefficient) used to recognize the correct solutions. Both weaknesses could be overcome by developments of the REMO approach, which are currently under investigation.
REFERENCES
Agarwal, R.C. (1978). Acta Cryst. A34, 791-809.
Caliandro, R., Carrozzini, B., Cascarano, G.L., De Caro, L., Giacovazzo, C., Mazzone, A.M. & Siliqi, D. (2006). J. Appl. Cryst. 39, 185-193.
Caliandro, R., Carrozzini, B., Cascarano, G.L., De Caro, L., Giacovazzo, C., Mazzone, A.M. & Siliqi, D. (2009). Acta Cryst. submitted.
Crowther, R.A. & Blow, D.M. (1967). Acta Cryst. 23, 544-548.
Harada, Y., Lifchitz, A., Berthou, J. & Jolles, P. (1981). Acta Cryst. A37, 398-406.
Hirshfeld, F.L. (1968). Acta Cryst. A24, 301-311.
Lattman, E.E. (1972). Acta Cryst. B28, 1065-1068.
Navaza, J. (1994). Acta Cryst. A50, 157-163.
Nelder, J.A. & Mead, R. (1965). Comput. J. 7, 308.
Rabinovich, D., Rozenberg, H. & Shakked, Z. (1998). Acta Cryst. D54, 1336-1342.
Rowan, T. (1990). Functional Stability Analysis of Numerical Algorithms, Ph.D. thesis, Department of Computer Sciences, University of Texas at Austin.
Wilson, A.J.C. (1942). Nature, 150, 151.