FREE LUNCH
A METHOD TO PHASE AT RESOLUTION HIGHER THAN THE EXPERIMENTAL ONE
References: Caliandro, R., Carrozzini, B., Cascarano, G.L., De Caro, L., Giacovazzo, C. & Siliqi, D. (2005a). Acta Cryst. D61, 556-565.
Caliandro, R., Carrozzini, B., Cascarano, G.L., De Caro, L., Giacovazzo, C. & Siliqi, D. (2005b). Acta Cryst. D61, 1080-1087.
Contact: Liberato.Decaro@ic.cnr.it
INTRODUCTION
A shortage of experimental data is common in macromolecular crystallography, but it may also occur for small molecules when the diffraction sample is of poor quality. A few attempts have been made to extrapolate experimental data beyond the observed range: Karle & Hauptman (1964), Seeman et al. (1976), Langs (1998). All these techniques aim at improving the estimates of the diffraction moduli beyond or behind the resolution limit of the experimental data. Our new procedure, called free lunch as kindly suggested by George Sheldrick, is able to estimate both the moduli and the phases of non-measured reflections via electron density modification (EDM) techniques. The free lunch can be applied to the following three typical situations in macromolecular crystallography:
1) ab initio phasing: RESobs in the interval 1.5-1.0 Å, an approximate electron density available, with MPEobs (mean phase error of the observed reflections) in the range (25°, 60°).
2) SAD-MAD, SIR-MIR, SIRAS-MIRAS phasing: RESobs in the interval 2.8-1.5 Å, an approximate electron density available (e.g., after the application of EDM procedures), with MPEobs in the range (40°, 65°).
3) ab initio phasing: RESobs in the interval 1.5-1.0 Å, no phase information available.
In all cases the ideal extrapolation procedure is expected to reduce the phase error of the measured reflections, to provide sensible estimates (in modulus and phase) for some additional reflections behind and beyond RESobs, and to increase the interpretability of the final electron density map.
A ONE-DIMENSIONAL EXAMPLE
Let us consider a simple one-dimensional structure, with a = 10 Å, containing two Mg, one O, one N and two C atoms. The exact electron density distribution (ρtrue) is represented in the figure by a black line, sampled at 120 grid points. Suppose that the correct molecular model has been obtained from experimental data with resolution up to 0.5 Å: their Fourier transform practically coincides with ρtrue. Let us now suppose that the available data extend only to 1.7 Å resolution (let {|F|}1.7 be the set of measured moduli), and that the information provided by these data, combined with some stereochemical prior information, leads to the same molecular model obtained from the data at 0.5 Å resolution. The best electron density distribution obtainable from data truncated at 1.7 Å (ρ1.7) is the one calculated from {|F|}1.7 and {φtrue}1.7, shown in the figure by a red line. ρ1.7 correctly locates the two Mg atoms and shows very faint peaks at the N, O and C1 sites, but it has a minimum at the C2 position and contains a region of negative electron density.
We now check whether a simple EDM algorithm, based on the atomicity and positivity of the electron density, can improve the interpretability of ρ1.7, even at the expense of phase accuracy. We use the following algorithm: at the j-th cycle the electron density is modified according to two conditions.
The first condition applies the positivity criterion; the second sharpens the atomic electron densities, to counteract the effect of the resolution limit. After 15 cycles the resulting electron density is represented by the blue curve in the figure: it does not show the missing C2 atom, and it is a rather distorted representation of ρtrue. Increasing the number of cycles increases the overall distortion. Let us now perform 15 cycles combining EDM with free lunch. The initial map is ρ1.7, calculated from the measured moduli and the true phases. In the first half-cycle of each cycle the electron density is modified and, by Fourier inversion, moduli and phases are extrapolated up to 1.0 Å. In the second half-cycle the electron density is calculated using measured moduli and current phases for the reflections up to RESobs = 1.7 Å, and calculated moduli and current phases for the extrapolated reflections. The final electron density distribution is shown in the figure by the green curve. Its comparison with ρtrue suggests that the new procedure: a) produces a higher-resolution map (peaks are better resolved than in ρ1.7); b) leads to an electron density that is much more interpretable in terms of atomic positions; c) shows maxima at the sites of all the atoms and, in particular, correctly locates the missing C2 atom; d) produces false, but low-intensity, peaks.
ρtrue (black line) and ρ1.7 (red line), both sampled on 120 grid points; electron density after 15 cycles of EDM (blue line) and after 15 cycles of EDM + free lunch (green line).
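The one-dimensional experiment can be imitated with a short script. This is a minimal sketch built on several assumptions that are not in the source: the atomic coordinates, the Gaussian atom shape, the sharpening rule (positive density raised to the power 1.5) and the scaling of the extrapolated moduli are all hypothetical choices, so the printed numbers are not expected to reproduce the curves in the figure. It only shows the mechanics: measured moduli are kept up to |h| = a/1.7 Å ≈ 5, and in the free lunch run the reflections up to |h| = a/1.0 Å = 10 are extrapolated from the modified map.

```python
import cmath
import math

A = 10.0   # cell edge in Å (from the text)
N = 120    # grid points (from the text)
# Hypothetical fractional coordinates and Gaussian weights (~Z); not from the source.
ATOMS = [("Mg", 0.10, 12), ("Mg", 0.30, 12), ("O", 0.45, 8),
         ("N", 0.60, 7), ("C1", 0.75, 6), ("C2", 0.88, 6)]
SIGMA = 0.25  # assumed Gaussian atomic width in Å


def true_density():
    """rho_true on the grid: a periodic sum of Gaussian atoms."""
    rho = []
    for p in range(N):
        s = 0.0
        for _, xa, z in ATOMS:
            d = (p / N - xa + 0.5) % 1.0 - 0.5  # minimum-image distance (fractional)
            s += z * math.exp(-0.5 * (d * A / SIGMA) ** 2)
        rho.append(s)
    return rho


def invert(rho, hmax):
    """Map inversion: F(h) = (1/N) sum_p rho_p exp(2 pi i h p / N), |h| <= hmax."""
    return {h: sum(r * cmath.exp(2j * math.pi * h * p / N) for p, r in enumerate(rho)) / N
            for h in range(-hmax, hmax + 1)}


def synthesize(coeffs):
    """Fourier synthesis: rho_p = Re sum_h F(h) exp(-2 pi i h p / N)."""
    return [sum(f * cmath.exp(-2j * math.pi * h * p / N) for h, f in coeffs.items()).real
            for p in range(N)]


def modify(rho, power=1.5):
    """Assumed EDM rule: positivity (negative density -> 0), then sharpening by a power."""
    return [max(r, 0.0) ** power for r in rho]


def cycle(rho, fobs, hobs, hext):
    """Half-cycle 1: modify the map and invert it up to hext.
    Half-cycle 2: synthesize with measured moduli + current phases for |h| <= hobs
    and scaled calculated moduli + current phases for hobs < |h| <= hext."""
    fcalc = invert(modify(rho), hext)
    # Put calculated moduli on the observed scale (an assumption of this sketch).
    num = sum(abs(fobs[h]) ** 2 for h in fobs if h != 0)
    den = sum(abs(fcalc[h]) ** 2 for h in fobs if h != 0)
    k = math.sqrt(num / den) if den > 0 else 1.0
    coeffs = {}
    for h, f in fcalc.items():
        modulus = abs(fobs[h]) if abs(h) <= hobs else k * abs(f)
        coeffs[h] = cmath.rect(modulus, cmath.phase(f))
    return synthesize(coeffs)


def run(n_cycles, hobs, hext):
    rho_true = true_density()
    fobs = invert(rho_true, hobs)   # "measured" data: moduli and true phases, |h| <= hobs
    rho = synthesize(fobs)          # rho_1.7 in the text
    for _ in range(n_cycles):
        rho = cycle(rho, fobs, hobs, hext)
    return rho_true, fobs, rho


def correlation(a, b):
    ma, mb = sum(a) / len(a), sum(b) / len(b)
    cov = sum((x - ma) * (y - mb) for x, y in zip(a, b))
    return cov / math.sqrt(sum((x - ma) ** 2 for x in a) * sum((y - mb) ** 2 for y in b))


if __name__ == "__main__":
    hobs, hext = int(A / 1.7), int(A / 1.0)   # d = a/|h|  ->  |h| <= 5 and |h| <= 10
    truth, _, rho_edm = run(15, hobs, hobs)   # EDM only
    _, _, rho_fl = run(15, hobs, hext)        # EDM + free lunch
    print("CC(rho_true, EDM only)       =", round(correlation(truth, rho_edm), 3))
    print("CC(rho_true, EDM+free lunch) =", round(correlation(truth, rho_fl), 3))
```

Calling run(15, hobs, hobs) mimics the EDM-only protocol (no reflection beyond RESobs ever enters the synthesis), while run(15, hobs, hext) adds the extrapolated reflections; comparing the two correlation coefficients is only a rough analogue of comparing the blue and green curves.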
THE ALGORITHM
The procedure is performed in two steps, each consisting of a number of cycles; every cycle comprises a map-inversion half-cycle (ρ → {F}) followed by a Fourier-synthesis half-cycle ({F} → ρ). In the first step the extrapolated reflections are progressively added to the measured ones, while the criteria governing the EDM process are kept fixed. In the second step further EDM cycles are performed with the number of reflections kept constant and the EDM criteria continuously varied. At the end of the procedure the resolution limit of observed and extrapolated reflections is RESext.
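The two-step organization can be summarized as a per-cycle schedule. The sketch below is illustrative only: the numbers of cycles are free parameters, the percentages are those quoted for the ab initio case in the following section, and the linear ramps between them are an assumption (the text only says that the values depend on the cycle number). The class and function names are hypothetical.

```python
from dataclasses import dataclass
from typing import Iterator, Optional


@dataclass
class CycleSettings:
    step: int                        # 1 = growing reflection set, 2 = varying EDM criteria
    extrap_fraction: float           # extrapolated reflections used, as a fraction of N_obs
    map_fraction: float              # fraction of the protein volume used in map inversion
    truncate_sigma: Optional[float]  # map truncation threshold in sigma units (None = off)


def ramp(start: float, stop: float, i: int, n: int) -> float:
    """Linear ramp from start (i = 0) to stop (i = n - 1); the linear shape is assumed."""
    return start if n <= 1 else start + (stop - start) * i / (n - 1)


def schedule(n_step1: int, n_step2: int) -> Iterator[CycleSettings]:
    # Step 1: the extrapolated set grows, the EDM criteria stay fixed.
    for i in range(n_step1):
        yield CycleSettings(1, ramp(0.10, 0.75, i, n_step1), 0.10, None)
    # Step 2: the reflection set is frozen, the EDM criteria vary;
    # truncation is applied every two cycles.
    for i in range(n_step2):
        trunc = ramp(5.0, 10.0, i, n_step2) if i % 2 == 1 else None
        yield CycleSettings(2, 0.75, ramp(0.10, 0.30, i, n_step2), trunc)


if __name__ == "__main__":
    for settings in schedule(4, 4):
        print(settings)
```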
1) THE AB-INITIO CASE
Let us consider the first step of the procedure, as designed for ab initio phasing (RESobs in the interval 1.5-1.0 Å, an approximate electron density available). In accordance with Langs (1998), in the map-inversion half-cycle we found it advantageous to extrapolate all non-measured reflections in one step from RESobs to RESext, rather than to increase the extrapolation resolution gradually. However, not all the extrapolated reflections are used in the Fourier-synthesis half-cycle, but only a percentage of them, which increases with the cycle number (from 10% to 75% of the number of measured reflections). The extrapolated reflections are selected on the basis of their moduli, estimated by map inversion. Indeed, the largest-modulus reflections strongly influence the quality of the electron density map, can be phased with higher accuracy and are able to drive the subsequent extrapolation. On the other hand, an excessive number of actively used extrapolated reflections could corrupt the initial phase set of the observed reflections; their number therefore never exceeds 75% of the number of observed reflections.
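The selection rule can be written as a small helper. This is a sketch assuming that the extrapolated moduli are available as a dictionary keyed by Miller index; the function name, the reflection indices and the moduli in the example are hypothetical.

```python
def select_extrapolated(ext_moduli, n_obs, fraction, cap=0.75):
    """Pick the extrapolated reflections to be used actively in the next synthesis.

    ext_moduli: {hkl: modulus estimated by map inversion}
    n_obs:      number of observed reflections
    fraction:   fraction of n_obs requested in this cycle (0.10 ... 0.75 in the text)
    cap:        upper limit (75% of n_obs), so extrapolated data never dominate
    """
    n_use = int(min(fraction, cap) * n_obs)
    # The largest moduli are the most reliable and the most influential on the map.
    ranked = sorted(ext_moduli, key=ext_moduli.get, reverse=True)
    return ranked[:n_use]


ext = {(0, 0, 9): 3.1, (1, 2, 8): 0.4, (2, 0, 7): 5.6, (3, 1, 6): 1.2}
print(select_extrapolated(ext, n_obs=4, fraction=0.5))  # [(2, 0, 7), (0, 0, 9)]
```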
Other features of the procedure are:
§ In the map-inversion half-cycle only a fraction of the electron density map, corresponding to 10% of the volume occupied by the protein, is used.
§ The moduli obtained after each map inversion are rescaled according to the distribution of normalized structure factors expected for a random-atom structure.
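§ In the Fourier-synthesis half-cycle the Fourier coefficients of the observed reflections are built from the measured moduli and the current phases, while for the extrapolated ones they are estimated from the values obtained by map inversion, as described in §4 of Caliandro et al. (2005b).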
§ A Sim-like weight is associated with each reflection, with different expressions for observed and extrapolated reflections; the expressions involve an empirical constant k, set to 0.5.
§ The distribution of weights is dynamically modified during the procedure. Specifically, every two cycles the weights of the observed reflections are raised to a power given by a ratio that tends to decrease with the cycle number, as extrapolated reflections with smaller moduli are fed into the procedure, and that is mostly lower than one. Since the weights lie between 0 and 1, an exponent lower than one increases them, which reduces the impact of the newcomer extrapolated reflections on those already phased.
§ A substantial gain in efficiency is obtained by calculating the molecular envelope in the Fourier-synthesis half-cycle and using it as a mask in the following half-cycle (Wang, 1985; Leslie, 1987). The calculation includes all the reflections phased in the current cycle (hence also the extrapolated ones, particularly those at very low resolution) and is performed using a sphere of varying radius.
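Several of the ingredients listed above can be sketched in a few lines each. The code is a simplified illustration, not the il milione implementation: the rescaling assumes acentric reflections and maps moduli by rank onto the Wilson distribution; the weight function is the classic Sim weight I1(X)/I0(X), because the exact Sim-like expressions used by the procedure are not given in this text; and the envelope is a one-dimensional version of Wang's averaging idea, with a fixed window in place of the variable-radius sphere. All function names are hypothetical.

```python
import math


def wilson_rescale(moduli):
    """Rank-based rescaling of moduli to the acentric Wilson distribution
    P(|E|) = 2|E| exp(-|E|^2), whose CDF is 1 - exp(-|E|^2).
    Assumption: all reflections acentric; mapping by rank. Input order is kept."""
    n = len(moduli)
    e = [0.0] * n
    for rank, i in enumerate(sorted(range(n), key=lambda j: moduli[j])):
        p = (rank + 0.5) / n                  # empirical cumulative probability
        e[i] = math.sqrt(-math.log(1.0 - p))  # inverse Wilson CDF
    return e


def sim_weight(x):
    """Classic Sim (1959) weight I1(X)/I0(X) for an acentric reflection, with
    X = 2|F_obs||F_calc| / Sigma supplied by the caller.
    Power series for X <= 30, asymptotic expansion above."""
    if x > 30.0:
        return 1.0 - 1.0 / (2.0 * x) - 1.0 / (8.0 * x * x)
    i0 = i1 = 0.0
    t = 1.0                                   # (x/2)^(2k) / (k!)^2
    for k in range(60):
        i0 += t
        i1 += t * (x / 2.0) / (k + 1)         # (x/2)^(2k+1) / (k! (k+1)!)
        t *= (x / 2.0) ** 2 / (k + 1) ** 2
    return i1 / i0


def temper_weights(weights, exponent):
    """Raise weights in [0, 1] to a power; an exponent < 1 pushes them toward 1."""
    return [w ** exponent for w in weights]


def envelope_mask(rho, radius, protein_fraction):
    """Simplified 1-D Wang-type envelope: average the map over a periodic window of
    half-width `radius` grid points, then flag the highest-averaged points that
    fill `protein_fraction` of the cell as protein."""
    n = len(rho)
    smooth = [sum(rho[(p + d) % n] for d in range(-radius, radius + 1)) / (2 * radius + 1)
              for p in range(n)]
    n_protein = round(protein_fraction * n)
    mask = [False] * n
    for p in sorted(range(n), key=lambda q: smooth[q], reverse=True)[:n_protein]:
        mask[p] = True
    return mask
```

Lowering the exponent in temper_weights from, say, 1 to 0.5 turns a weight of 0.25 into 0.5, which is the mechanism by which low-weight reflections gain influence.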
In the second step of the procedure the fraction of the electron density map used in each map inversion varies from 10% to 30% of the protein volume, depending on the cycle number. Furthermore:
§ to reduce the impact of the background, the pixel intensity is halved if it is below one standard deviation of the whole electron density map;
§ to limit the overweighting of large-modulus reflections, every two cycles the map is truncated at a threshold value ranging from 5 to 10 times the standard deviation, depending on the cycle number;
§ the molecular envelope is not applied, and the exponent used to modify the weights is decreased from its last value in the first step to 0.5, to enhance the contribution of lower-weight reflections.
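The two map modifications of the second step can be sketched as follows. We assume that "below one standard deviation" means below the value 1σ of the map, and that σ is computed once on the incoming map; both readings are assumptions, and the function name is hypothetical.

```python
import statistics


def modify_map_step2(rho, truncate_at=None):
    """Second-step map modification (sketch).

    - pixels whose value is below one standard deviation of the whole map are
      halved (the threshold is read here as the value 1*sigma, an assumption);
    - if `truncate_at` is given (5 ... 10 in the text, every two cycles),
      density above truncate_at * sigma is clipped to that value.
    sigma is computed once, on the input map."""
    sigma = statistics.pstdev(rho)
    out = [r / 2.0 if r < sigma else r for r in rho]
    if truncate_at is not None:
        cap = truncate_at * sigma
        out = [min(r, cap) for r in out]
    return out
```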
2) THE SAD-MAD, SIR-MIR, SIRAS-MIRAS CASE
In this case, owing to the lower resolution of the experimental data (RESobs assumed in the interval 2.8-1.5 Å), the risk that extrapolated reflections corrupt the starting phase set is high. To counter this tendency, the current phases of the observed reflections are combined with their “experimental” values, using a relative weight that progressively shifts in favour of the current phases. Since the combination can slow down the convergence of the free lunch, it is performed every two cycles. Additional features are:
§ the number of cycles in the first step has been reduced (more cycles do not increase the quality of the electron density map);
§ the molecular envelope is also used in the second step of the procedure for structures with RESobs > 2.0 Å, as an additional constraint on the phasing process.
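A common way to combine two phase estimates is to add the corresponding weighted unit vectors and take the phase of the sum. The sketch below uses that form, with a hypothetical weight schedule in which the weight t of the current phase rises linearly from 0.5 to 1; the actual weighting scheme of the procedure is not specified in the text, and both function names are hypothetical.

```python
import cmath


def combine_phases(phi_current, phi_exp, t):
    """Phase of the weighted vector sum t*exp(i*phi_current) + (1-t)*exp(i*phi_exp).
    t in [0, 1] is the relative weight of the current phase (assumed form).
    Note: for opposite phases and t = 0.5 the sum vanishes and the result is arbitrary."""
    return cmath.phase(t * cmath.exp(1j * phi_current) + (1.0 - t) * cmath.exp(1j * phi_exp))


def current_phase_weight(cycle, n_cycles, t0=0.5):
    """Hypothetical schedule: t grows linearly from t0 to 1 over the run."""
    return t0 + (1.0 - t0) * cycle / max(n_cycles - 1, 1)


if __name__ == "__main__":
    n_cycles = 10
    for cycle in range(1, n_cycles, 2):   # combination applied every two cycles
        t = current_phase_weight(cycle, n_cycles)
        print(cycle, round(t, 3), round(combine_phases(1.2, 0.4, t), 3))
```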
3) THE AB-INITIO CASE WITH NO PHASE INFORMATION
The method may also be applied during ab initio phasing, by including free lunch cycles, as described in case 1), among the standard EDM cycles performed for phase extension and refinement. As a result, the crystal structure can be solved even in cases in which phasing with the experimental data alone fails. The free lunch proved particularly useful for structures with a substantial number of non-measured reflections below the experimental resolution limit (Caliandro et al., 2005b).
LIMITATIONS
The efficiency of the free lunch procedure depends on the experimental resolution limit (RESobs) and on the resolution one wants to reach (RESext) (Caliandro et al., 2005a). It is very efficient at quasi-atomic resolution (RESobs between 1.2 and 1.6 Å) when extrapolating up to atomic resolution (RESext = 1.0 Å). At lower resolution, limited improvements have been achieved for structures with RESobs up to 2.4 Å, but only with a final resolution lower than atomic (values of RESext between 1.2 Å and 1.8 Å have been used). By default, il milione calculates the optimal value of RESext, which depends on the experimental resolution, on the percentage of missing reflections and on the space group.
REFERENCES
Caliandro, R., Carrozzini, B., Cascarano, G.L., De Caro, L., Giacovazzo, C. & Siliqi, D. (2005a). Acta Cryst. D61, 556-565.
Caliandro, R., Carrozzini, B., Cascarano, G.L., De Caro, L., Giacovazzo, C. & Siliqi, D. (2005b). Acta Cryst. D61, 1080-1087.
Karle, J. & Hauptman, H. (1964). Acta Cryst. 17, 392-396.
Langs, D.A. (1998). Acta Cryst. A54, 44-48.
Leslie, A.G.W. (1987). Acta Cryst. A43, 134-136.
Seeman, N.C., Rosenberg, J.M., Suddath, F.L., Kim, J.J.P. & Rich, A. (1976). J. Mol. Biol. 104, 109-144.
Wang, B.C. (1985). Methods Enzymol. 115, 90-112.