
 

FREE LUNCH

A METHOD TO PHASE AT RESOLUTION HIGHER THAN THE EXPERIMENTAL ONE

 

References: Caliandro, R., Carrozzini, B., Cascarano, G.L., De Caro, L., Giacovazzo, C. & Siliqi, D. (2005). Acta Cryst. D61, 556-565.

Caliandro, R., Carrozzini, B., Cascarano, G.L., De Caro, L., Giacovazzo, C. & Siliqi, D. (2005). Acta Cryst. D61, 1080-1087.

Contact: Liberato.Decaro@ic.cnr.it

 

INTRODUCTION

A shortage of experimental data is common in macromolecular crystallography, but may also occur for small molecules when the diffraction sample is of poor quality. A few attempts have been made to extrapolate experimental data beyond the observed range: Karle & Hauptman (1964), Seeman et al. (1976), Langs (1998). All these techniques aim at improving the estimates of the diffraction moduli beyond or behind the resolution limit of the experimental data. Our new procedure, called free lunch as kindly suggested by George Sheldrick, estimates both the moduli and the phases of non-measured reflections via electron density modification (EDM) techniques. The free lunch can be applied to the following three typical situations in macromolecular crystallography:

1) ab initio phasing: RESobs in the interval 1.5-1.0 Å, an approximate electron density available, with MPEobs in the range (25°, 60°).

2) SAD-MAD, SIR-MIR, SIRAS-MIRAS phases: RESobs in the interval 2.8-1.5 Å, an approximate electron density available (e.g., after the application of EDM procedures), with MPEobs in the range (40°, 65°).

3) ab initio phasing: RESobs in the interval 1.5-1.0 Å, no phase information available.

In all cases the ideal extrapolation procedure is expected to reduce the phase error of the measured reflections, to provide sensible estimates (in modulus and phase) for some additional reflections behind and beyond RESobs, and to increase the interpretability of the final electron density map.

A ONE-DIMENSIONAL EXAMPLE

Let us consider a simple one-dimensional structure, with a = 10 Å, containing two Mg, one O, one N and two C atoms. The exact distribution of the electron density (say ρtrue) is represented in the figure by a black line, sampled at 120 grid points. Suppose that the correct molecular model has been obtained via experimental data with resolution up to 0.5 Å: their Fourier transform will practically coincide with ρtrue. Let us now suppose that the available data resolution is 1.7 Å (let {|F|}1.7 be the set of measured moduli) and that the information provided by the experimental data, combined with some stereochemical prior information, leads us to the same molecular model obtained from the data at 0.5 Å resolution. The best electron density distribution we can obtain by using data truncated at 1.7 Å (say ρ1.7) is that using {|F|}1.7 and {φtrue}1.7, which is shown in the figure by a red line. ρ1.7 correctly locates the two Mg atoms and shows very faint peaks connected with the N, O and C1 sites, but it has a minimum at the C2 position and presents a region with negative electron density.

We now check whether a simple EDM algorithm, based on the atomicity and on the positivity of the electron density, can improve the interpretability of ρ1.7, even at the expense of phase correctness. We use the following algorithm: at the j-th cycle the electron density is modified according to two conditions.

The first condition applies the positivity criterion; the second makes the atomic electron densities sharper, to counteract the effect of the resolution limit. After 15 cycles the resulting electron density is represented by the blue curve in the figure: it does not show the missing C2 atom, and is a rather distorted representation of ρtrue. Increasing the number of cycles increases the overall distortion. Let us now perform 15 cycles combining EDM with free lunch. In the map-inversion half-cycle the electron density is modified and, by Fourier inversion, moduli and phases are extrapolated up to 1.0 Å; the initial map is ρ1.7, calculated from measured moduli and true phases. In the map-calculation half-cycle the electron density is calculated by using measured moduli and current phases for reflections up to RESobs = 1.7 Å, and calculated moduli and current phases for the extrapolated reflections. The final electron density distribution is shown in the figure by the green curve. Its comparison with ρtrue suggests that the new procedure: a) produces a higher resolution map (peaks are better resolved than in ρ1.7); b) leads to an electron density much more interpretable in terms of atomic positions; c) shows maxima at the sites of all the atoms and, in particular, correctly locates the missing C2 atom; d) produces false but low-intensity peaks.

 

 

Figure: ρtrue (black line) and ρ1.7 (red line) are sampled on 120 grid points. The electron density produced after 15 cycles of EDM is represented by the blue line, and the electron density obtained after 15 cycles of EDM+free lunch is the green line.
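The one-dimensional experiment described above can be sketched in a few lines of Python. This is only an illustrative sketch under stated assumptions, not the code used to produce the figure: the atomic positions, the Gaussian atom width (SIGMA), the sharpening rule in modify() (squaring the positive density) and the rescaling of the calculated moduli are all hypothetical choices, since the page does not specify them.

```python
import cmath
import math

A = 10.0      # cell edge in angstrom (from the text)
N = 120       # number of grid points (from the text)
# (atomic number, position in angstrom): positions are hypothetical
ATOMS = [(12, 1.0), (12, 3.3), (8, 4.8), (7, 6.0), (6, 7.4), (6, 8.7)]
SIGMA = 0.2   # hypothetical Gaussian atom width in angstrom


def true_density():
    """Sum of periodic Gaussian atoms sampled on the grid."""
    rho = []
    for i in range(N):
        x = i * A / N
        value = 0.0
        for z, x0 in ATOMS:
            d = (x - x0 + A / 2) % A - A / 2   # shortest periodic distance
            value += z * math.exp(-d * d / (2 * SIGMA ** 2))
        rho.append(value)
    return rho


def invert(rho, h_max):
    """Map inversion: structure factors F(h) for h = 0..h_max."""
    return {h: sum(r * cmath.exp(2j * math.pi * h * i / N)
                   for i, r in enumerate(rho)) / N
            for h in range(h_max + 1)}


def synthesize(F):
    """Map calculation from F(h), h >= 0, using Friedel symmetry."""
    rho = []
    for i in range(N):
        value = F[0].real
        for h, f in F.items():
            if h > 0:
                value += 2 * (f * cmath.exp(-2j * math.pi * h * i / N)).real
        rho.append(value)
    return rho


def modify(rho):
    """Assumed EDM rule: positivity, then sharpening by squaring."""
    top = max(rho)
    return [v * v / top if v > 0 else 0.0 for v in rho]


def free_lunch(F_obs, h_obs, h_ext, cycles=15):
    """EDM cycles with extrapolation of moduli and phases up to h_ext."""
    # Starting map: measured moduli with the supplied (here true) phases.
    rho = synthesize({h: F_obs[h] for h in range(h_obs + 1)})
    F = {}
    for _ in range(cycles):
        F_calc = invert(modify(rho), h_ext)
        # Assumed scaling: put calculated moduli on the measured scale.
        obs_sum = sum(abs(F_obs[h]) for h in range(1, h_obs + 1))
        calc_sum = sum(abs(F_calc[h]) for h in range(1, h_obs + 1))
        scale = obs_sum / calc_sum if calc_sum > 0 else 1.0
        F = {0: F_obs[0]}
        for h in range(1, h_ext + 1):
            if h <= h_obs:
                # measured modulus, current phase
                F[h] = cmath.rect(abs(F_obs[h]), cmath.phase(F_calc[h]))
            else:
                # calculated (rescaled) modulus, current phase
                F[h] = scale * F_calc[h]
        rho = synthesize(F)
    return rho, F


h_obs = int(A / 1.7)   # highest index measured at 1.7 angstrom
h_ext = int(A / 1.0)   # highest index after extrapolation to 1.0 angstrom
F_true = invert(true_density(), h_ext)
# Only the terms with h <= h_obs are treated as measured inside free_lunch.
rho_fl, F_fl = free_lunch(F_true, h_obs, h_ext)
```

Plotting rho_fl against true_density() and against synthesize() of the first h_obs terms gives curves of the same kind as the green, black and red lines of the figure, although the details depend on the assumed positions and sharpening rule.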

 

 

THE ALGORITHM

The procedure is performed in two steps, each comprising a number of EDM cycles (a map inversion followed by a map calculation). In the first step the extrapolated reflections are progressively added to the measured ones, while the criteria governing the EDM process are kept fixed. In the second step further EDM cycles are performed with the number of considered reflections kept constant and the EDM criteria continuously varied. At the end of the procedure the resolution limit of observed and extrapolated reflections is RESext.

1) THE AB-INITIO CASE

Let us consider the first step of the procedure, as designed for ab initio phasing (RESobs in the interval 1.5-1.0 Å, an approximate electron density available). In accordance with Langs (1998), in the map-inversion half-cycle we found it advantageous to extrapolate all non-measured reflections in one step from RESobs to RESext, rather than to increase the extrapolation resolution gradually. However, not all the extrapolated reflections are used in the map-calculation half-cycle, but only a percentage of them, which increases with the cycle number (it ranges from 10% to 75% of the number of measured reflections). The extrapolated reflections are selected on the basis of their moduli, estimated by map inversion. Indeed, the reflections with the largest moduli strongly influence the quality of the electron density map, can be phased with higher accuracy and are able to drive the subsequent extrapolation. On the other hand, an excessive number of actively used extrapolated reflections could corrupt the initial phase set of the observed reflections, so their number is never allowed to exceed 75% of the number of observed reflections.
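The selection rule just described can be illustrated with a short Python sketch. The function names and the linear growth of the percentage with the cycle number are assumptions made for illustration; the text only states that the percentage increases from 10% to 75% of the number of measured reflections.

```python
def n_active(n_measured, cycle, n_cycles, low=0.10, high=0.75):
    """Number of extrapolated reflections actively used at a given cycle.

    Assumes a linear ramp from `low` to `high` (fractions of the number
    of measured reflections); the real schedule is not given in the text.
    """
    fraction = low + (high - low) * cycle / max(n_cycles - 1, 1)
    return int(round(fraction * n_measured))


def select_extrapolated(moduli, n_measured, cycle, n_cycles):
    """Pick the extrapolated reflections with the largest estimated moduli.

    `moduli` maps a reflection index (e.g. an (h, k, l) tuple) to the
    modulus estimated by map inversion.
    """
    ranked = sorted(moduli, key=moduli.get, reverse=True)
    return ranked[:n_active(n_measured, cycle, n_cycles)]


# Usage with made-up numbers: 4 measured reflections, last of 10 cycles.
estimated = {(1, 0, 0): 5.0, (2, 0, 0): 9.0, (3, 0, 0): 1.0, (4, 0, 0): 7.0}
chosen = select_extrapolated(estimated, n_measured=4, cycle=9, n_cycles=10)
```

Ranking by modulus first and truncating afterwards keeps the cap of 75% explicit in a single place, which makes the schedule easy to change.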

Other features of the procedure are:

• In the map-inversion half-cycle only a fraction of the electron density map, corresponding to 10% of the volume occupied by the protein, is used.

• The  values obtained after each map inversion are rescaled according to the distribution of normalized structure factors expected for a random-atom structure.

• In the map-calculation half-cycle the Fourier coefficients are  for observed reflections, while for the extrapolated ones they are estimated from the  values as described in §4 of Caliandro et al. (2005b).

• A Sim-like weight is associated with each reflection:  for an observed one and  for an extrapolated reflection. k is an empirical constant set to 0.5.

• The distribution of weights is dynamically modified during the procedure. Specifically, the weights of the observed reflections are raised to a power given by the factor : this ratio tends to decrease with the cycle number, as extrapolated reflections with lower moduli are fed into the procedure, and it is mostly lower than one. This operation, which reduces the impact of the newly added extrapolated reflections on those already phased, is performed every two cycles.

• A substantial gain in efficiency is obtained by calculating the molecular envelope in one half-cycle and using it as a mask in the following half-cycle (Wang, 1985; Leslie, 1987). The calculation includes all the reflections phased in the current cycle (hence also the extrapolated ones, particularly those at very low resolution) and is performed using a sphere of varying radius.
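As an illustration of the first feature in the list above, the following Python sketch keeps only the highest-density grid points that fill a given fraction of the protein volume before a map inversion. It assumes that the discarded grid points are set to zero and that the protein volume is supplied as a fraction of the cell; both points, and the function name, are assumptions rather than details taken from the text.

```python
def keep_top_fraction(rho, protein_fraction, used_fraction=0.10):
    """Keep the highest-density grid points, zero the rest.

    rho              : list of density values on the map grid
    protein_fraction : fraction of the cell occupied by the protein
    used_fraction    : fraction of the protein volume to retain
                       (10% in the first step, according to the text)
    """
    n_keep = max(1, int(round(len(rho) * protein_fraction * used_fraction)))
    threshold = sorted(rho, reverse=True)[n_keep - 1]
    # Grid points tied with the threshold are all kept, so slightly more
    # than n_keep points can survive when the map contains equal values.
    return [value if value >= threshold else 0.0 for value in rho]


# Usage with a made-up map of 100 distinct values and 50% protein content.
toy_map = [float(i) for i in range(100)]
masked = keep_top_fraction(toy_map, protein_fraction=0.5)
```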

In the second step of the procedure the fraction of the electron density map used in each map inversion varies from 10% to 30% of the protein volume, depending on the cycle number. Furthermore:

• to reduce the impact of the background, the pixel intensity is halved if it is below one standard deviation of the whole electron density map;

• to limit the overweighting of large-modulus reflections, every two cycles the map is truncated to a threshold value which ranges from 5 to 10 times the standard deviation, depending on the cycle number;

• the molecular envelope is not applied and the exponent used for the modification of weights is decreased from its last value in the first step to 0.5, to enhance the contribution of lower-weight reflections.
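The two map modifications of the second step can be sketched as follows. This is a minimal sketch: it assumes that "below one standard deviation" means a density value smaller than the standard deviation of the map, and the function names are hypothetical.

```python
import statistics


def damp_background(rho):
    """Halve every grid value lying below one standard deviation of the map."""
    sigma = statistics.pstdev(rho)
    return [value / 2 if value < sigma else value for value in rho]


def truncate_peaks(rho, n_sigma):
    """Cap the map at n_sigma standard deviations (5 to 10 in the text)."""
    cap = n_sigma * statistics.pstdev(rho)
    return [min(value, cap) for value in rho]


# Usage with a made-up map whose standard deviation is exactly 4.
toy_map = [2.0, 2.0, 2.0, 2.0, 12.0]
damped = damp_background(toy_map)          # the four background values are halved
capped = truncate_peaks(toy_map, 2.0)      # the peak is cut at 2 * 4 = 8
```

In an actual cycle loop, truncate_peaks would be called only every second cycle, with n_sigma chosen according to the cycle number.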

2) THE SAD-MAD, SIR-MIR, SIRAS-MIRAS CASE

In this case, owing to the lower experimental data resolution (RESobs assumed to lie in the interval 2.8-1.5 Å), the risk that extrapolated reflections corrupt the starting phase set is high. To counter this tendency, the current phases of the observed reflections are combined with their “experimental” values, using a relative weight which progressively favours the current phases. Since the combination can slow down the convergence of the free lunch, it is performed every two cycles. Additional features are:

• the number of cycles in the first step has been reduced (more cycles do not increase the quality of the electron density map);

• the molecular envelope is also used in the second step of the procedure for structures with RESobs > 2.0 Å, as an additional constraint on the phasing process.
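The phase combination described for this case can be illustrated by a weighted sum of unit vectors. The combination formula and the linear weight schedule below are assumptions made for illustration only; the text states only that the relative weight progressively favours the current phases and that the combination is performed every two cycles.

```python
import cmath
import math


def combine_phases(phi_current, phi_experimental, w_current):
    """Weighted combination of two phases (radians) as unit vectors.

    w_current = 0 returns the experimental phase, w_current = 1 the
    current one. If the two phases are opposite and equally weighted
    the sum vanishes and the result is not meaningful.
    """
    vector = (w_current * cmath.exp(1j * phi_current)
              + (1.0 - w_current) * cmath.exp(1j * phi_experimental))
    return cmath.phase(vector)


def current_weight(cycle, n_cycles, start=0.5):
    """Hypothetical linear ramp of the current-phase weight from start to 1."""
    return start + (1.0 - start) * cycle / max(n_cycles - 1, 1)


# Usage: combine on every second cycle of a 10-cycle run (made-up phases).
phi_exp = math.pi / 2
phi = 0.0
for cycle in range(10):
    # ... one EDM + free lunch cycle would update phi here ...
    if cycle % 2 == 1:
        phi = combine_phases(phi, phi_exp, current_weight(cycle, 10))
```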

3) THE AB-INITIO CASE WITH NO PHASE INFORMATION

The method may also be applied during the ab initio phasing process, by including cycles of free lunch, as described in case 1), among the standard EDM cycles performed for phase extension and refinement. As a result, the crystal structure solution can be reached even in cases in which the use of the experimental data alone fails. The free lunch proved to be particularly useful for structures having a substantial number of non-measured reflections below the experimental resolution limit (Caliandro et al. 2005b).

LIMITATIONS

The efficiency of the free lunch procedure depends on the experimental resolution limit (RESobs) and on the resolution one wants to reach (RESext) (Caliandro et al. 2005a). It is very efficient at quasi-atomic resolution (RESobs between 1.2 and 1.6 Å) when extrapolating up to atomic resolution (RESext = 1.0 Å). At lower resolution, limited improvements have been achieved for structures with RESobs up to 2.4 Å, but only using a final resolution lower than atomic (values of RESext between 1.2 Å and 1.8 Å have been used). By default, Il Milione calculates the optimal value of RESext, which depends on the experimental resolution, on the percentage of missing reflections and on the space group.


 

REFERENCES

Caliandro, R., Carrozzini, B., Cascarano, G.L., De Caro, L., Giacovazzo, C. & Siliqi, D. (2005a). Acta Cryst. D61, 556-565.

Caliandro, R., Carrozzini, B., Cascarano, G.L., De Caro, L., Giacovazzo, C. & Siliqi, D. (2005b). Acta Cryst. D61, 1080-1087.

Karle, J. & Hauptman, H. (1964). Acta Cryst. 17, 392-396.

Langs, D.A. (1998). Acta Cryst. A54, 44-48.

Leslie, A.G.W. (1987). Acta Cryst. A43, 134-136.

Seeman, N.C., Rosenberg, J.M., Suddath, F.L., Kim, J.J.P. & Rich, A. (1976). J. Mol. Biol. 104, 109-144.

Wang, B.C. (1985). Methods Enzymol. 115, 90-112.

 
