Phasing Tools
According to circumstances, the ab-initio Sir2014 phasing process may apply Modern Direct Methods (MDM), Standard Direct Methods (SDM), VLD or Patterson procedures. Phase extension and refinement are achieved by Direct Space Refinement (DSR) techniques. To preserve its efficiency the program distinguishes different categories of structures: XSmall molecules (up to 5 atoms per asymmetric unit), Small (up to 80), Medium (up to 300), Large (up to 600), XLarge (up to 1000), XXLarge (up to 1750), Huge (no upper limit).
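The size categories listed above can be written as a simple lookup. This is only an illustrative sketch, not Sir2014 code: the names `CATEGORIES` and `size_category` are hypothetical, and it assumes that each upper limit is inclusive.

```python
# Size categories as listed in the text:
# (name, upper limit in atoms per asymmetric unit).
# Treating each limit as inclusive is an assumption.
CATEGORIES = [
    ("XSmall", 5),
    ("Small", 80),
    ("Medium", 300),
    ("Large", 600),
    ("XLarge", 1000),
    ("XXLarge", 1750),
]


def size_category(n_atoms):
    """Return the category name for n_atoms atoms in the asymmetric unit."""
    if n_atoms < 1:
        raise ValueError("n_atoms must be positive")
    for name, upper in CATEGORIES:
        if n_atoms <= upper:
            return name
    return "Huge"  # no upper limit
```

For instance, `size_category(250)` falls in the Medium class, for which MDM is the default choice.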
According to user preferences and to the presence or absence of heavy atoms, a different approach (Direct Methods, VLD or Patterson techniques) may be applied. The DSR procedure, mainly based on electron density modification (EDM) techniques, may be carried out in different ways; its computing time increases moving from XSmall to Huge structures. We briefly describe the various tools the user may employ in the phasing process.
Modern Direct Methods (MDM) procedure
The approach is of multisolution type. It starts from random phases and applies a tangent-like formula simultaneously using:
The phases obtained via the tangent step are then submitted to Direct Space Refinement, via EDM cycles and the RELAX procedure.
The approach is sequential. Each trial starting set is first submitted to the tangent procedure and then to direct space procedures.
The program stops when a figure of merit is larger than a given threshold.
In Sir2014 the MDM procedure is the default choice for structures with up to 300 non-hydrogen atoms in the asymmetric unit.
Standard Direct Methods (SDM) procedure
In this case too a multisolution approach is used: for each trial, the phasing procedure applies the tangent step to the Nlarge reflections (those selected by the normalization process), starting from a subset of random phases. The P10 formula is applied and the most reliable negative quartets are actively used during the phasing process.
An early figure of merit (eFOM) is computed for each tangent trial: eFOM (Burla et al. 2003, 2004b) is expected to be a maximum for the most promising phase sets. Only the best trial solutions, sorted by eFOM, are submitted to the DSR procedure.
The VLD (Vive la Difference) procedure
A new cyclic phasing algorithm which does not use Direct or Patterson methods has been developed: it is based only on properties of a new difference electron density and of the observed Fourier synthesis.
Random phases are assigned to the target structure factors F; an observed Fourier synthesis is calculated and suitably modified (the 2.5% of the pixels with the largest density are selected, the rest are set to zero) to obtain a starting random model. The Fourier inversion of this modified map provides the model structure factors Fp. A new type of difference Fourier synthesis (Burla, Caliandro et al., 2010) is calculated which, suitably modified and inverted, provides the best estimates of the structure factors of the difference electron density (say Fq).
By definition Fp + Fq is the current best approximation of F, from which a new observed electron density is calculated. Cycles of electron density modification are then applied to provide, at the end, a new electron density from which a model density is derived as described above.
The procedure is cyclic and stops when a figure of merit suggests that the correct solution has been found (Burla, Giacovazzo et al., 2010, 2011). More recently (Burla, Carrozzini et al., 2011) the efficiency of the algorithm has been improved, by integrating the RELAX procedure (see below), and applied to proteins at atomic resolution.
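The map modification used to build the starting model (keep the 2.5% of pixels with the largest density, set the rest to zero) can be sketched as follows. This is a minimal illustration on a flattened list of pixel values, not the Sir2014 implementation. The function name `keep_largest_pixels` is hypothetical, and rounding the pixel count to the nearest integer and keeping at least one pixel are assumptions.

```python
def keep_largest_pixels(density, fraction=0.025):
    """Zero every pixel except the `fraction` with the largest density.

    `density` is a flat list of map values. The number of retained pixels
    is rounded to the nearest integer, with a minimum of one (assumption).
    """
    n_keep = max(1, round(fraction * len(density)))
    # Pixel indices ranked by decreasing density (ties keep map order).
    ranked = sorted(range(len(density)), key=lambda i: density[i], reverse=True)
    kept = set(ranked[:n_keep])
    return [d if i in kept else 0.0 for i, d in enumerate(density)]
```

In a full cycle the modified map would then be Fourier inverted to obtain the model structure factors Fp; that step is omitted here.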
As an example, screenshots of a crystal structure solution obtained using VLD are reported. The small window shows the behaviour of the crystallographic residual during the VLD cycles.
Patterson deconvolution procedure
Sir2014 uses the Patterson approach (Bürger, 1959; Richardson & Jacobson, 1987; Sheldrick, 1992) mostly for the ab initio solution of large molecules and proteins.
The Patterson deconvolution techniques are based on the use of implication transformations, of the minimum superposition function (SMF) and of superposition techniques. The procedure, described by Burla et al. (2004a) and Burla et al. (2006a,b), includes filtering algorithms (to eliminate the residual Patterson symmetry, to break off non-crystallographic symmetries produced by the deconvolution process, and to restate the non-centrosymmetric nature of the electron density map), followed by EDM and VLD techniques (in the DSR procedure) for improving and extending the phases.
Several new algorithms like C-MAP and SNIP (Caliandro et al., 2014) have been devised and implemented in Sir2014. An Automated Model Building procedure (via Buccaneer, ARP/wARP or Phenix) of the final electron density map is now part of the procedure.
An example of Patterson approach pipeline using the P22 structure (PDB code 2ANV) follows:
The DSR procedure (Burla et al., 2003) is mainly based on s supercycles of electron density modification (EDM step), each consisting of t microcycles ρ→{φ}→ρ (see Fig. 2). The default values of s and t change with the structural complexity. The modification of the electron density map includes powering (Refaat & Woolfson, 1993) and the inversion of small negative domains (Burla et al., 2003). In addition, n cycles of difference electron density modification (VLD step) have been implemented in the DSR procedure to improve the phasing process. New phases and normalized structure factor moduli (Rc) are obtained by inversion of a changing percentage (up to 10% for small/medium-sized molecules or up to 40% for proteins) of the electron density map: these moduli are rescaled by histogram matching with respect to the distribution of the observed ones (Ro). Proper weights are assigned to the new phases, applying the classical Sim-like scheme (Sim, 1959, 1960) or, for macromolecules, the popular σA scheme (Read, 1986; Caliandro et al., 2005c).
Figure 2 - The Electron Density Modification cycle.
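The rescaling of the calculated moduli by histogram matching can be illustrated with a rank-based mapping. The text does not say how Sir2014 performs the matching, so the sketch below is one common way to do it, given as an assumption: the k-th smallest calculated modulus is replaced by the k-th smallest observed modulus. The name `histogram_match` is hypothetical, and the two lists are assumed to have the same length.

```python
def histogram_match(r_calc, r_obs):
    """Rescale calculated moduli so their distribution follows the observed one.

    The k-th smallest calculated modulus is replaced by the k-th smallest
    observed modulus, so the ranking of the calculated values is preserved.
    Both lists must contain the same number of values (assumption).
    """
    if len(r_calc) != len(r_obs):
        raise ValueError("r_calc and r_obs must have the same length")
    # Positions of the calculated moduli, from smallest to largest.
    order = sorted(range(len(r_calc)), key=lambda i: r_calc[i])
    target = sorted(r_obs)
    matched = [0.0] * len(r_calc)
    for rank, idx in enumerate(order):
        matched[idx] = target[rank]
    return matched
```

After this step the rescaled moduli have exactly the observed distribution, while the relative ordering obtained from the map inversion is kept.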
The molecular envelope of the protein (Wang, 1985; Leslie, 1987) is used, in Sir2014, as a mask in the EDM step, in order to improve its efficiency (Burla et al., 2003). The protein volume is estimated through the Matthews (1968) formula and the envelope is calculated, for each trial solution, from the current phases. The electron density map is modified by assigning different weights to pixels falling inside or outside the envelope, so tentatively depleting the intensities of the false peaks. The map is then inverted and the resulting phases may improve their values.
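As a rough illustration of the volume estimate, the Matthews coefficient and the corresponding protein and solvent fractions can be computed as below. How Sir2014 applies the formula internally is not described here, so this is only a sketch: the function name `matthews_estimate` is hypothetical, and the constant 1.23 (which corresponds to a typical protein partial specific volume of about 0.74 cm³/g) is the value commonly used with the Matthews coefficient.

```python
def matthews_estimate(cell_volume, n_molecules, molecular_weight):
    """Estimate the Matthews coefficient and the protein/solvent fractions.

    cell_volume      unit cell volume in cubic angstroms
    n_molecules      number of protein molecules in the unit cell
    molecular_weight molecular weight of one molecule in daltons

    Returns (V_M, protein_fraction, solvent_fraction), where V_M is in
    cubic angstroms per dalton and protein_fraction = 1.23 / V_M.
    """
    if cell_volume <= 0 or n_molecules <= 0 or molecular_weight <= 0:
        raise ValueError("all arguments must be positive")
    v_m = cell_volume / (n_molecules * molecular_weight)
    protein_fraction = 1.23 / v_m
    return v_m, protein_fraction, 1.0 - protein_fraction
```

The protein fraction fixes how many pixels of the map are assigned to the envelope; the remaining pixels are treated as solvent.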
The whole DSR procedure can be automatically iterated for the same trial, restarting each time from the current phases (Burla et al., 2003). The total number of DSR iterations is defined automatically by the program, according to the structural complexity and to the experimental data resolution. This iterative process, although time consuming, makes it possible to solve resistant structures as well (e.g. protein structures diffracting at non-atomic resolution). The DSR procedure is also integrated, at the end of the final iteration, with the RELAX procedure (for structures with experimental data up to 1.5 Å resolution).
In order to improve the efficiency of the phasing process in cases where the experimental data have quasi-atomic resolution or poor completeness (e.g. in macromolecular crystallography), the Free Lunch algorithm (Caliandro et al., 2005a,b) is actively used in Sir2014. Starting from an approximate electron density map, the procedure, combined with classical EDM techniques, is able to extrapolate moduli and phases of unmeasured reflections, with resolution lower or higher than the experimental one, and to actively use them in the ab initio solving process. As a consequence, the phase estimates of the observed reflections are improved and the interpretability of the corresponding electron density map increases. Finally, the Free Lunch algorithm makes the recognition of the correct solution easier, by means of a suitable figure of merit (fFOM).
The problem of well-oriented but misplaced molecular fragments may be solved by the RELAX procedure (Burla et al., 2002; Caliandro et al., 2007). Data and corresponding phases of a trial solution obtained in the correct space group are extended and refined in P1, and a reduced number of EDM cycles is performed with no space-group symmetry imposed. Some figures of merit then enable the determination of the shift to apply in order to re-establish the original space group symmetry.
The RELAX procedure is automatically applied in a default run of Sir2014, even to Patterson phase sets.
The identification of the correct solution: the final FOM (fFOM)
For small/medium-sized structures the correctness of a solution is assessed, at the end of the DSR process, by the crystallographic residual factor (Rf): if the final value of Rf is smaller than a given threshold (default value 25%) the program stops; otherwise, the program explores the next ranked phase set.
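The acceptance test described above can be sketched as a loop over the ranked trial solutions. This is a simplified illustration, not the program's code: the function names are hypothetical, the residual is computed as Σ||Fo| − k|Fc|| / Σ|Fo| with a single overall scale factor k (an assumption about the exact definition), and each trial is represented only by its list of calculated moduli.

```python
def residual(f_obs, f_calc):
    """Crystallographic residual with one overall scale factor k.

    Rf = sum(| |Fo| - k |Fc| |) / sum(|Fo|), where k = sum(|Fo|) / sum(|Fc|).
    """
    k = sum(f_obs) / sum(f_calc)
    return sum(abs(fo - k * fc) for fo, fc in zip(f_obs, f_calc)) / sum(f_obs)


def first_accepted(trials, f_obs, threshold=0.25):
    """Return (index, Rf) of the first ranked trial with Rf below threshold.

    `trials` is a list of calculated-moduli lists, already sorted by rank.
    Returns None if no trial passes, i.e. all phase sets were explored.
    """
    for i, f_calc in enumerate(trials):
        rf = residual(f_obs, f_calc)
        if rf < threshold:
            return i, rf  # the program would stop here
    return None
```

With the default threshold of 0.25 (25%), a trial whose calculated moduli disagree strongly with the observed ones is skipped and the next ranked phase set is examined.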
For large molecules and proteins, least-squares refinement is very time consuming; furthermore, it cannot be applied to non-atomic resolution data. To recognise the correct solution, a suitable figure of merit (fFOM) has been devised and is applied at the end of the DSR process. Further details are given in Caliandro et al. (2014).
The correct solution should be identified by large values of fFOM.