++++++++++
cctbx news
++++++++++

:Authors: Luc J. Bourhis, Ralf W. Grosse-Kunstleve, Paul D. Adams
:Contact: RWGrosse-Kunstleve@lbl.gov

:abstract:

  We describe recent developments of the
  Computational Crystallography Toolbox.

.. |sigma|  unicode:: U+003C3 .. GREEK SMALL LETTER SIGMA
.. contents:: Table of Contents

Preamble

In order to interactively run the examples scripts shown below, the
reader is highly encouraged to visit http://cci.lbl.gov/cctbx_build/
and to download one of the completely self-contained, self-extracting
binary cctbx distributions (supported platforms include Linux, Mac OS
X, Windows, IRIX, and Tru64 Unix). All example scripts shown below
were tested with cctbx build 2007_10_XX_XXXX.

In the following we refer to our articles in the previous editions of
this newsletter as "Newsletter No. 1", "Newsletter No. 2", etc. to
improve readability. The full citations are included in the reference
section.

============
Introduction
============

The `Computational Crystallography Toolbox` (cctbx,
http://cctbx.sourceforge.net/) is the open-source component of a
structure determination suite for macro-molecular crystallography
(Phenix, http://www.phenix-online.org/). However, the cctbx
was started with a code base developed in the context
of small-molecule crystallography, most notably SgInfo
(http://cci.lbl.gov/sginfo/). Although in recent years most new
developments were targeted towards macro-molecular work, the
small-molecule heritage has been carefully maintained in the core
modules of the cctbx project.  For example, all algorithms in the
``cctbx`` module (for the distinction between the *cctbx module*
and the *cctbx project* see Newsletter No. 1) work for all 230
crystallographic space groups and are routinely tested with symmetries
not found in macro-molecular crystals.

Recently, we have started a new ``smtbx`` module with algorithms
specifically for small-molecule work. This is very much work in
progress, but below we present some related developments. We give an
example of least-squares minimization with a target function and a
weighting scheme commonly used in the refinement of small molecules.
Another development that grew out of small-molecule context is the
handling of special position constraints. We also highlight important
new developments of the `Phil` system introduced in Newsletters No. 5
and No. 7. Finally, we give a brief summary of the transition of the
cctbx source code repository from CVS to Subversion (SVN).

=====================================================
Refinement tools for small-molecule crystallographers
=====================================================

Refinement against F\ :sup:`2`
------------------------------

Small-molecule crystallographers almost exclusively rely on the
minimization of a least-square target to refine a structure. The
cctbx has provided one for a long time but only for
F. Since F\ :sup:`2` refinement is very popular among small-molecule
crystallographers, a new target has recently been added to the
cctbx. The class ``cctbx.xray.unified_least_squares_residual`` provides
a single entry point to both F and F\ :sup:`2` refinement.

To keep the example self-contained, and also to demonstrate some
tools useful to develop and debug crystallographic algorithms,
instead of starting from real data, we will use randomly generated
crystal structures. Atoms are randomly spread in the unit cell and
the structure factors are then computed::

  from cctbx.array_family import flex
  from cctbx import xray
  from cctbx import crystal
  from cctbx import miller
  from cctbx.development import random_structure
  import random

  indices = miller.build_set(
          crystal_symmetry=crystal.symmetry(unit_cell=(10,11,12, 90,105,90),
                                            space_group_symbol="P21/c"),
          anomalous_flag=False,
          d_min=0.8)
  structure = random_structure.xray_structure(
    indices.space_group_info(),
    elements=['C']*6 + ['O']*2 + ['N'],
    volume_per_atom=18.6,
    random_u_iso=True)
  f_ideal = structure.structure_factors(d_min=indices.d_min()).f_calc()

Then we extract the amplitudes or intensities and we put them on a
different scale to make the example a bit more realistic::

  f_obs = f_ideal.amplitudes()
  f_obs.set_observation_type_xray_amplitude()
  f_obs *= 2
  f_obs_square = f_ideal.norm()
  f_obs_square.set_observation_type_xray_intensity()
  f_obs_square *= 3

In practice we would get f_obs or f_obs_square from a data file using
the ``iotbx`` and the observation type would have been automatically
set up for us.

Then we can construct the least-square targets and examine them::

  ls_against_f = xray.unified_least_squares_residual(f_obs)
  ls_against_f_square = xray.unified_least_squares_residual(f_obs_square)

  residuals = ls_against_f(f_ideal, compute_derivatives=True)
  print "against F: value=%.3f, scale=%.3f" % (residuals.target(),
                                               residuals.scale_factor())
  residuals = ls_against_f_square(f_ideal, compute_derivatives=True)
  print "against F^2: value=%.3f, scale=%.3f" % (residuals.target(),
                                                 residuals.scale_factor())

The constructor of the class automatically recognizes whether the
data passed to it are amplitudes or intensities and it adapts the
computations accordingly, correctly getting a target value of 0 and
a scale factor of 2 and 3, respectively, as expected in this trivial
example.

By passing ``compute_derivatives=True``, we require the computation
of the derivatives of the L.S. target function with respect to F\
:sub:`calc` (h) for each Miller index h. They are available as
``residuals.derivatives()``. Derivatives now brings us to discuss
refinement.

To demonstrate refinement, we will first add a perturbation to the
crystal structure as well as allowing the refinement of atom sites::

  perturbed_structure = structure.random_shift_sites(max_shift_cart=0.2)
  for s in perturbed_structure.scatterers():
          s.flags.set_grad_site(True)

The simplest refinement engine in the cctbx is
``cctbx.xray.minimization.lbfgs``. For a refinement against F\
:sup:`2`::

  refining_structure = perturbed_structure.deep_copy_scatterers()
  optimiser = xray.lbfgs(
                  target_functor=ls_against_f_square,
                  xray_structure=refining_structure,
                  structure_factor_algorithm="direct")
  print "Initial L.S. residual:%.3f" % optimiser.first_target_value
  structure.show_scatterers()
  print "Final L.S. residual:%.3f" % optimiser.final_target_value
  refining_structure.show_scatterers()

The LBFGS algorithm (Liu & Nocedal, 1989) used to minimize the
target function needs to compute its partial derivatives with
respect to the refined parameters (atomic position here). This
is where the derivatives which we showed to be obtainable by
``residuals.derivatives()`` above come into play: they are combined,
by using the chain rule, with the derivatives of F\ :sub:`calc`
(h) with respect to the refined parameters.

Least-squares weights
---------------------

By default, ``xray.unified_least_squares_residual`` tries to
make sensible choices for the least-squares weights, i.e. unit
weights for refinement against F and the so-called quasi-unit
weights for refinement against F\ :sup:`2`, which are 1/(4 F\
:sup:`2`\ :sub:`obs`). The latter choice is known to result in a
more stable refinement. However, almost universally, experimental
error estimates are available, as ``f_obs_square.sigmas()``, and
the least-square weights should take advantage of them. The pure
statistical weights 1/|sigma|\ :sup:`2` are rarely used in small
molecule crystallography. The most popular choice is that of the
ShelXL program (http://shelx.uni-ac.gwdg.de/), 1/(|sigma|\ (F\
:sup:`2`\ :sub:`obs`)\ :sup:`2` + (a P)\ :sup:`2` + b P) where P =
1/3 max(0, F\ :sup:`2`\ :sub:`obs`) + 2/3 F\ :sup:`2`\ :sub:`calc`,
which down-weights the stronger reflection while reducing statistical
bias by the use of P (Wilson 1976).

This weighting scheme is available in the module
``cctbx.xray.weighting_schemes``::

  weighting = xray.weighting_schemes.shelx_weighting()
  shelx_weighted_ls_against_f_square = xray.unified_least_squares_residual(
          f_obs_square, weighting=weighting)

This least-square target can then be used with the ``lbfgs``
minimizer. The default is that of ShelXL, i.e. a=0.1 and b=0, which
is best suited for early refinements when the structure is still
incomplete. All the other weighting schemes we mentioned are also
provided by that module (the reader is invited to read the comments
for each Python class).


Special position constraints
----------------------------

The traditional way to deal with the refinement of crystal structures
with atoms on special position is to constrain the latter never
to move away from the special position, in effect minimizing the
target function under a set of constraints. However, the minimizer
introduced above does not work in such a manner and neither does the
more sophisticated ``mmtbx.refinement.minimization.lbfgs``, mainly
because this is less important for macro-molecular models, which
usually have very few atoms on special positions. One can safely,
at each refinement cycle, let those atoms move away from the special
positions and move them back before the next cycle. In contrast,
in the small molecule world it is not uncommon that all atoms are on
special positions and the ad-hoc treatment is more problematic.

The cctbx does actually provide all the tools to find, store and
apply special position constraints. We will therefore start by
introducing those fundamental tools::

  crystal_symmetry = crystal.symmetry(
      unit_cell=(10,10,10,90,90,90),
      space_group_symbol="Pm3m")
  crystal_symmetry.show_summary()

  special_position_settings = crystal_symmetry.special_position_settings(
    min_distance_sym_equiv=0.5)

The site symmetry of a position given in fractional coordinates is
then obtained by::

  site_symmetry = special_position_settings.site_symmetry(
     site=(0.3, 0.31, 0.111))

The given position is at a Cartesian distance smaller than 0.5 A from a
special position with x=y. The ``site_symmetry`` algorithm computes
the exact location of the nearest special position (Grosse-Kunstleve
and Adams, 2002). The resulting ``site_symmetry.exact_site()``
is (0.305, 0.305, 0.111).

Then::

  site_constraints = site_symmetry.site_constraints()

gives access to the constraints on the exact site::

  >>> site_constraints.n_independent_params()
  2
  >>> site_constraints.independent_params(all_params=site_symmetry.exact_site())
  (0.305, 0.111)

The other way around, one can determine all the site coordinates from
the independent ones::

  >>> site_constraints.all_params(independent_params=(0.2, 0.1))
  (0.2, 0.2, 0.1)

The most interesting facility for refinement is the handling of
gradients. To illustrate it, let us introduce a simple function f
of the site coordinates x,y,z (during a refinement, f would instead
be the L.S. target for example, considering its dependence on the
coordinates of one atom only)::

  def f(x,y,z): return -x + 2*y + 3*z

which we then restrict onto the special position locus::

  def g(u,v):
    x, y, z = site_constraints.all_params((u,v))
    return f(x,y,z)

We wish to compute the derivatives of g with respect to u and v
knowing the derivatives of f with respect to x,y,z which are easily
read directly from its definition:
(df/dx, df/dy, df/dz) = (-1,2,3).
Those derivatives of g are then easily obtained with::

  >>> independent_gradients = site_constraints.independent_gradients(
        all_gradients=flex.double((-1,2,3)))
  (1.0, 3.0)

We have so far demonstrated only site constraints and only in
fractional coordinates, but the cctbx provides more facilities::

  frac_adp_constraints = site_symmetry.adp_constraints()

gives access to the constraints on ADPs in fractional coordinates whereas::

  cart_adp_constraints = site_symmetry.cartesian_adp_constraints(
    crystal_symmetry.unit_cell())

deals with the same constraints on ADPs in Cartesian coordinates.

It should be noted that the ``site_symmetry`` object caches all the
constraints it gives access to: the necessary symmetry computations
are performed only once when ``site_constraints``, ``adp_constraints``
or ``cartesian_adp_constraints`` are called for the first time and
the results are reused later on.

The cctbx special position constraints we have just expounded
are the basis of the refinement engine ``lbfgs`` in the module
``smtbx.refinement.minimization``. It is to be used exactly as
``cctbx.xray.minimization.lbfgs``, the only difference being
that atoms on special positions have their site coordinates and
ADPs constrained properly. Our plan is to develop
``smtbx.refinement.minimization.lbfgs`` over time into
the equivalent of the more sophisticated and versatile
``mmtbx.refinement.minimization.lbfgs``.


=================
Phil developments
=================

Phil (Python-based hierarchical interchange language) is a module
for the management of application parameters and, to some degree,
inputs. Phil was first introduced in Newsletter No. 5. In Newsletter
No. 7 we presented a complete example application which uses Phil
for the handling of program parameters.

Overall, Phil has been very stable after an intense development push
almost three years ago. Some features were added in response to needs
that arose as part of the development of other applications, and a
few bugs were fixed. These changes are very conservative. However,
recently there has also been an important semantic change
affecting "multiple definitions" and "multiple scopes". The new
features and changes are reflected in the updated documentation (URL
at end of this section) which is based on parts of our original article
in Newsletter No. 5. Important new features and changes include:

  - The handling of multiple scopes went through a few iterations
    eliminating bugs, inconsistencies and inefficiencies. The latest
    implementation (as of October 2007) is considered mature, stable
    and intuitive.

  - The new ``.fetch_diff()`` method returns only objects with
    non-default values. This is most useful for applications
    with a large number of parameters. Usually most parameters
    are not changed by the user. ``.fetch_diff()`` pin-points
    the (few) changes.

  - The previous plain ``include`` syntax was changed to ``include file``;
    the new ``include scope`` syntax was added. The ``include scope``
    feature builds on Python's standard import mechanism. In practice
    it is found to be much more useful than the ``include file``
    feature.

  - The syntax-aware comment feature now uses ``!`` as the comment
    character. For example::

      !crystal_symmetry {
        unit_cell = None
        space_group = None
      }

    The exclamation mark comments out the entire ``crystal_symmetry``
    scope including all embedded definitions. ``#crystal_symmetry``
    is now a normal one-line comment as expected by most users.
    Based on feedback, the distinction between ``#crystal_symmetry``
    (syntax-aware comment) and ``# crystal_symmetry`` (one-line comment)
    was found to be too subtle.

  - Again based on feedback, the ``{`` and ``}`` scope-delineation
    characters (compare with the ``crystal_symmetry`` example above)
    are now interpreted as one-character keywords. In contrast to the
    initial implementation, any ``{`` and ``}`` in strings have to be
    quoted. This leads to a more obvious syntax. It also enables a
    scope name and embedded definitions to appear on the same line.
    This can be convenient in certain situations, in particular for
    definitions specified on the command-line.

  - Similarly, ``;`` characters are now interpreted as one-character
    keywords (they had no syntactical meaning in the original
    implementation). This enables ``scope { a=1; b=2 }`` all on
    one line.

  - Related to the ``{``, ``}`` scope-delineation change, the
    ``${varname}`` variable substitution syntax was changed to
    ``$(varname)``.

For more details see the updated documentation:
http://cctbx.sourceforge.net/libtbx_phil.html

==========
Subversion
==========

For six years, from April 2001, when the SourceForge cctbx project was
started, to March 2007, cctbx development was based on the Concurrent
Versions System (CVS, http://www.cvshome.org/). This era came to
an end with the transition to the more modern Subversion system
(SVN, http://subversion.tigris.org/). In the last two years, the
vast majority of the open source community has gone through this
transition. SVN introduces many new features, most importantly
"atomic commits" which handle a set of changes as one entity, no
matter how many files and directories are involved. This makes it
much easier to keep track of large-scale changes, and to backtrack
if necessary.  Almost as important is the option to rename files
and directories without introducing breaks in the development
history. The need to reorganize arises frequently in the early stages
of a (sub-)project. Often it is initially not very clear how all the
pieces fit together. The best ideas for organizing the whole tend to
crystallize only after a critical subset of the project is already
implemented. SVN provides a clear path for reorganizing the sources.

The cctbx sources were converted using the ``cvs2svn`` script
(http://cvs2svn.tigris.org/, it is a Python script!) which preserves
the complete CVS development history, including all original time
stamps, user names, and log messages. The last state of the cctbx CVS
repository before the conversion is still available for viewing at
SourceForge, but for all practical purposes it is completely obsolete.

The main cctbx page at Sourceforge (http://cctbx.sourceforge.net/)
includes a link to the web view of the cctbx SVN
repository. Instructions for checking out the sources using the
svn command are posted under the cctbx installation instructions
(http://cctbx.sourceforge.net/current/installation.html). These
instructions show how to get a selection of individual cctbx
modules. Note that it is also possible to get the entire cctbx project
with a single command (this wasn't possible with CVS), for example::

  svn co https://cctbx.svn.sourceforge.net/svnroot/cctbx/trunk svnroot_cctbx

The name of the target directory (``svnroot_cctbx``) is arbitrary.
The advantage of checking out everything under one tree is that a
single svn update or svn commit command works on the entire repository.
The disadvantage compared to the more granular approach is that the
working copy includes modules that may not be needed and therefore
consumes more disk space (currently about 62 MB).

With the move to SVN, the old distinction between `anonymous` and
`developer` checkouts disappeared. This is a great advantage.
Everybody can checkout working copies using the exact same
commands. Credentials, i.e. a user name and password, are required
only for write access (e.g. svn commit). It is possible to checkout
a working copy before having a SourceForge user name, make local
modifications, get a user name and request cctbx svn write access,
and then to commit the changes from the initial working copy.

=================
 Acknowledgments
=================

L.J. Bourhis gratefully acknowledge the financial support of EPSRC
under grant number EP/C536274/1. R.W. Grosse-Kunstleve and P.D. Adams
gratefully acknowledge the financial support of NIH/NIGMS under
grant number P01GM063210. Their work was supported in part by the US
Department of Energy under Contract No. DE-AC02-05CH11231.

============
 References
============

Grosse-Kunstleve, R.W., Adams, P.D. (2002). Acta Cryst. A58, 60-65.

Grosse-Kunstleve, R.W., Adams, P.D. (2003).
Newsletter of the IUCr Commission on Crystallographic Computing, 1, 28-38.

Grosse-Kunstleve, R.W., Afonine, P.V., Sauter, N.K., Adams, P.D. (2005).
Newsletter of the IUCr Commission on Crystallographic Computing, 5, 69-91.

Grosse-Kunstleve, R.W., Zwart P.H., Afonine, P.V., Ioerger, T.R.,
Adams, P.D. (2006).
Newsletter of the IUCr Commission on Crystallographic Computing, 7, 92-105.

Liu C., Nocedal J. (1989). Mathematical Programming 45, 503-528.

Wilson, A.J.C. (1976). Acta Cryst. A32, 994-996.
