Flexible Docking Tutorial, Part I
The tutorial introduces the basic ideas of the landmark-based flexible docking strategy using actin as a test system. It is helpful for the understanding of the tutorial if the user is already familiar with the classic EM tutorial. In addition to Situs, parts of this tutorial also rely on the program X-PLOR (version 3.1 or later). X-PLOR used to be available for free at the Brunger web site, if you have trouble obtaining the program please contact us (situs >>at<< biomachina.org). The results can be compared to solutions distributed with the tutorial software. More documentation is available in the user guide, on the Methodology page, and in the published articles.
Content:
Download and Installation

Follow first the simple registration and download steps .

In addition to the executables, the Situs_2.3_flex_tutorial/bin directory contains 11 data files:

  • 0_actin.str: X-PLOR stream file for setting up parameters and topology.
  • 0_actin_orig.pdb: Original actin (start) structure.
  • 0_actin_target.pdb: Target structure for flexible docking.
  • 0_flex_target.inp: X-PLOR script for flexible docking using four codebook vectors.
  • 0_toph19x.pro: Standard X-PLOR topology file for proteins.
  • 0_toph19.pep: Standard X-PLOR peptide link.
  • 0_param19.pro: Standard X-PLOR parameter file for proteins.

  • 0_flex_rnap.inp: X-PLOR script for skeleton-based flexible docking using 15 codebook vectors.
  • 0_rnap1.pdb: atomic structure of RNA polymerase in "closed" conformation.
  • 0_rnap2.situs: simulated EM map of RNA polymerase in "open" conformation.
  • 0_rnap.psf: X_PLOR psf file for setting up parameters and topology.

In the following, we will use the first 7 files to dock actin to various low-resolution maps. The user can compare all generated files to the files in the "solutions" directory. The other 4 files (brown) are used in part II of this tutorial.

Data Flow and Design

The series of steps and the programs that are required to dock an atomic-resolution structure flexibly to single-molecule, low-resolution data are shown schematically in the following figure. Detailed program explanations are given in the user guide .

Schematic diagram of flexing related routines. Major Situs components (blue) are classified by their functionality. The main data flow is indicated by brown arrows. The modeling of distance constraints for the motion capture skeleton is shown in dark blue. The external programs are shown in orange. Visualization (orange) for the rendering of the data requires a molecular graphics viewer (we recommend the free VMD graphics program, version 1.8.4 or higher; Chimera and Sculptor also support Situs format). The MD refinement is supported by X-PLOR.

Standard EM formats are supported and are converted to cubic lattices in Situs format. Subsequently, the data is inspected and, if necessary, prepared for the vector quantization using a variety of visualization and analysis tools. Atomic coordinates in PDB format can be transformed to low-resolution maps, if necessary, and vice versa. During vector quantization of the high-resolution structure, distances can be learnt that are sent to the vector quantizer of the low-resolution structure to enable skeleton-based fitting. After the vector quantization, the high-resolution structure is flexibly docked to the low-resolution density by the corresponding codebook vectors.

Creating a Simulated Target Map for Validation

First we create a simulated EM map from a known atomic structure for later validation. We lower the resolution of the target structure to 15 Å with the pdb2vol kernel convolution utility. Enter at the shell prompt:

./pdb2vol 0_actin_target.pdb 1_actin_target.situs

Select mass-weighting (enter 2) and select no B-factor cutoff (enter 1). Next enter the desired voxel spacing of the output map. Given the dimensions of the structure, 2 Å appears to be a good compromise between lattice accuracy and storage requirement. Next, enter the desired output resolution as a negative number: -15 Å. Next, select the Gaussian smoothing kernel (enter 1). Select lattice correction (enter 1), and enter the maximum amplitude of the kernel (enter 1).

The program projects the atomic structure to the lattice, computes the Gaussian kernel, and carries out the real-space convolution, writing the resulting volumetric map to the file 1_actin_target.situs.

Here is the full program output:

% ./pdb2vol 0_actin_target.pdb 1_actin_target.situs
lib_pio> 3572 atoms read.
pdb2vol> Found 639 hydrogens, 0 water atoms, 0 codebook vectors, 0 density atoms
pdb2vol> Hydrogens will be ignored.
pdb2vol> Do you want to mass-weight the atoms ?
pdb2vol>
pdb2vol> 1: No
pdb2vol> 2: Yes
pdb2vol> 2
pdb2vol> Do you want to select atoms based on a B-factor threshold?
pdb2vol>
pdb2vol> 1: No
pdb2vol> 2: Yes
pdb2vol> 1
pdb2vol> 2933 out of 3572 atoms selected for conversion.
pdb2vol>
pdb2vol> The input structure measures 74.971 x 62.460 x 38.249 Angstrom
pdb2vol>
pdb2vol> Please enter the desired voxel spacing for the output map (in Angstrom): 2
pdb2vol>
pdb2vol> Kernel width. Please enter (in Angstrom):
pdb2vol> (as pos. value) kernel half-max radius or
pdb2vol> (as neg. value) target resolution (2 sigma)
pdb2vol> Now enter (signed) value: -15
pdb2vol>
pdb2vol> Please select the type of smoothing kernel:
pdb2vol>
pdb2vol> 1: Gaussian, exp(-1.5 r^2 / sigma^2)
pdb2vol> sigma = 7.500A, r-half = 5.098A, r-cut = 12.990A
pdb2vol>
pdb2vol> 2: Triangular, max(0, 1 - 0.5 |r| / r-half)
pdb2vol> sigma = 7.500A, r-half = 5.929A, r-cut = 11.859A
pdb2vol>
pdb2vol> 3: Semi-Epanechnikov, max(0, 1 - 0.5 |r|^1.5 / r-half^1.5)
pdb2vol> sigma = 7.500A, r-half = 7.331A, r-cut = 11.637A
pdb2vol>
pdb2vol> 4: Epanechnikov, max(0, 1 - 0.5 r^2 / r-half^2)
pdb2vol> sigma = 7.500A, r-half = 8.101A, r-cut = 11.456A
pdb2vol>
pdb2vol> 5: Hard Sphere, max(0, 1 - 0.5 r^60 / r-half^60)
pdb2vol> sigma = 7.500A, r-half = 9.722A, r-cut = 9.835A
pdb2vol> 1
pdb2vol>
pdb2vol> Do you want to correct for lattice interpolation smoothing effects?
pdb2vol>
pdb2vol> 1: Yes (slightly lowers the kernel width to maintain target resolution)
pdb2vol> 2: No
pdb2vol> 1
pdb2vol>
pdb2vol> Finally, please enter the desired kernel amplitude (scaling factor): 1
pdb2vol>
pdb2vol> Projecting atoms to cubic lattice by trilinear interpolation...
pdb2vol> ... done. Lattice smoothing (sigma = atom rmsd): 1.408 Angstrom
pdb2vol>
pdb2vol> Computing Gaussian kernel (correcting sigma for lattice smoothing)...
pdb2vol> ... done. Kernel map extent 15 x 15 x 15 voxels
pdb2vol>
pdb2vol> Convolving lattice with kernel...
pdb2vol> ... done. Spatial resolution (2 sigma) of output map: 15.000A
pdb2vol>
lib_vio> Writing density data...
lib_vio> Volumetric data written to file 1_actin_target.situs
lib_vio> File 1_actin_target.situs - Header information:
lib_vio> Columns, rows, and sections: x=1-57, y=1-52, z=1-39
lib_vio> 3D coordinates of first voxel (1,1,1): (-58.000000,-50.000000,-36.000000)
lib_vio> Voxel size in Angstrom: 2.000000


By loading the resulting map into VMD (see below), one can vary the density threshold. A threshold of 10 (~15% of the maximum value) corresponds approximately to the surface of the molecule.
Preliminary Rigid-Body Registration

Before we start fitting the original actin structure to the target structure, it is mandatory to roughly align the atomic structure and the target map by rigid-body fitting.

An initial alignment "by eye" can e.g. be done with VMD (move a loaded molecule by selecting the VMD menu Mouse -> Move -> Molecule, then translate it with the mouse and rotate it by pressing the Shift key; the new coordinates can then be saved by selecting File -> Save Coordinates).

Alternatively, an automated rigid-body fitting procedure can also be employed. E.g. the qrange tool has already been explained in the classic EM tutorial. However, if the conformational changes are significant, the best-scoring fit returned by rigid body fitting may not be the one we are interested in. It is therefore a good idea to export a number of best-scoring fits, and to explore the alignment of these fits by eye, before selecting one for subsequent flexible fitting.

For example, using qrange with a low number k=3 or k=4 vectors ensures a coarse orientational sampling. At the shell prompt enter:

./qrange 1_actin_target.situs 0_actin_orig.pdb

At the program prompt, enter the density cutoff value of 10 (surface threshold). Next, we enter our choice of coordinate system origin convention (no change). Finally, we enter the B-factor cutoff (atoms with high B-factors will be ignored). Enter 200 to select all atoms. After the vector quantization and docking, we select k=4 vectors, select the first fit (enter 1) and enter the output file name: 2_actin_orig_dock_target.pdb:


% ./qrange 1_actin_target.situs 0_actin_orig.pdb
lib_vio> File 1_actin_target.situs - Header information:
lib_vio> Columns, rows, and sections: x=1-57, y=1-52, z=1-39
lib_vio> 3D coordinates of first voxel (1,1,1): (-58.000000,-50.000000,-36.000000)
lib_vio> Voxel size in Angstrom: 2.000000
lib_vio> Reading density data...
lib_vio> Volumetric data read from file 1_actin_target.situs
lib_pio> 3572 atoms read.
lib_pio> 3572 atoms read.
.qrange>
.qrange> Map density values below a user-defined cutoff value will not be considered
.qrange> Do you want to inspect the input density values before entering the cutoff value?
.qrange> Choose one of the following three options -
.qrange> 1: No (continue)
.qrange> 2: Show me the minimum and maximum density values only
.qrange> 3: Show me the voxel histogram
.qrange> 1
.qrange> Now enter the cutoff density value: 10
.qrange> Cutting off density values < 10.000000, remaining occupied volume: 21141 voxels (1.691280e+05 Angstrom^3)
.qrange>
.qrange> Shift the origin of the map coordinate system? (See user guide)
.qrange>
.qrange> 1: No. Keep current origin.
.qrange> (recommended e.g. if volcube will be used)
.qrange> 2: Yes. Set origin to the centroid of the current map.
.qrange> 3: Yes. Set origin to the centroid of another, related map.
.qrange> 4: Yes. Set origin to a voxel in the current map.
.qrange> 5: Yes. Set origin to an arbitrary vector.
.qrange> 1
.qrange> Range of crystallographic B-factors: 0.57 - 99.90.
.qrange> Enter B-factor cutoff (only atoms below this value will be included): 200
.qrange> There are 2933 non-hydrogen atoms, represented by 2954 equally weighted input vectors
.qrange> Sphericity of the atomic structure: 0.52
.qrange>
.qrange> Docking will be carried out for a range 3 - 9 pairs of codebook vectors
.qrange>
.qrange> Now vector quantizing atomic and density data: 3 codebook vectors each
.qrange> Now docking by 3! = 6.000E+00 possible pairs of corresponding vectors
.qrange> Now vector quantizing atomic and density data: 4 codebook vectors each
.qrange> Now docking by 4! = 2.400E+01 possible pairs of corresponding vectors
.qrange> Now vector quantizing atomic and density data: 5 codebook vectors each
.qrange> Now docking by 5! = 1.200E+02 possible pairs of corresponding vectors
.qrange> Now vector quantizing atomic and density data: 6 codebook vectors each
.qrange> Now docking by 6! = 7.200E+02 possible pairs of corresponding vectors
.qrange> Now vector quantizing atomic and density data: 7 codebook vectors each
.qrange> Now docking by 7! = 5.040E+03 possible pairs of corresponding vectors
.qrange> Now vector quantizing atomic and density data: 8 codebook vectors each
.qrange> Now docking by 8! = 4.032E+04 possible pairs of corresponding vectors
.qrange> Now vector quantizing atomic and density data: 9 codebook vectors each
.qrange> Now docking by 9! = 3.629E+05 possible pairs of corresponding vectors
.qrange>
.qrange> Choose optimum number k of vectors:
.qrange> Printing selection criteria as function of k (in Angstrom units)
.qrange>
.qrange> stat. vector variability -- vector rmsd of highest scoring fit
k = 3 0.896 3.866
k = 4 0.644 4.594
k = 5 3.063 7.631
k = 6 1.940 5.675
k = 7 0.696 4.208
k = 8 1.664 5.281
k = 9 1.194 6.250
.qrange>
.qrange> Select number k for docking (range: 3-9; 0 to exit): 4
.qrange> Printing 20 best vector least-squares fits
.qrange>
.qrange> rmsd (Angstrom) -- correlation coefficient -- permutation
1. 4.594 0.665 (4,1,2,3)
2. 4.908 0.589 (2,3,4,1)
3. 5.597 0.565 (3,2,1,4)
4. 5.736 0.648 (1,4,3,2)
5. 6.079 0.578 (2,1,4,3)
6. 6.843 0.605 (3,4,1,2)
7. 7.000 0.563 (4,3,2,1)
8. 7.655 0.647 (1,2,3,4)
9. 15.628 0.609 (1,4,2,3)
10. 15.660 0.521 (3,2,4,1)
11. 15.824 0.401 (1,2,4,3)
12. 16.060 0.517 (3,4,2,1)
13. 16.271 0.462 (2,3,1,4)
14. 16.353 0.422 (4,1,3,2)
15. 16.541 0.500 (4,3,1,2)
16. 16.762 0.446 (2,1,3,4)
17. 16.934 0.459 (2,4,1,3)
18. 17.082 0.571 (3,1,4,2)
19. 18.173 0.477 (4,2,3,1)
20. 18.200 0.417 (1,3,2,4)
.qrange> Permutations indicate the order of high res. vectors fitted to low res. vectors
.qrange> Writing fitted PDB file(s) to output
.qrange> Select one least-squares fit for output (range: 1-20; 0 to return to prev. menu): 1
.qrange> Enter output filename: 2_actin_orig_dock_target.pdb
.qrange> Rigid-body fitted coordinates (selection 1) and the codebook vectors have been written to file 2_actin_orig_dock_target.pdb
.qrange> Statistical accuracy of fitting: +/- 0.644 Angstrom, +/- 3.113 degrees


After this run we purge the resulting file of the codebook vectors:

cat 2_actin_orig_dock_target.pdb | grep -v "QVOL" | grep -v "QPDB" > tmp
mv tmp 2_actin_orig_dock_target.pdb

Vector Quantization of the High-Resolution Structure with qpdb

Now, we perform the vector quantization of the roughly fitted structure with the qpdb utility.

At the shell prompt, enter

./qpdb 2_actin_orig_dock_target.pdb 2_actin_orig_dock_target.qpdb

and select mass-weighting (enter 2), ignore the B-factor cutoff (enter 1). Next, enter the number of codebook vectors: 4 (one for each of actin's subdomains). Watch the program compute a number of datasets (default: 8) for statistical averaging. The file 2_actin_orig_dock_target.qpdb now contains the four new codebook vectors, their rms variability, and the effective radius of their Voronoi cells, in PDB format. Finally, the user is asked whether nearest-neighbor connectivities should be learnt, or whether the Voronoi cells should be saved. Here we enter 1 (don't save the connectivities), but we want to save the Voronoi cells for flexible docking (enter 2). Enter the Voronoi cell filename: 2_actin_orig_dock_target.cell. This PDB-formatted file will now contain indices in each atom B-factor field denoting its membership to one of the four Voronoi cells.

Here is the output of the entire qpdb calculation:

% ./qpdb 2_actin_orig_dock_target.pdb 2_actin_orig_dock_target.qpdb
lib_pio> 3580 atoms read.
...qpdb> Found 639 hydrogens, 0 water atoms, 8 codebook vectors, 0 density atoms
...qpdb> Hydrogens will be ignored.
...qpdb> Do you want to mass-weight the atoms ?
...qpdb>
...qpdb> 1: No
...qpdb> 2: Yes
...qpdb> 2
...qpdb> Do you want to select atoms based on a B-factor threshold?
...qpdb>
...qpdb> 1: No
...qpdb> 2: Yes
...qpdb> 1
...qpdb> 2954 equally weighted inputs out of originally 3580 atoms selected for conversion.
...qpdb>
...qpdb> Sphericity of the atomic structure: 0.52
...qpdb> Enter desired number of codebook vectors for data quantization: (0 to exit): 4
...qpdb> Computing 8 datasets, 100000 iterations each...
...qpdb> Now producing dataset 1
...qpdb> Now producing dataset 2
...qpdb> Now producing dataset 3
...qpdb> Now producing dataset 4
...qpdb> Now producing dataset 5
...qpdb> Now producing dataset 6
...qpdb> Now producing dataset 7
...qpdb> Now producing dataset 8
...qpdb>
...qpdb> Codebook vectors have been written to file 2_actin_orig_dock_target.qpdb
...qpdb> The PDB B-factor field contains the equivalent spherical radii
...qpdb> of the corresponding Voronoi cells (in Angstrom).
...qpdb> Cluster analysis of the 8 independent calculations:
...qpdb> The PDB occupancy field in 2_actin_orig_dock_target.qpdb contains the rms variabilities of the vectors.
...qpdb> Average rms fluctuation of the 4 codebook vectors: 0.865 Angstrom
...qpdb> Radius of gyration of the 4 codebook vectors: 17.347 Angstrom
...qpdb>
...qpdb> Do you want to learn nearest-neighbor connectivities?
...qpdb> Choose one of the following options -
...qpdb> 1: No.
...qpdb> 2: Learn and save to a PSF file
...qpdb> 3: Learn and save to a constraints file
...qpdb> 4: Learn and save to both PSF and constraints files
...qpdb> 1
...qpdb>
...qpdb> Do you want to save the Voronoi cells?
...qpdb> Choose one of the following options -
...qpdb> 1: No. I'm done
...qpdb> 2: Yes. Save cells to a PDB file
...qpdb> 2
...qpdb> Enter Voronoi cell PDB filename: 2_actin_orig_dock_target.cell
...qpdb> Voronoi cell numbers written to B-factor field in file 2_actin_orig_dock_target.cell.


Vector Quantization of the Low-Resolution Map with qvol

The vector quantization of a volumetric datasets with the qvol utility involves only a few simple steps. After entering

./qvol 1_actin_target.situs 3_actin_target.qvol

the user is prompted to enter the density cutoff value. Again, we enter 10. Subsequently, we enter the number of codebook vectors, which must be compatible with the number for the high-resolution dataset (4). Finally, the user is asked whether nearest-neighbor connectivities should be learnt. This functionality is important later, but not now, so we enter no. The file 3_actin_target.qvol now contains the "QVOL" codebook vectors.

Here is the output of this qvol calculation:

% ./qvol 1_actin_target.situs 3_actin_target.qvol
lib_vio> File 1_actin_target.situs - Header information:
lib_vio> Columns, rows, and sections: x=1-57, y=1-52, z=1-39
lib_vio> 3D coordinates of first voxel (1,1,1): (-58.000000,-50.000000,-36.000000)
lib_vio> Voxel size in Angstrom: 2.000000
lib_vio> Reading density data...
lib_vio> Volumetric data read from file 1_actin_target.situs
...qvol> Density values below a user-defined cutoff value will not be considered
...qvol> Do you want to inspect the input density values before entering the cutoff value?
...qvol> Choose one of the following three options -
...qvol> 1: No (continue)
...qvol> 2: Show me the minimum and maximum density values only
...qvol> 3: Show me the voxel histogram
...qvol> 1
...qvol> Now enter the cutoff density value: 10
...qvol> Cutting off density values < 10.000000, remaining occupied volume: 21141 voxels (1.691280e+05 Angstrom^3)
...qvol> Enter desired number of codebook vectors: 4
...qvol>
...qvol> Using random start vectors.
...qvol> Computing 8 datasets, 100000 iterations each...
...qvol> Now producing dataset 1
...qvol> Now producing dataset 2
...qvol> Now producing dataset 3
...qvol> Now producing dataset 4
...qvol> Now producing dataset 5
...qvol> Now producing dataset 6
...qvol> Now producing dataset 7
...qvol> Now producing dataset 8
...qvol> Final clustering -- Average vector update: 4.232600e-01 Angstrom
...qvol> Final clustering -- Average vector update: 0.000000e+00 Angstrom
...qvol>
...qvol> Codebook vectors have been written to file 3_actin_target.qvol
...qvol> The PDB B-factor field contains the equivalent spherical radii
...qvol> of the corresponding Voronoi cells (in Angstrom).
...qvol> Cluster analysis of the 8 independent runs:
...qvol> The PDB occupancy field in 3_actin_target.qvol contains the rms variabilities of the vectors.
...qvol> Average rms fluctuation of the 4 codebook vectors: 0.373 Angstrom
...qvol> Radius of gyration of the 4 codebook vectors: 20.068 Angstrom
...qvol>
...qvol> Do you want to update or save the input connectivities?
...qvol> Choose one of the following options -
...qvol> 1: No. I'm done
...qvol> 2: Update and save to a PSF file
...qvol> 3: Update and save to a constraints file
...qvol> 4: Update and save to both PSF and constraints files
...qvol> 1
...qvol> Bye bye!

Flexible Docking with Molecular Dynamics

The most time-consuming part in setting up a simulation is the preparation of the input PDB and script files to conform with the topology and parameters provided by standard packages such as X-PLOR. This preparation usually requires considerable user experience, patience and a number of short test simulations. The necessary steps have already been implemented for actin. For details users should consult the X-PLOR documentation.

All stereochemical information and the system topology and parameters are defined in the stream file 0_actin.str in X-PLOR scripting language. This file will be sourced from all subsequent X-PLOR flexing scripts. It lists the required standard topology and parameter files (system dependent), and the amino acide sequence. The peptide links are formed here with the toph19.pep macro. Three standard atoms are renamed to conform with the input PDB files.

The flexible docking is enforced during an energy minimization with X-PLOR using NOE distance restraints that superimpose the centroids of the high-resolution Voronoi cells with the corresponding low-resolution vectors. Matching pairs of codebook vectors can be used as point landmarks for the flexible registration of the atomic structure with the target density map. The script file 0_flex_target.inp contains all the necessary commands. This file must be edited to assign pairs of corresponding vectors (the vector order generated by qvol and qpdb above is random). It is best to inspect the codebook vector files 3_actin_target.qvol and 2_actin_orig_dock_target.qpdb with a graphics program to assign the matching vectors. The order has already been assigned in 0_flex_target.inp, but the earlier calculations are machine-dependent, so a user will need to verify or adjust the assignment.

To start the X-PLOR refinement, enter at the shell prompt

xplor < 0_flex_target.inp > 4_flexed_to_target.log

The X-PLOR script 0_flex_target.inp has several parts that are documented in the code. After reading the stream file 0_actin.str, topology and parameter settings are introduced for a dummy (probe) atom and for the codebook vectors (should not be altered by user). A segment statement creates the specified number of QVOL codebook vectors (adjust if necessary), and the Voronoi cells and atomic coordinates are read from a file. Subsequently, the QVOL vector coordinates are read. Finally, the NOE restraints are set, the QVOL vectors are matched with the Voronoi cells, and the system is energy minimized.

Visualization

We now inspect the above results with the free VMD graphics program (version 1.8.4 or higher). The following sequence of commands in the VMD text console (cf. VMD user guide ) will load the original and flexibly docked actin structures, 2_actin_orig_dock_target.pdb (red) and 4_flexed_to_target.pdb (green), and render them in colored tube representation. The script also renders the target density map, 1_actin_target.situs, in gray:

mol load pdb 4_flexed_to_target.pdb
mol load pdb 2_actin_orig_dock_target.pdb
mol load situs 1_actin_target.situs
mol top 0
rotate stop
display resetview
display projection orthographic
mol modstyle 0 0 Tube 0.3 6
mol modstyle 0 1 Tube 0.3 6
mol modstyle 0 2 Isosurface 10 0 0 1 2 1
mol modcolor 0 0 ColorID 7
mol modcolor 0 1 ColorID 1
mol modcolor 0 2 ColorID 2

The result should look like this:


(Click image to enlarge)

Part II: Skeleton-Based Docking

We are now prepared to improve the stereochemical quality of the flexing. It is possible to constrain the distances between the landmarks to reduce the effect of noise and experimental limitations on the codebook vector positions. This "skeleton" based approach, as described on the Vector Quantization page, is related to 3D motion capture technology used in the entertainment industry and in biomechanics. The application is demonstrated in the Flexible Docking Tutorial, Part II.

Return to the front page .