Classic EM Tutorial, Part II
The programs qpdb and qvol are specialized tools that perform vector quantization to generate coarse-grained point cloud models of the structural data. The matchpoint tool (new in Situs 2.5) supports the registration of point clouds even when they are of unequal size. The usage of these routines is demonstrated below, using the examples of part I. As a prerequisite, at least the installation step of part I must be completed. The rest of part I can be skipped if the input files are copied from the "solutions" directory. More documentation is available in the user guide, on the methodology page, and in the published articles.

Vector Quantization of High-Resolution Structures with qpdb

First, we perform the vector quantization of high-resolution data with the qpdb utility.

At the shell prompt, enter

./qpdb 0_ncd.pdb 1_ncd_n7.pdb

and at the program prompt, choose to exclude the water molecules, use mass-weighting, and apply a B-factor cutoff (atoms with B-factors at or above the cutoff are ignored). Enter 300, which exceeds the largest B-factor in the file, to select all atoms. Next, enter the number of codebook vectors: 7. The program then computes a number of datasets (default: 8) for statistical averaging. The file 1_ncd_n7.pdb now contains the seven codebook vectors, their rms variability, and the effective radius of their Voronoi cells, in PDB format. The program also reports the radius of gyration of the vectors.
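To make the idea of codebook vectors and Voronoi cells concrete, here is a minimal sketch of vector quantization in Python. It is not the algorithm implemented in qpdb; it swaps in plain Lloyd-style k-means purely for illustration, and the function name `quantize` and its parameters are hypothetical.

```python
import random

def quantize(points, k, iterations=50, seed=0):
    """Illustrative Lloyd-style k-means (NOT the qpdb algorithm).

    points: list of (x, y, z) tuples; returns k codebook vectors.
    """
    rng = random.Random(seed)
    codebook = rng.sample(points, k)  # random start vectors drawn from the data
    for _ in range(iterations):
        # Assignment step: each point joins the cell of its nearest vector
        cells = [[] for _ in range(k)]
        for p in points:
            nearest = min(
                range(k),
                key=lambda i: sum((a - b) ** 2 for a, b in zip(p, codebook[i])),
            )
            cells[nearest].append(p)
        # Update step: move each vector to the centroid of its (Voronoi) cell
        for i, cell in enumerate(cells):
            if cell:
                codebook[i] = tuple(sum(c) / len(cell) for c in zip(*cell))
    return codebook
```

Repeating such a calculation with different random starts and comparing the resulting vectors is one way to picture the statistical averaging over several datasets that qpdb reports.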

Finally, the user is asked whether nearest-neighbor connectivities should be learnt, or whether the Voronoi cells should be saved. This functionality is important only for flexible docking, so we enter no.

Here is the output of the entire qpdb calculation:

% ./qpdb 0_ncd.pdb 1_ncd_n7.pdb
lib_pio> 2632 atoms read.
...qpdb> Found 2 hydrogens, 3 water atoms, 0 codebook vectors, 0 density atoms
...qpdb> Hydrogens will be ignored.
...qpdb> 3 water atoms found in file 0_ncd.pdb
...qpdb> Mass-weighting on.
...qpdb> Do you want to exclude the water atoms?
...qpdb>
...qpdb> 1: No
...qpdb> 2: Yes
...qpdb> 2
...qpdb> Water atoms will be ignored.
...qpdb> Do you want to select atoms based on a B-factor threshold?
...qpdb>
...qpdb> 1: No
...qpdb> 2: Yes
...qpdb> 2
...qpdb> Range of crystallographic B-factors: 2.00 - 100.00.
...qpdb> Enter B-factor cutoff (only atoms below this value will be included): 300
...qpdb> 3307 equally weighted inputs out of originally 2632 atoms selected for conversion.
...qpdb>
...qpdb> Sphericity of the atomic structure: 0.68
...qpdb> Enter desired number of codebook vectors for data quantization: (0 to exit): 7
...qpdb> Computing 8 datasets, 100000 iterations each...
...qpdb> Now producing dataset 1
...qpdb> Now producing dataset 2
...qpdb> Now producing dataset 3
...qpdb> Now producing dataset 4
...qpdb> Now producing dataset 5
...qpdb> Now producing dataset 6
...qpdb> Now producing dataset 7
...qpdb> Now producing dataset 8
...qpdb>
...qpdb> Codebook vectors have been written to file 1_ncd_n7.pdb
...qpdb> The PDB B-factor field contains the equivalent spherical radii
...qpdb> of the corresponding Voronoi cells (in Angstrom).
...qpdb> Cluster analysis of the 8 independent calculations:
...qpdb> The PDB occupancy field in 1_ncd_n7.pdb contains the rms variabilities of the vectors.
...qpdb> Average rms fluctuation of the 7 codebook vectors: 1.022 Angstrom
...qpdb> Radius of gyration of the 7 codebook vectors: 17.209 Angstrom
...qpdb>
...qpdb> Do you want to learn nearest-neighbor connectivities?
...qpdb> Choose one of the following options -
...qpdb> 1: No.
...qpdb> 2: Learn and save to a PSF file
...qpdb> 3: Learn and save to a constraints file
...qpdb> 4: Learn and save to both PSF and constraints files
...qpdb> 1
...qpdb>
...qpdb> Do you want to save the Voronoi cells?
...qpdb> Choose one of the following options -
...qpdb> 1: No. I'm done
...qpdb> 2: Yes. Save cells to a PDB file
...qpdb> 1
...qpdb> Bye bye!
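The log above states that the occupancy field of 1_ncd_n7.pdb holds the rms variabilities and the B-factor field holds the equivalent Voronoi cell radii. The following Python sketch reads those fields and recomputes a radius of gyration. It assumes that the file uses standard fixed-column PDB ATOM/HETATM records and that the radius of gyration is the unweighted root-mean-square distance from the centroid; both are assumptions, and the function names are hypothetical.

```python
import math

def read_codebook(pdb_text):
    """Return (x, y, z, occupancy, bfactor) tuples from ATOM/HETATM records.

    Assumes standard fixed-column PDB formatting.
    """
    vectors = []
    for line in pdb_text.splitlines():
        if line.startswith(("ATOM", "HETATM")):
            x = float(line[30:38])
            y = float(line[38:46])
            z = float(line[46:54])
            occupancy = float(line[54:60])  # rms variability, per the log
            bfactor = float(line[60:66])    # equivalent Voronoi cell radius
            vectors.append((x, y, z, occupancy, bfactor))
    return vectors

def radius_of_gyration(coords):
    """Unweighted rms distance of the points from their centroid (assumption)."""
    n = len(coords)
    center = [sum(c[i] for c in coords) / n for i in range(3)]
    squared = sum(sum((c[i] - center[i]) ** 2 for i in range(3)) for c in coords)
    return math.sqrt(squared / n)
```

For example, `radius_of_gyration([v[:3] for v in read_codebook(open("1_ncd_n7.pdb").read())])` can be compared against the value printed by qpdb.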

Vector Quantization of Low-Resolution Maps with qvol

The vector quantization of a volumetric dataset with the qvol utility involves only a few more steps. After entering


./qvol 4_s1.situs 4_s1_n7.pdb

the user has the choice of inspecting the density distribution before entering a cutoff density value. Only density values that exceed the cutoff are considered in the vector quantization. Here, we enter the cutoff value 20 for file 4_s1.situs (see part I of this tutorial). Subsequently, the user is prompted to enter the number of codebook vectors. We again choose 7 because in this example the PDB structure and the target map are of equal size; in principle, the matchpoint program used in the next step can also handle unequal numbers of codebook vectors.

Now investigate how the compactness of the codebook vectors, defined by their radius of gyration, depends on the cutoff value chosen in the quantization. You should verify that higher cutoff densities yield more compact sets of vectors. Note that cutoffs lower than 20 do not make sense since 4_s1.situs was created with floodfill using this value. The optimum cutoff for qvol is the one where the radius of gyration is equal to that computed with qpdb for the vectors encoding the high-resolution structure.
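One way to keep track of this comparison is to record the radius of gyration that qvol reports for each cutoff and pick the cutoff closest to the qpdb value. The helper below is a small bookkeeping sketch; the name `closest_cutoff` is hypothetical, and only the two numbers taken from the logs in this tutorial are real.

```python
def closest_cutoff(rg_by_cutoff, target_rg):
    """Return the cutoff whose radius of gyration is nearest to target_rg."""
    return min(rg_by_cutoff, key=lambda cutoff: abs(rg_by_cutoff[cutoff] - target_rg))

# Radius of gyration reported by qpdb for 1_ncd_n7.pdb (Angstrom)
target_rg = 17.209

# Radius of gyration reported by qvol, keyed by cutoff density.
# Only the entry for cutoff 20 comes from this tutorial; add further
# entries from your own qvol runs.
rg_by_cutoff = {20: 16.192}

best = closest_cutoff(rg_by_cutoff, target_rg)
print("closest cutoff:", best)
```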

Finally, the user is asked whether nearest-neighbor connectivities should be learnt. This functionality is only important for flexible docking, so we enter no.

Here is the output of a typical qvol calculation:

% ./qvol 4_s1.situs 4_s1_n7.pdb
lib_vio> File 4_s1.situs - Header information:
lib_vio> Columns, rows, and sections: x=1-9, y=1-10, z=1-14
lib_vio> 3D coordinates of first voxel (1,1,1): (396.000000,252.000000,132.000000)
lib_vio> Voxel size in Angstrom: 6.000000
lib_vio> Reading density data...
lib_vio> Volumetric data read from file 4_s1.situs
...qvol> Density values below a user-defined cutoff value will not be considered
...qvol> Do you want to inspect the input density values before entering the cutoff value?
...qvol> Choose one of the following three options -
...qvol> 1: No (continue)
...qvol> 2: Show me the minimum and maximum density values only
...qvol> 3: Show me the voxel histogram
...qvol> 1
...qvol> Now enter the cutoff density value: 20
...qvol> Cutting off density values < 20.000000, remaining occupied volume: 284 voxels (6.134400e+04 Angstrom^3)
...qvol> Enter desired number of codebook vectors: 7
...qvol>
...qvol> Using random start vectors.
...qvol> Computing 8 datasets, 100000 iterations each...
...qvol> Now producing dataset 1
...qvol> Now producing dataset 2
...qvol> Now producing dataset 3
...qvol> Now producing dataset 4
...qvol> Now producing dataset 5
...qvol> Now producing dataset 6
...qvol> Now producing dataset 7
...qvol> Now producing dataset 8
...qvol> Final clustering -- Average vector update: 6.754332e-01 Angstrom
...qvol> Final clustering -- Average vector update: 0.000000e+00 Angstrom
...qvol>
...qvol> Codebook vectors have been written to file 4_s1_n7.pdb
...qvol> The PDB B-factor field contains the equivalent spherical radii
...qvol> of the corresponding Voronoi cells (in Angstrom).
...qvol> Cluster analysis of the 8 independent runs:
...qvol> The PDB occupancy field in 4_s1_n7.pdb contains the rms variabilities of the vectors.
...qvol> Average rms fluctuation of the 7 codebook vectors: 0.986 Angstrom
...qvol> Radius of gyration of the 7 codebook vectors: 16.192 Angstrom
...qvol>
...qvol> Do you want to update or save the input connectivities?
...qvol> Choose one of the following options -
...qvol> 1: No. I'm done
...qvol> 2: Update and save to a PSF file
...qvol> 3: Update and save to a constraints file
...qvol> 4: Update and save to both PSF and constraints files
...qvol> 1
...qvol> Bye bye!
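The occupied volume printed after the cutoff step follows directly from the voxel count and the voxel size in the header. The short Python check below reproduces that arithmetic from the numbers in the log; the variable names are illustrative.

```python
voxel_size = 6.0            # Angstrom, from the lib_vio header
occupied_voxels = 284       # voxels remaining after the cutoff of 20
grid_voxels = 9 * 10 * 14   # columns x rows x sections from the header

# Each voxel is a cube, so its volume is the voxel size cubed
occupied_volume = occupied_voxels * voxel_size ** 3
occupied_fraction = occupied_voxels / grid_voxels

print(f"{occupied_volume:e} Angstrom^3")  # 6.134400e+04 Angstrom^3
print(f"fraction of grid occupied: {occupied_fraction:.3f}")
```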

Docking with matchpoint

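We now dock the high-resolution ncd structure into the corresponding low-resolution map with the matchpoint utility (executable matchpt). With the default options, matchpoint takes four file name arguments at the shell prompt, in this order: the qvol codebook vectors, the Situs map, the qpdb codebook vectors, and the high-resolution PDB structure: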

./matchpt 4_s1_n7.pdb 4_s1.situs 1_ncd_n7.pdb 0_ncd.pdb

The program returns the 3 best fits, ranked by the codebook vector rms deviation, and also reports the correlation coefficient and the permutation of the vectors that determines each superposition.
The three solutions and corresponding codebook vectors are written to disk. Here is the complete matchpoint session:

% ./matchpt 4_s1_n7.pdb 4_s1.situs 1_ncd_n7.pdb 0_ncd.pdb
matchpt>
matchpt> Matchpoint
matchpt> ==========
matchpt>
matchpt> Loading qvol codebook vectors file: 4_s1_n7.pdb
matchpt> Loading Situs file: 4_s1.situs
matchpt>   Columns, rows, and sections: x=1-9, y=1-10, z=1-14
matchpt>   3D coordinates of first voxel (1,1,1): (396,252,132)
matchpt>   Voxel size in Angstrom: 6
matchpt> Loading qpdb codebook vectors file: 1_ncd_n7.pdb
matchpt> Loading high-resolution structure PDB file: 0_ncd.pdb
matchpt>
matchpt> Point-cloud matching...
matchpt>   Subcomponent has 7 points
matchpt>   Gets docked into a structure with 7 points
matchpt>
matchpt> Anchors: (0, 5, 4)
matchpt>   Number of potential anchor matches: 16
matchpt>   Searching
matchpt>   ...
matchpt> Anchors: (3, 2, 1)
matchpt>   Number of potential anchor matches: 124
matchpt>   Searching
matchpt>   ..............
matchpt> Anchors: (6, 2, 1)
matchpt>   Number of potential anchor matches: 114
matchpt>   Searching
matchpt>   .............
matchpt>
matchpt> Found 3 solutions.
matchpt>
matchpt>   Solution filename, codebook vector RMSD, cross-correlation coefficient
matchpt>   and permutation are printed out.
matchpt>   The permutations indicate the order of low res. vectors fitted to high
matchpt>   res. vectors, which is the opposite of how they are shown in qdock
matchpt>   and qrange.
matchpt>
matchpt>   [01] Solution_01.pdb - RMSD:  2.668 CC:  0.905 - ( 4, 2, 6, 7, 1, 5, 3)
matchpt>   [02] Solution_02.pdb - RMSD:  4.364 CC:  0.876 - ( 4, 3, 2, 1, 5, 7, 6)
matchpt>   [03] Solution_03.pdb - RMSD:  4.836 CC:  0.861 - ( 4, 6, 3, 5, 7, 1, 2)

The best solution has a relatively small vector rms deviation (~2.7 Å), and visual inspection reveals that it is very similar (2 Å rmsd) to the orientation found earlier (file 5_s1_1.dock.pdb). To learn more about this program and its numerous options, execute it without arguments in the command shell or read the user guide.
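To illustrate what a codebook vector RMSD under a given permutation means, the Python sketch below pairs two point sets according to a 1-based permutation and computes their rms deviation. It assumes that both sets are already superimposed and that entry i of the permutation names the low-resolution vector paired with high-resolution vector i, which is one reading of the note printed by matchpoint. The function name is hypothetical, and this is not the matchpoint search itself.

```python
import math

def permuted_rmsd(high_res, low_res, permutation):
    """RMSD between high_res[i] and low_res[permutation[i] - 1].

    Assumes 1-based permutation entries and already superimposed coordinates.
    """
    total = 0.0
    for point, index in zip(high_res, permutation):
        partner = low_res[index - 1]
        total += sum((a - b) ** 2 for a, b in zip(point, partner))
    return math.sqrt(total / len(permutation))
```

With the seven vectors of a docked solution and the permutation printed for it, for example (4, 2, 6, 7, 1, 5, 3), such a function would give a value that can be compared with the RMSD column in the output above.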

Note: As described in (Birmanns & Wriggers, 2007), the fast point cloud matching can be improved by post-processing with the fast correlation-based tool colacor. This workflow is explained in the separate manual docking tutorial.
