Classic EM Tutorial, Part II
The programs qpdb and qvol are specialized tools that perform vector quantization to generate coarse-grained point cloud models of the structural data. The matchpoint tool (new in Situs 2.5) supports the registration of point clouds, even when they are of unequal size. The usage of these routines is demonstrated below, using the examples of part I. As a prerequisite, at least the installation step of part I must be completed. The rest of part I can be skipped if the input files are copied from the "solutions" directory. More documentation is available in the user guide, on the methodology page, and in the published articles.
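Both qpdb and qvol place a small number of codebook vectors so that they represent the distribution of atoms or density. To illustrate what a vector quantizer does, here is a minimal sketch of a generic neural-gas quantizer, one well-known algorithm of this kind. The function name `neural_gas`, its parameters and its learning schedule are our own illustrative assumptions. They are not the settings used by qpdb or qvol. Only the default of 100000 iterations mirrors the count those programs print.

```python
import math
import random

def neural_gas(points, weights, n_vectors, n_iter=100_000, seed=0,
               eps_i=0.3, eps_f=0.05, lam_i=None, lam_f=0.01):
    """Minimal neural-gas vector quantizer (illustrative parameters).

    points  -- list of (x, y, z) tuples, e.g. atom or voxel positions
    weights -- one non-negative weight per point, e.g. mass or density
    Returns n_vectors codebook vectors as [x, y, z] lists.
    """
    rng = random.Random(seed)
    if lam_i is None:
        lam_i = 0.2 * n_vectors  # initial neighbourhood range (assumed)
    # Initialise the codebook with randomly chosen input points.
    codebook = [list(p) for p in rng.sample(points, n_vectors)]
    # Pre-draw all training samples with probability proportional to weight.
    samples = rng.choices(points, weights=weights, k=n_iter)
    for t, v in enumerate(samples):
        frac = t / n_iter
        eps = eps_i * (eps_f / eps_i) ** frac  # learning rate decays
        lam = lam_i * (lam_f / lam_i) ** frac  # neighbourhood range shrinks
        # Rank codebook vectors by squared distance to the sample (0 = closest).
        order = sorted(range(n_vectors),
                       key=lambda i: sum((codebook[i][d] - v[d]) ** 2 for d in range(3)))
        # Every vector moves toward the sample, with a step that shrinks with rank.
        for rank, i in enumerate(order):
            step = eps * math.exp(-rank / lam)
            for d in range(3):
                codebook[i][d] += step * (v[d] - codebook[i][d])
    return codebook
```

Running the same quantization several times with different seeds and comparing the results is the idea behind the multiple datasets that qpdb and qvol compute for statistical averaging.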
Vector Quantization of High-Resolution Structures with qpdb
First, we perform the vector quantization of a high-resolution structure with the qpdb utility. At the shell prompt, enter

./qpdb 0_ncd.pdb 1_ncd_n7.pdb

At the program prompts, choose to exclude the water molecules and to apply a B-factor cutoff (only atoms with B-factors below the cutoff are included). Mass-weighting is also applied (the program reports "Mass-weighting on."). Enter 300 as the cutoff to select all atoms, since the B-factors in this file range from 2 to 100. Next, enter the number of codebook vectors: 7. The program then computes a number of independent datasets (default: 8) for statistical averaging. The file 1_ncd_n7.pdb now contains the seven codebook vectors in PDB format. Their rms variability is stored in the occupancy field and the effective (equivalent spherical) radius of their Voronoi cells in the B-factor field. The program also reports the radius of gyration of the vectors.
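The input selection described above can be sketched as a small filter that turns atom records into weighted points for the quantizer. This is a hypothetical helper: `Atom`, `select_inputs` and the mass table are our own. It assumes that hydrogens are recognized by their element symbol and water by the residue name HOH, which may not match how qpdb detects them.

```python
from dataclasses import dataclass

# Approximate atomic masses (in Da) for elements common in proteins.
MASSES = {"H": 1.008, "C": 12.011, "N": 14.007, "O": 15.999,
          "P": 30.974, "S": 32.06}

@dataclass
class Atom:
    element: str    # element symbol, e.g. "C"
    resname: str    # residue name, e.g. "HOH" for water
    xyz: tuple      # (x, y, z) in Angstrom
    bfactor: float  # crystallographic B-factor

def select_inputs(atoms, exclude_water=True, b_cutoff=None, mass_weighting=True):
    """Return (points, weights) for vector quantization.

    Hydrogens are always skipped, water is optionally skipped, and when a
    cutoff is given only atoms with a B-factor below it are kept.
    """
    points, weights = [], []
    for atom in atoms:
        if atom.element == "H":
            continue
        if exclude_water and atom.resname == "HOH":
            continue
        if b_cutoff is not None and atom.bfactor >= b_cutoff:
            continue
        points.append(atom.xyz)
        # Unknown elements fall back to a carbon-like mass (an assumption).
        weights.append(MASSES.get(atom.element, 12.011) if mass_weighting else 1.0)
    return points, weights
```

With a cutoff of 300 and B-factors between 2 and 100, the B-factor test never rejects an atom, which is why this cutoff selects all atoms.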
Finally, the user is asked whether nearest-neighbor connectivities should be learned and whether the Voronoi cells should be saved. This functionality is important only for flexible docking, so we answer no (option 1) to both questions.
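One standard way to learn nearest-neighbor connectivities between codebook vectors is the competitive Hebbian rule: every input point links the two codebook vectors closest to it. The sketch below assumes this rule. It is not taken from the Situs sources, and qpdb may additionally filter or prune connections. The function name is hypothetical.

```python
def learn_connectivities(points, codebook):
    """Competitive Hebbian rule: for each input point, link the two
    codebook vectors closest to it. Returns a set of (i, j) pairs, i < j."""
    def sq_dist(a, b):
        return sum((ai - bi) ** 2 for ai, bi in zip(a, b))

    edges = set()
    for p in points:
        # Indices of codebook vectors sorted by distance to this input point.
        nearest = sorted(range(len(codebook)), key=lambda i: sq_dist(p, codebook[i]))
        a, b = nearest[0], nearest[1]
        edges.add((min(a, b), max(a, b)))
    return edges
```

The resulting edges are the kind of information that qpdb offers to save to a PSF or constraints file for flexible docking.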
Here is the output of the entire qpdb calculation:
% ./qpdb 0_ncd.pdb 1_ncd_n7.pdb
lib_pio> 2632 atoms read.
...qpdb> Found 2 hydrogens, 3 water atoms, 0 codebook vectors, 0 density atoms
...qpdb> Hydrogens will be ignored.
...qpdb> 3 water atoms found in file 0_ncd.pdb
...qpdb> Mass-weighting on.
...qpdb> Do you want to exclude the water atoms?
...qpdb>
...qpdb> 1: No
...qpdb> 2: Yes
...qpdb> 2
...qpdb> Water atoms will be ignored.
...qpdb> Do you want to select atoms based on a B-factor threshold?
...qpdb>
...qpdb> 1: No
...qpdb> 2: Yes
...qpdb> 2
...qpdb> Range of crystallographic B-factors: 2.00 - 100.00.
...qpdb> Enter B-factor cutoff (only atoms below this value will be included): 300
...qpdb> 3307 equally weighted inputs out of originally 2632 atoms selected for conversion.
...qpdb>
...qpdb> Sphericity of the atomic structure: 0.68
...qpdb> Enter desired number of codebook vectors for data quantization: (0 to exit): 7
...qpdb> Computing 8 datasets, 100000 iterations each...
...qpdb> Now producing dataset 1
...qpdb> Now producing dataset 2
...qpdb> Now producing dataset 3
...qpdb> Now producing dataset 4
...qpdb> Now producing dataset 5
...qpdb> Now producing dataset 6
...qpdb> Now producing dataset 7
...qpdb> Now producing dataset 8
...qpdb>
...qpdb> Codebook vectors have been written to file 1_ncd_n7.pdb
...qpdb> The PDB B-factor field contains the equivalent spherical radii
...qpdb> of the corresponding Voronoi cells (in Angstrom).
...qpdb> Cluster analysis of the 8 independent calculations:
...qpdb> The PDB occupancy field in 1_ncd_n7.pdb contains the rms variabilities of the vectors.
...qpdb> Average rms fluctuation of the 7 codebook vectors: 1.022 Angstrom
...qpdb> Radius of gyration of the 7 codebook vectors: 17.209 Angstrom
...qpdb>
...qpdb> Do you want to learn nearest-neighbor connectivities?
...qpdb> Choose one of the following options -
...qpdb> 1: No.
...qpdb> 2: Learn and save to a PSF file
...qpdb> 3: Learn and save to a constraints file
...qpdb> 4: Learn and save to both PSF and constraints files
...qpdb> 1
...qpdb>
...qpdb> Do you want to save the Voronoi cells?
...qpdb> Choose one of the following options -
...qpdb> 1: No. I'm done
...qpdb> 2: Yes. Save cells to a PDB file
...qpdb> 1
...qpdb> Bye bye!
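The codebook file is ordinary PDB, so its extra fields can be read back with fixed-column parsing. The hypothetical helpers below extract the coordinates, the rms variability (occupancy, columns 55-60) and the Voronoi radius (B-factor, columns 61-66), and then compute a radius of gyration. We assume qpdb reports the unweighted radius of gyration of the vectors.

```python
import math

def read_codebook(lines):
    """Parse ATOM/HETATM records of a codebook PDB file.

    Returns a list of (xyz, rms_variability, voronoi_radius) tuples, taking the
    coordinates from columns 31-54, the occupancy field (columns 55-60) and
    the B-factor field (columns 61-66).
    """
    vectors = []
    for line in lines:
        if line.startswith(("ATOM", "HETATM")):
            xyz = (float(line[30:38]), float(line[38:46]), float(line[46:54]))
            vectors.append((xyz, float(line[54:60]), float(line[60:66])))
    return vectors

def radius_of_gyration(xyzs):
    """Unweighted radius of gyration: rms distance of the points from their centroid."""
    n = len(xyzs)
    centroid = [sum(p[d] for p in xyzs) / n for d in range(3)]
    msd = sum(sum((p[d] - centroid[d]) ** 2 for d in range(3)) for p in xyzs) / n
    return math.sqrt(msd)
```

For example, `with open("1_ncd_n7.pdb") as f: vectors = read_codebook(f)` followed by `radius_of_gyration([xyz for xyz, _, _ in vectors])` gives a value to compare with the radius of gyration printed above.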
Vector Quantization of Low-Resolution Maps with qvol
The vector quantization of a volumetric dataset with the qvol utility involves only a few more steps. After entering

./qvol 4_s1.situs 4_s1_n7.pdb

the user can inspect the density distribution before entering a cutoff density value. Only density values at or above the cutoff are considered in the vector quantization. Here, we enter the cutoff value 20 for file 4_s1.situs (see part I of this tutorial).
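The cutoff step can be illustrated with a sketch that keeps only voxels at or above the cutoff and converts their grid indices into Angstrom coordinates. It uses the first-voxel position and voxel size that qvol prints from the map header. The x-fastest voxel ordering and the use of densities as weights are assumptions of this hypothetical `occupied_voxels` helper.

```python
def occupied_voxels(density, nx, ny, nz, origin, spacing, cutoff):
    """Return (points, weights, volume) for voxels with density >= cutoff.

    density -- flat list of nx*ny*nz values with x varying fastest (assumed
               ordering); origin is the position of the first voxel (1,1,1).
    """
    if len(density) != nx * ny * nz:
        raise ValueError("density length does not match grid dimensions")
    points, weights = [], []
    for index, value in enumerate(density):
        if value < cutoff:
            continue  # density values below the cutoff are discarded
        i = index % nx
        j = (index // nx) % ny
        k = index // (nx * ny)
        points.append((origin[0] + i * spacing,
                       origin[1] + j * spacing,
                       origin[2] + k * spacing))
        weights.append(value)
    volume = len(points) * spacing ** 3  # occupied volume in Angstrom^3
    return points, weights, volume
```

For this map, 284 remaining voxels with a 6 Å edge give 284 × 216 Å³ = 61344 Å³, which matches the 6.134400e+04 Angstrom^3 that qvol reports in the output.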
Subsequently, the user is prompted to enter the number of codebook vectors. We again choose 7, since in this example the PDB structure and the target map are of equal size. In principle, however, the matchpoint program used in the next step can also handle unequal numbers of codebook vectors.
Now investigate how the compactness of the codebook vectors, as measured by their radius of gyration, depends on the cutoff value chosen in the quantization. You should find that higher cutoff densities yield more compact vectors. Note that cutoffs lower than 20 do not make sense, since 4_s1.situs was created with floodfill using this value. The optimum cutoff for qvol is the one at which the radius of gyration equals the value computed with qpdb for the vectors encoding the high-resolution structure.
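The search for the optimum cutoff amounts to a one-dimensional scan. In the sketch below, `rg_for_cutoff` is a hypothetical callable that stands in for running qvol at a given cutoff and reading off the reported radius of gyration. The function returns the admissible cutoff whose radius of gyration is closest to the qpdb value.

```python
def best_cutoff(cutoffs, rg_for_cutoff, target_rg, min_cutoff=20.0):
    """Pick the cutoff whose codebook radius of gyration is closest to target_rg.

    rg_for_cutoff -- hypothetical callable returning the radius of gyration of
                     the qvol codebook vectors obtained at a given cutoff
    min_cutoff    -- cutoffs below the floodfill threshold are skipped
    """
    valid = [c for c in cutoffs if c >= min_cutoff]
    if not valid:
        raise ValueError("no cutoff at or above min_cutoff")
    return min(valid, key=lambda c: abs(rg_for_cutoff(c) - target_rg))
```

The tutorial gives 17.209 Å from qpdb and 16.192 Å from qvol at cutoff 20. If higher cutoffs yield more compact vectors, as stated above, such a scan would simply return the floodfill threshold of 20 for this example.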
Finally, the user is asked whether the connectivities should be updated or saved. This functionality is only important for flexible docking, so we answer no (option 1).
Here is the output of a typical qvol calculation:
% ./qvol 4_s1.situs 4_s1_n7.pdb
lib_vio> File 4_s1.situs - Header information:
lib_vio> Columns, rows, and sections: x=1-9, y=1-10, z=1-14
lib_vio> 3D coordinates of first voxel (1,1,1): (396.000000,252.000000,132.000000)
lib_vio> Voxel size in Angstrom: 6.000000
lib_vio> Reading density data...
lib_vio> Volumetric data read from file 4_s1.situs
...qvol> Density values below a user-defined cutoff value will not be considered
...qvol> Do you want to inspect the input density values before entering the cutoff value?
...qvol> Choose one of the following three options -
...qvol> 1: No (continue)
...qvol> 2: Show me the minimum and maximum density values only
...qvol> 3: Show me the voxel histogram
...qvol> 1
...qvol> Now enter the cutoff density value: 20
...qvol> Cutting off density values < 20.000000, remaining occupied volume: 284 voxels (6.134400e+04 Angstrom^3)
...qvol> Enter desired number of codebook vectors: 7
...qvol>
...qvol> Using random start vectors.
...qvol> Computing 8 datasets, 100000 iterations each...
...qvol> Now producing dataset 1
...qvol> Now producing dataset 2
...qvol> Now producing dataset 3
...qvol> Now producing dataset 4
...qvol> Now producing dataset 5
...qvol> Now producing dataset 6
...qvol> Now producing dataset 7
...qvol> Now producing dataset 8
...qvol> Final clustering -- Average vector update: 6.754332e-01 Angstrom
...qvol> Final clustering -- Average vector update: 0.000000e+00 Angstrom
...qvol>
...qvol> Codebook vectors have been written to file 4_s1_n7.pdb
...qvol> The PDB B-factor field contains the equivalent spherical radii
...qvol> of the corresponding Voronoi cells (in Angstrom).
...qvol> Cluster analysis of the 8 independent runs:
...qvol> The PDB occupancy field in 4_s1_n7.pdb contains the rms variabilities of the vectors.
...qvol> Average rms fluctuation of the 7 codebook vectors: 0.986 Angstrom
...qvol> Radius of gyration of the 7 codebook vectors: 16.192 Angstrom
...qvol>
...qvol> Do you want to update or save the input connectivities?
...qvol> Choose one of the following options -
...qvol> 1: No. I'm done
...qvol> 2: Update and save to a PSF file
...qvol> 3: Update and save to a constraints file
...qvol> 4: Update and save to both PSF and constraints files
...qvol> 1
...qvol> Bye bye!
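An equivalent spherical radius is the radius of a sphere with the same volume as the Voronoi cell. A minimal sketch for a map works in three steps. First, assign each occupied voxel to its nearest codebook vector. Next, multiply the voxel counts by the voxel volume. Finally, convert each volume V with r = (3V/4π)^(1/3). Counting voxels, rather than weighting them by density, is our assumption about how qvol measures cell volume. The helper name is hypothetical.

```python
import math

def voronoi_equivalent_radii(points, codebook, voxel_volume):
    """Assign each occupied voxel to its nearest codebook vector and convert
    each cell volume V into the radius of a sphere of equal volume,
    r = (3V / (4*pi))**(1/3). Returns (volumes, radii)."""
    counts = [0] * len(codebook)
    for p in points:
        nearest = min(range(len(codebook)),
                      key=lambda i: sum((p[d] - codebook[i][d]) ** 2 for d in range(3)))
        counts[nearest] += 1
    volumes = [c * voxel_volume for c in counts]
    radii = [(3.0 * v / (4.0 * math.pi)) ** (1.0 / 3.0) for v in volumes]
    return volumes, radii
```

Here `points` would be the occupied voxel centres remaining after the density cutoff, and `voxel_volume` is 6³ = 216 Å³ for this map.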
Docking with matchpoint
We now dock the high-resolution ncd structure into the corresponding low-resolution map with the matchpoint utility (executable matchpt). Using the default options, matchpt takes four file name arguments at the shell prompt. In order, these are the qvol codebook vectors, the low-resolution map, the qpdb codebook vectors, and the high-resolution structure:

./matchpt 4_s1_n7.pdb 4_s1.situs 1_ncd_n7.pdb 0_ncd.pdb
The program returns the 3 best fits, ranked by the codebook vector rms deviation, and also reports the standard correlation coefficient. It further shows the permutation of the vectors that determines each superposition. The three solutions and the corresponding codebook vectors are written to disk. Here is the complete matchpoint session:
% ./matchpt 4_s1_n7.pdb 4_s1.situs 1_ncd_n7.pdb 0_ncd.pdb
matchpt>
matchpt> Matchpoint
matchpt> ==========
matchpt>
matchpt> Loading qvol codebook vectors file: 4_s1_n7.pdb
matchpt> Loading Situs file: 4_s1.situs
matchpt> Columns, rows, and sections: x=1-9, y=1-10, z=1-14
matchpt> 3D coordinates of first voxel (1,1,1): (396,252,132)
matchpt> Voxel size in Angstrom: 6
matchpt> Loading qpdb codebook vectors file: 1_ncd_n7.pdb
matchpt> Loading high-resolution structure PDB file: 0_ncd.pdb
matchpt>
matchpt> Point-cloud matching...
matchpt> Subcomponent has 7 points
matchpt> Gets docked into a structure with 7 points
matchpt>
matchpt> Anchors: (0, 5, 4)
matchpt> Number of potential anchor matches: 16
matchpt> Searching
matchpt> ...
matchpt> Anchors: (3, 2, 1)
matchpt> Number of potential anchor matches: 124
matchpt> Searching
matchpt> ..............
matchpt> Anchors: (6, 2, 1)
matchpt> Number of potential anchor matches: 114
matchpt> Searching
matchpt> .............
matchpt>
matchpt> Found 3 solutions.
matchpt>
matchpt> Solution filename, codebook vector RMSD, cross-correlation coefficient
matchpt> and permutation are printed out.
matchpt> The permutations indicate the order of low res. vectors fitted to high
matchpt> res. vectors, which is the opposite of how they are shown in qdock
matchpt> and qrange.
matchpt>
matchpt> [01] Solution_01.pdb - RMSD: 2.668 CC: 0.905 - ( 4, 2, 6, 7, 1, 5, 3)
matchpt> [02] Solution_02.pdb - RMSD: 4.364 CC: 0.876 - ( 4, 3, 2, 1, 5, 7, 6)
matchpt> [03] Solution_03.pdb - RMSD: 4.836 CC: 0.861 - ( 4, 6, 3, 5, 7, 1, 2)
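The "Anchors" lines in this log reflect an anchor-based search: three points of one cloud are compared with point triplets of the other cloud whose inter-point distances agree. The sketch below illustrates only that distance-compatibility filter, which roughly corresponds to what "Number of potential anchor matches" counts. The function name and tolerance are our own. The anchor choice, the tolerance matchpt uses, and its extension to a full permutation are not reproduced here (see the matchpoint user guide).

```python
import math
from itertools import permutations

def anchor_matches(source, anchors, target, tol):
    """List ordered target triplets whose pairwise distances agree with those of
    the source anchor triplet to within tol (Angstrom)."""
    a, b, c = (source[i] for i in anchors)
    ref = (math.dist(a, b), math.dist(b, c), math.dist(a, c))
    matches = []
    for i, j, k in permutations(range(len(target)), 3):
        cand = (math.dist(target[i], target[j]),
                math.dist(target[j], target[k]),
                math.dist(target[i], target[k]))
        # Keep the triplet only if all three distances agree within tol.
        if all(abs(r - d) <= tol for r, d in zip(ref, cand)):
            matches.append((i, j, k))
    return matches
```

Because rigid motions preserve distances, this filter discards most of the 7 × 6 × 5 = 210 ordered triplets before any superposition is attempted. A larger tolerance, as needed for noisy low-resolution vectors, lets more candidates through.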
The best solution has a relatively small codebook vector rms deviation (~2.7 Å), and visual inspection reveals that it is very similar (2 Å rmsd) to the orientation found earlier (file 5_s1_1.dock.pdb). To learn more about this program and its numerous options, run it without arguments in the command shell or read the relevant user guide.
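The ranking above relies on a codebook vector RMSD for a given pairing of vectors. A standard way to compute such an RMSD is to reorder one set according to the permutation, superimpose optimally with the Kabsch (SVD) method, and take the rms distance. We read a permutation as "entry i gives the low-resolution vector fitted to high-resolution vector i" (1-based), following the log's wording. Both that reading and the use of Kabsch superposition are assumptions rather than a description of matchpt's internals. `codebook_rmsd` is a hypothetical name.

```python
import numpy as np

def codebook_rmsd(high, low, perm):
    """RMSD after optimal rigid superposition of the high-res codebook vectors
    onto the low-res vectors paired with them by a 1-based permutation.

    perm[i] is read as the low-res vector fitted to high-res vector i + 1.
    """
    P = np.asarray(high, dtype=float)
    Q = np.asarray(low, dtype=float)[[k - 1 for k in perm]]  # reorder low-res
    Pc = P - P.mean(axis=0)                  # remove translations
    Qc = Q - Q.mean(axis=0)
    U, _, Vt = np.linalg.svd(Pc.T @ Qc)      # Kabsch: SVD of the covariance matrix
    d = np.sign(np.linalg.det(Vt.T @ U.T))   # guard against reflections
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T  # rotation taking Pc onto Qc
    diff = Pc @ R.T - Qc                     # rotated high-res minus low-res
    return float(np.sqrt((diff ** 2).sum(axis=1).mean()))
```

Comparing the docked structure with 5_s1_1.dock.pdb is a simpler case. The atoms are already in correspondence and in the same frame, so the rmsd is just the rms of the atom-by-atom distances without any superposition.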
Note: As described by Birmanns & Wriggers (2007), the fast point cloud matching can be improved by post-processing with the fast correlation-based tool colacor. This workflow is explained in the separate manual docking tutorial.
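Both the CC reported by matchpt and colacor's refinement compare the target map with a map simulated from the docked atomic structure on the same grid. The simulation step, which blurs the atoms to the map resolution, is not shown. The sketch below uses one common definition, the normalized cross-correlation without mean subtraction. Whether Situs's "standard" correlation coefficient subtracts the means is an assumption we leave open; a Pearson coefficient would subtract them first.

```python
import math

def normalized_cc(a, b):
    """Normalized cross-correlation of two equally sized density grids
    (flattened), without subtracting the means."""
    if len(a) != len(b):
        raise ValueError("grids must have the same number of voxels")
    num = sum(x * y for x, y in zip(a, b))
    den = math.sqrt(sum(x * x for x in a) * sum(y * y for y in b))
    return num / den
```

This score is insensitive to an overall scaling of either map, so a perfect fit gives 1.0 regardless of the absolute density units.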