DOT 2.0 User Guide - San Diego Supercomputer Center

Contents

1. In your project directory, the file dotruns is created; it is a cumulative log of all DOT runs started in projdir. This file logs the directory names and your comments at the beginning and end of each run, which makes it a handy record. C.4 DOT output and evaluation. The basic output of DOT, put into the runs DATE TIME prefix_from_parm_file output directory, is an e6d file. For our example, starting with stat pdb and mov pdb, this is statmov top2000 e6d. This file contains the information for generating the 2000 (default) top-ranked placements of the moving molecule. These placements are relative to the centered coordinates of the stationary molecule (in our case stat cen noh pdb), which is copied into the output directory from projdir coords stat. Rundot also performs a preliminary evaluation of the DOT run by running the script evaluate_dot_run, which processes the information in the e6d file. The evaluation includes creating PDB files of the 30 top-ranked placements of the moving molecule in the subdirectory top30pdb. See Chapter 6 for a detailed description of the evaluation performed by evaluate_dot_run and further evaluations that you can do. C.5 Testing DOT on multiple computers/processors. If this is the first time you have run DOT on multiple machines, we suggest that you first test parallel processing on a small DOT run. Prepscript makes a parameter file for doing just 60 rotations of the moving molecule. For your test run do r
2. GNU AWK (gawk) can be used. A few of DOT's newest analysis tools use the Ruby programming language. If you need to run dot_pdb_e6d_eval_rmsd or pdb_rmsd_matrix and do not have Ruby installed, you may download it for free from http ruby lang org en downloads; any version 1.8.2 or later is fine. Creating the required input for DOT needs three specialized external programs: MSMS, Reduce, and APBS. MSMS can be downloaded as described below. The DOT distribution includes pre-compiled binaries for Reduce and APBS. The DOT utilities invoke the programs by the names msms, reduce, and apbs. These programs are needed only on the platform on which your users will be preparing input for DOT. If your users will be preparing input only on a particular platform, such as Mac OS X on Intel, you need to install them for that platform only. A detailed explanation of each program and library follows. D.2 MSMS. The MSMS program does the molecular surface calculations needed by prepscript. MSMS is from the laboratory of Michel Sanner. Binary executables of MSMS for all of the platforms supported in the current DOT distribution are available from http mgltools scripps edu downloads. Be aware that you will need to scroll down this page to find the MSMS package. Download the appropriate tar gz file, run gunzip tar gz, then make a new directory (say i86Darwin8, if that is your platform), cd into that
3. on a host that supports that platform (e.g., Intel Macintosh for i86Darwin8): source DOT_ROOT bin share dot2 setup bash (or csh); cd DOT_ROOT mpich ARCHOSV. Note: we found that on Mac OS X (Darwin8) we had to set the environment variable RSHCOMMAND at this point. On csh, type: setenv RSHCOMMAND /usr/bin/ssh. On bash, type: export RSHCOMMAND=/usr/bin/ssh. configure for_dot2. This runs mpich 1.2.7p1 configure prefix DOT_ROOT mpich 1.2.7p1 ARCHOSV with device name ch_p4 disable f77 disable cxx disable f90modules disable short longs. make. Note: a make install is not necessary if you use the prefix above. See http www unix mcs anl gov mpi mpich1 docs mpichman chp4 node44 htm Node44 for more configuration advice. We found we can ignore messages like: cd && bin sh usr local dot2 0 src config aux_dir missing run autoconf; autom4te: cannot lock autom4te cache requests with mode 2 (perhaps you are running make j on a lame NFS client?): Operation not supported; make: configure Error 1. These seem to be caused by clock skew between the different hosts, but we are not sure. If you get these, it should be OK to use make i; make i install so that make will ignore errors. F What else you need to be able to recompile DOT programs. F.1 GNU Autoconf, Automake. We strongly recommend you download the most current versions of e Th
4. HIP: protonated on both ND1 and NE2. For Cys, variants are recognized by the protonation of SG. CYS: free cysteine, protonated on SG. CYX: cysteine that is part of a disulfide, no hydrogen atom added to SG. Charged N and C termini of peptide chains are identified by: N terminus (example ALAN), presence of atoms 1H, 2H, and 3H; C terminus (example ALAC), presence of atom OXT. For example, if an amino acid residue, say Ala, includes the atom types 1H, 2H, and 3H, prepscript will assign it as being a positively charged N-terminal residue with the residue name ALAN. 5b Charges should add up to correct formal charge. 1 6 Moving molecule shape (xyz). The script dot2 prep mol common, called by prepscript, creates the xyz file, which contains the centered coordinates of the heavy atoms of the molecule. This DOT input file defines the molecular shape of the moving molecule. Each line of the xyz file has the fields X, Y, Z in Angstroms. Example xyz file fragment: 24.004 10.540 13.354; 14.847 1.450 26.429; 23.264 49.230 39.562. For example, for our starting moving molecule coordinates mov pdb, the centered xyz file is mov cen noh xyz, which has the same number of atoms as mov cen noh pdb. This file is created in dot2 prep mol common by: pdb_to_xyz mov cen noh pdb > mov cen noh xyz. DOT wi
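The xyz layout described in item 4 (one heavy atom per line, fields X Y Z in Angstroms) can be illustrated with a short Python sketch. This is not the DOT pdb_to_xyz utility; it is a minimal stand-in that assumes the input has already had hydrogens and waters removed, and it relies only on the standard PDB fixed-column positions for coordinates (columns 31-54). The helper `make_atom_record` exists only to build demo input and is hypothetical.

```python
def make_atom_record(serial, name, resname, chain, resseq, x, y, z):
    """Build a fixed-column PDB ATOM record for the demo (hypothetical helper)."""
    return (f"ATOM  {serial:5d} {name:<4s} {resname:>3s} {chain}{resseq:4d}    "
            f"{x:8.3f}{y:8.3f}{z:8.3f}")


def pdb_to_xyz_lines(pdb_text):
    """Return one 'X Y Z' line per ATOM/HETATM record in the PDB text."""
    lines = []
    for record in pdb_text.splitlines():
        if record.startswith(("ATOM", "HETATM")):
            # PDB columns 31-38, 39-46 and 47-54 hold x, y and z.
            x = float(record[30:38])
            y = float(record[38:46])
            z = float(record[46:54])
            lines.append(f"{x:.3f} {y:.3f} {z:.3f}")
    return lines


if __name__ == "__main__":
    demo = "\n".join([
        make_atom_record(1, "N", "ALA", "A", 1, 1.0, -2.5, 3.25),
        "REMARK this line is skipped",
        make_atom_record(2, "CA", "ALA", "A", 1, 0.0, 0.0, 0.0),
    ])
    for line in pdb_to_xyz_lines(demo):
        print(line)
```

Because only ATOM and HETATM records contribute, the number of output lines equals the number of atoms in the input, which matches the statement that the xyz file has the same number of atoms as the centered, hydrogen-free PDB file.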
5. To check that DOT can now be located, type: rundot help. You should see a single line giving a Usage message. If not, check that your PATH matches the location where the DOT distribution is installed by typing: ls DOT_ROOT, which gives a listing that should include the directories bin, data, src, and test, among others. B Copy the tutorial files to your directory. Make a new directory, for example testdot, and copy the DOT tutorial files to it: mkdir testdot; cd testdot; dot_tutorial. This puts the tutorial files in two directories named test rundot and test prepscript. C Run DOT with prepared input files. You can now do a short DOT run to check that all is well. Type the two lines: cd test rundot; rundot udgugi deg72 nb0 parm. This will start a DOT run on your computer, docking udg (uracil DNA glycosylase) with ugi (an inhibitor protein), using a coarse (thus fast) 72 degree rotation increment. You will see a page or two of log file output, followed by an opportunity for you to record an entry in a lab notebook file named dotruns: Result for dotruns log file? Type control D to end >. You can then type remarks such as first example run, took 2 minutes, and then type ENTER or RETURN, then control D on a line by itself to complete your entry. The DOT output for all runs started in a given directory goes into a subdirectory named runs. Rundot crea
6. of one or both termini that are not in the PDB coordinates. These missing residues should be indicated by a REMARK 465 MISSING RESIDUES in the PDB file. If the termini are not the true termini, they should be uncharged. The easiest way to do this is to leave the termini as standard amino acids. Creating a neutral N terminus requires manipulation of the PDB files created by Reduce (see Section 4a). In some protein coordinates, the last residue of a peptide chain includes an OXT atom even though it is not the true terminus. If you know the C terminus should be uncharged, remove the OXT atom. Reduce also checks for the proper orientation of Gln and Asn side chains, determines the protonation state of His residues from the local environment, and looks for and corrects steric clashes within the macromolecule. 4a Protonation of N-terminal amino acids in proteins. In proteins, we call N-terminal residues all those that do not have a covalent bond between the main chain N atom and a preceding C=O group (defined in Reduce by a C N distance between and A). N-terminal residues include 1) the true N terminus of a peptide chain, 2) the starting residue of a peptide chain where preceding residues are missing, and 3) a residue in the middle of a peptide chain when coordinates for the preceding, covalently bonded residue are not in the PDB file, as in a disordered loop. For true N termini we want a
7. pdb_to_xyz > $movxyz. Run filter: input from statmov top2000 e6d, output to statmov filter e6d. Filter will pass any e6d placements that have one or more of the specified moving molecule atoms within 6 Angstroms of any of the specified stationary molecule atoms: count 1; distmax 6; dotxyzfilter $distmax count $statxyz $movxyz < statmov top2000 e6d > statmov filter e6d; generate PDB files of top 30 that pass filter: ebd_firsi 30 statmov filter e6d pdbgen movmol filter. Here is a fancier example script that finds any placements where 5 or more C-alpha atoms in stationary residues 25 to 33 are within 6 Angstroms of any atom in moving residues 228, 240, or 293. Without the r flag it would report the reverse: any placement where 5 or more atoms in moving residues 228 etc. were within 6 Angstroms of any of the C-alpha atoms in stationary residues 25 to 33. bin bash; statmol stat cen noh pdb; movmol mov cen noh pdb; create xyz files: statxyz statmol filter xyz (working name); pdb_resrange 25 33 $statmol pdb_ca pdb_to_xyz > statxyz; movxyz movmol filter xyz (working name); pdb_resnum 228 240 293 $movmol pdb_to_xyz > $movxyz; run filter: input from mov top2000 e6d, output to mov filter e6d; filter will pass any e6d placements with any five or more stationary atoms within 6 Angstroms of any moving atom c
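The pass/fail rule that item 7 describes for the distance filter can be sketched in Python. This is only an illustration of the counting logic, not the dotxyzfilter program: it assumes the moving-molecule coordinates have already been transformed into a given placement (the e6d format and the rotation step are not modeled), it treats "within" as less than or equal to distmax, and the function and parameter names are hypothetical. The `reverse` argument mimics the role the text gives to the r flag, switching which molecule's atoms are counted.

```python
import math


def atoms_within(atoms, others, distmax):
    """Count atoms that lie within distmax of at least one atom in others."""
    return sum(
        1 for a in atoms
        if any(math.dist(a, b) <= distmax for b in others)
    )


def passes_filter(stat_xyz, mov_xyz, distmax=6.0, count=1, reverse=False):
    """Return True if at least `count` atoms satisfy the distance criterion.

    By default the moving atoms are counted; with reverse=True the
    stationary atoms are counted instead.
    """
    if reverse:
        n = atoms_within(stat_xyz, mov_xyz, distmax)
    else:
        n = atoms_within(mov_xyz, stat_xyz, distmax)
    return n >= count


if __name__ == "__main__":
    stat = [(0.0, 0.0, 0.0)]
    mov = [(0.0, 0.0, 5.0), (0.0, 3.0, 0.0), (0.0, 0.0, 20.0)]
    print(passes_filter(stat, mov, distmax=6.0, count=2))
    print(passes_filter(stat, mov, distmax=6.0, count=2, reverse=True))
```

In the demo, two moving atoms are near the single stationary atom, so the default direction passes with count 2, while the reversed direction can count at most one stationary atom and fails.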
8. In DOT, the shape is represented by the atomic coordinates of all non-hydrogen atoms and the charge distribution by point charges at the atomic coordinates, including polar hydrogen atoms. Both are rapidly mapped onto the grid for each orientation. If the user wants one molecule to be defined in more detail than the other, that molecule should be assigned as S. We have found the reverse case when a fragment of double-stranded DNA is used to represent part of a long DNA strand. The DNA fragment works best as M, where its charge distribution is represented by atomic point charges. When the electrostatic potential for a DNA fragment is calculated by Poisson-Boltzmann methods, the greater solvent accessibility of the ends creates a nonuniform electrostatic potential along the DNA strand. Near the ends of the fragment the electrostatic potential is neutral; only in the center of the fragment is the potential due to the phosphate atoms constant. Thus, the electrostatic potential of the DNA fragment is highly dependent on its length. When the DNA fragment is represented as point charges, the charge distribution is consistent over the full fragment. D.3 Other factors. One protein environment may be more suitably represented as either S or M. For example, if one molecule is embedded in a membrane environment, this could be included as part of the description of the stationary molecule.
9. NH3 group (positively charged); for other N termini we want a NH group (neutral), forming a complete residue but with an incomplete covalent shell for the N atom. Case 1: Residue number 1 is the true N terminus of the peptide chain. This case requires no user intervention. REDUCE will by default add 3 hydrogen atoms to residue 1 of a peptide chain, creating a positively charged NH3 group. Three hydrogen atoms will ONLY be added if REDUCE determines that the N atom is not attached to a preceding C=O group (distance criterion). For example, if the protein chain is numbered 1A, 1B, 1, 2, REDUCE puts the NH3 group on the first residue, 1A. In the next step of dot2 prep mol common, the presence of the N-terminal hydrogen atoms will be used to determine that this is a positively charged N-terminal residue, and the corresponding partial atomic charges will be assigned. Case 2: The peptide chain begins with a residue having a residue number other than 1, and the user wants this residue to have a positively charged NH3 group at the N terminus. This case requires editing the user's customized prepscript so that Reduce will add hydrogen atoms as desired. Edit the call to dot2 prep mol common for the appropriate molecule by adding the Nterm flag, as indicated in the comment line above the call. The Nterm flag is fed into the Reduce command line. Example: My stationary molecule stat pdb begins with residue number 82, and this residue should ha
10. vol_to_xyzcrv (script). xyzq_check_ok (script): verifies that total charge is integral and within reasonable limits; used by prepscript. xyzq_xyzr_to_xyzqrxml (script): merges separately computed charge and radius files into input suitable for the apbs program. Not currently used by DOT. xyzqr_to_xyzqrxml (script): converts an x, y, z, charge, radius file into input suitable for the apbs program. Chapter 8 DOT Installation. A Introduction. Welcome to DOT. The following instructions provide a basic overview of a simple DOT installation on your system. The DOT distribution consists of a single tarred and zipped file containing pre-compiled binaries and two external libraries for your particular platform, as well as shell scripts, two of three needed external programs, user documentation, and the DOT2 license. We currently offer binaries for the Red Hat Intel Linux platform, as well as command-line binaries for Mac OS X (both Intel and PowerPC processors) and for SPARC Solaris 8 (Sun OS 5). For all other systems you will need to compile and install from source. Please feel free to contact the DOT help line (dot help sdsc edu) regarding any issues or requests. B DOT Installation Quick Start Guide. In the instructions that follow, we are assuming you will be using the bash shell. You may instal
11. RNA with 5' termini beginning with either the sugar or including a preceding phosphate group and 3' termini, GTP, ATP, and heme Fe(II), that are consistent with the polar atom atomic charges of AMBER. Currently only a single library file can be used, so these libraries need to be appended to the standard protein library. For example, for a DNA protein system in our directory projdir, I make a subdirectory data, copy the standard protein library and the DNA library into it, and append the DNA library to make a new library file uhbd amber84 prot dna rlb: cd projdir; mkdir data; cd data; cp DOT_ROOT data uhbd amber84 prot rlb; cp DOT_ROOT data cofactors dna amber rlb; cat uhbd amber84 prot rlb dna amber rlb > uhbd amber84 prot dna rlb. To use your new library in prepscript, you must add the 1 flag and the FULL pathname of your new library in your gen_dot_prepscript command (Section G). G Create Prepscript. Prepscript is an executable script that produces all the needed DOT input files, including molecular descriptions for the stationary and moving molecules and parameter files for running DOT (see Section for a list of files). You will use DOT_ROOT bin gen_dot_prepscript to generate prepscript customized for your system. For gen_dot_prepscript and prepscript to work, you need to: Have your path set up to access DOT scripts an
12. The command to run DOT is: rundot parameterfile h hostfile optional comments. The parameter file is required. If no hostfile is specified, DOT will run on one processor on the computer that you are logged into. If your optional comments contain special characters, you should surround them with single quotes. Examples are given below. For each new molecular system, we STRONGLY suggest first running a test case that uses one processor and does only a single rotation for the moving molecule. This will check that DOT found the input files, that DOT_ROOT is set properly, and that DOT is running okay. For our project starting with stat pdb and mov pdb, the command to run a single rotation of the moving molecule is: rundot statmov zerorot nb0 parm. For your system, the parm files will have your molecule names, as made by prepscript. This runs DOT on one processor, the one you are logged into. At the end of the run, you are prompted to enter any additional comments, ending with control D. C.2 Running DOT on multiple computers/processors. For your project, you will want to do a full rotational search. You could do this on a single processor, but the run can be done much more quickly on multiple computers or processors. DOT is designed to run very efficiently on multiple computers in parallel using MPI (Message Passing Interface). All of the computers that you use must have access to the approp
13. coords subdirectory of your project directory. You must decide which molecule will be stationary and which molecule will be moved about the stationary molecule. Generally, the molecule with the largest dimension should be the stationary molecule and the smaller molecule should be the moving molecule. This is computationally most efficient (see Section K.1), resulting in both a smaller grid size and fewer orientations of the moving molecule to get reasonable rotational sampling. Throughout this manual, stat pdb will be the stationary molecule and mov pdb will be the moving molecule. Typically we use names from the PDB, like 1cco pdb. If you are only using one molecule from a PDB file that contains multiple copies of the molecule, edit the file in coords so that it contains only the single molecule. So now we have: projdir coords stat pdb; projdir coords mov pdb. B.3 Create prepscript for your molecular system. In the coords subdirectory, create a copy of prepscript customized for your molecular system by: gen_dot_prepscript m movingpdb s stationarypdb d griddim 1 residue library o prepscript, where movingpdb and stationarypdb are the PDB files for your two molecules (required; in our example, mov pdb and stat pdb). By default, gen_dot_prepscript will generate a prepscript that figures out a reasonable size for the grid, uses the default residue library, and is named prepscript. The optional flags can be used
14. degrees is sufficient, copy statmov deg06 nb0 parm to statmov deg10 nb0 parm and change deg06 to deg10 in the line: rot_file DOT_ROOT data deg06 eul. To allow ten bumps, change the 0 to 10 in the line: mov_atoms_in_stat_interior_limit 0. Section gives details on the many other parameters for DOT, but these three are by far the ones most commonly changed. C Molecular description files needed by DOT. The parameter file also names the four necessary input files for DOT: mov cen polh xyzq (electrostatic charges of atoms of moving molecule); stat cen polh apbsgrd (electrostatic potential of stationary molecule); mov cen noh xyz (van der Waals spheres of moving molecule); stat cen noh xyzcrv (shape potential of stationary molecule). D Running DOT. D.1 Single processor mode. You need only a DOT parameter file and the four input files discussed above: rundot parmfile optional comments for log. These optional comments will be appended to projdir dotruns txt. D.2 Multiple processors, such as a workstation farm. You need a DOT parameter file, the four input files discussed above, and a hosts file that lists the names of the computers to run DOT on, one per line: rundot parmfile hosts dot hosts optional comments for log. If you have a dual-processor (or quad or more) computer, like a recent Macintosh Intel Core Duo or a new Sun, you can use all the processing cores by making a hosts file that has your computer name (type ho
15. dot2 prep mol common. The script dot2 prep mol common is run in prepscript for both the moving and the stationary molecules. It does the following steps: removes hydrogen atoms and water molecules from the starting PDB files; centers the molecules; calls the program Reduce to build hydrogen atoms; creates the centered PDB files with heavy atoms and polar hydrogen atoms, with each protonation (charge) state having a unique residue name; creates the moving molecule shape file (xyz); creates the atomic charge file (xyzq) and checks total molecular charge. These steps create all the required DOT input files for the moving molecule and create the files needed to calculate the shape and electrostatic potentials of the stationary molecule. I.1 Check on range for total molecular charge. Before running dot2 prep mol common, prepscript sets the parameters qtotmin and qtotmax to define an acceptable range for the total charge on each molecule. The default in prepscript is qtotmin 20, qtotmax 20. If the total charge of either molecule is not in this range, prepscript will quit with an error message. If you know the total charge of your molecule is an integer outside this range, adjust the limits accordingly. For example, a fragment of double-stranded DNA with 12 base pairs has a total molecular charge of 22, so changing qtot
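The total-charge check described in item 15 (the charge must be an integer and must fall between qtotmin and qtotmax) can be sketched in Python. This is an illustrative stand-in, not the xyzq_check_ok script: the tolerance `tol` for deciding that a summed charge is "integral" is an assumption, and the limits used in the demo are hypothetical values chosen for illustration, since the signs of the default limits are not recoverable from the text above.

```python
import math


def total_charge(charges):
    """Sum per-atom partial charges with reduced floating-point error."""
    return math.fsum(charges)


def charge_ok(qtot, qtotmin, qtotmax, tol=0.01):
    """Return True if qtot is integral (within tol) and inside the limits.

    tol is an assumed tolerance; the real check may use a different one.
    """
    nearest = round(qtot)
    is_integral = abs(qtot - nearest) <= tol
    in_range = qtotmin <= nearest <= qtotmax
    return is_integral and in_range


if __name__ == "__main__":
    partial = [0.25, -0.5, 0.25, -1.0]   # made-up partial charges
    q = total_charge(partial)
    # Hypothetical limits for the demo only.
    print(q, charge_ok(q, qtotmin=-20, qtotmax=20))
```

A non-integral sum usually points to an incomplete residue or a residue name that does not match the library, so failing fast here is cheaper than discovering the problem after the electrostatics step.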
16. for new functional groups. 6 Create prepscript (gen_dot_prepscript). 7 Prepscript general features: a) tools that prepscript uses; b) assigns systematic naming to needed files: cen (centered), noh (no hydrogen atoms), polh (with polar hydrogen atoms), allh (with all hydrogen atoms); c) assigns file suffixes to indicate file type; d) pause command for user intervention. 8 Prepscript steps common to both molecules (dot2 prep mol common): a) define an acceptable range for the total charge on each molecule; b) remove waters, H atoms, alternate locations from PDB files; c) center both molecules with no H atoms (cen noh; pdb_make_centered); d) add hydrogen atoms to PDB coords using Reduce (cen allh); e) create files with heavy atoms and polar H atoms with RES names to match library (cen polh; pdb_rename_res_by_hydrogens, striph); f) moving molecule: create shape file (xyz) based on heavy atoms; g) moving molecule: create partial atomic charge file (xyzq). 9 Prepscript: calculate diameters of molecules to determine grid size. 10 Prepscript, stationary molecule: a) create electrostatic potential (dx); b) create APBS command and parameter files; c) run APBS; d) create shape potential (xyzcrv); e) run MSMS (heavy atoms only) to define excluded and favorable volumes; f) determine electrostatic clamping values. 11 Prepscript: Create parameter fil
17. input files takes as input two starting PDB files, each with a single copy of a macromolecule. If multiple copies of the macromolecule are present in the asymmetric unit of a crystallographic file, the user must decide which set of coordinates is to be used for docking and create an input PDB file that contains just that molecule. The molecule can include multiple chains. Prepscript will remove water molecules, hydrogen atoms, and multiple positions of a residue in the PDB files, unless the user explicitly does not want this to happen and edits prepscript appropriately. Each residue in the PDB file must include all of the heavy (nonhydrogen) atoms that correspond to that residue in the residue library. It is okay if the macromolecule is missing an entire residue or a region consisting of many residues, such as missing residues at the N or C termini or disordered loop regions that do not appear in the PDB file. C.1 Missing atoms in side chains. If a PDB file lacks the full set of heavy atoms that define a residue, the PDB curators usually indicate this in REMARK 470 MISSING ATOMS. The heavy atoms must match those associated with the residue name; hence the user has two choices: 1) build the full side chain or 2) rename the residue to match the existing atoms. For example, given a lysine residue with atoms only to Cγ, the user could either build the full Lys side chain by adding atoms Cδ, Cε, and Nζ, or could remove the Cγ atom, leaving only the Cβ atom o
18. single supercomputer. The procedure varies among the supercomputer centers. CCMS has experience with BlueGene and we are eager to help you, but cannot offer written instructions yet. E DOT actions on input files. E.1 Stationary molecule shape potential (xyzcrv): 1. Map onto grid. 2. Forbidden default value: 1000. 3. Why another Forbidden value might be needed. 4. Forbidden needs to be larger than max number of M atoms in favorable layer. 5. Attractive default value: 1. E.2 Moving molecule shape (xyz): Coordinates of moving mol heavy atoms are mapped onto nearest grid point by DOT. Why not interpolated? Problems with excluded stat mol volume. Stationary molecule electrostatic potential (apbsgrd): Electrostatic clamping, what and why. Interior zeroing, what and why. Moving molecule partial atomic charges (xyzq): Atomic charges interpolated onto nearest 8 grid points by DOT. Distributes M atomic charge better over grid; slightly better answers. F What DOT computes. F.1 van der Waals energy: Count of number of M heavy atoms in favorable layer surrounding S. Count times 0.1 kcal/mol to give interaction energy. S favorable layer 3 A thick. F.2 Electrostatic energy: Stationary molecule elec pot grid made using Poisson-Boltzmann methods. Moving molecule partial atomic charges. Intermolecular elec energy calculated as mov mol partial charges i
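Item 18 notes that each moving-molecule atomic charge is spread over the nearest 8 grid points rather than snapped to a single point. One common way to do this is trilinear weighting, sketched below in Python. Whether DOT uses exactly this weighting is an assumption; the function name, the grid origin at (0, 0, 0), and the `spacing` parameter are all hypothetical choices for illustration.

```python
import math


def distribute_charge(x, y, z, q, spacing=1.0):
    """Spread charge q at (x, y, z) over the 8 surrounding grid points.

    Uses trilinear weights (an assumed scheme). Returns a dict mapping
    integer grid indices (i, j, k) to the share of the charge.
    """
    gx, gy, gz = x / spacing, y / spacing, z / spacing
    i0, j0, k0 = math.floor(gx), math.floor(gy), math.floor(gz)
    fx, fy, fz = gx - i0, gy - j0, gz - k0   # fractional offsets in [0, 1)
    shares = {}
    for di, wx in ((0, 1.0 - fx), (1, fx)):
        for dj, wy in ((0, 1.0 - fy), (1, fy)):
            for dk, wz in ((0, 1.0 - fz), (1, fz)):
                shares[(i0 + di, j0 + dj, k0 + dk)] = q * wx * wy * wz
    return shares


if __name__ == "__main__":
    shares = distribute_charge(0.5, 0.5, 0.5, 1.0)
    for index, share in sorted(shares.items()):
        print(index, share)
```

The weights always sum to 1, so the total charge on the grid equals the atomic charge, and an atom sitting exactly on a grid point deposits all of its charge there.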
19. to override these defaults. Example 1: I want to make prepscript for molecules stat pdb and mov pdb, let prepscript determine the grid size, and use the default residue library: gen_dot_prepscript m mov pdb s stat pdb. Example 2: I want to make prepscript for molecules stat pdb and mov pdb, use a 128 grid with 1 A spacing, and use my library home me misc myreslib rlb: gen_dot_prepscript m mov pdb s stat pdb d 128 1 home me misc myreslib rlb. Both examples create a new file, prepscript, in the coords directory (in our case projdir coords prepscript) that is customized for the input PDB files mov pdb and stat pdb. B.4 Run prepscript. To run prepscript, keeping a log file, do: For bash: prepscript 2>&1 | tee prepscript log. For csh: prepscript |& tee prepscript log. For an outline of the steps performed by prepscript and a detailed description of each step, see Chapter 4. B.5 Files created by prepscript. Prepscript creates two subdirectories in coords, with names based on your molecules. Each directory contains the needed DOT input files for each molecule. In addition, parameter files for running DOT will be put in the main project directory. In our example with stat pdb and mov pdb, where prepscript determined we should use a grid 128 on each side, 1 A grid spacing, and the ionic strength for the electrostatic potential calculation is 150m
20. udgugi deg06 nb10 parm. To remove all files normally created by prepscript, type: rm parm cen center minmax log com. DOT input files for the stationary molecule will be in the directory coords udg; files for the moving molecule will be in coords ugi. DOT parameter files (parm) will be created in the parent test prepscript directory. If you like, you can go up to that directory now and run DOT using the DOT input files you just prepared: cd ..; rundot udgugi deg72 nb0 parm. Chapter 3 DOT Quick Start Guide. DOT, its utilities, and the programs MSMS, APBS, and Reduce must already be installed (see Chapter 8 for instructions). A Set up user environment. Goal: The goal of this section is to make the DOT program, DOT utilities, and the auxiliary programs MSMS, APBS, and Reduce available to the user. A.1 DOT_ROOT. To find DOT and its utilities, the DOT_ROOT environment variable must be set to where DOT is installed. For example, if DOT is installed in /usr/local/dot2, type: For csh/tcsh: setenv DOT_ROOT /usr/local/dot2. For sh/bash: export DOT_ROOT=/usr/local/dot2. Your path must be set to access DOT utilities, scripts, and data. For csh/tcsh: source DOT_ROOT bin share dot2 setup csh. For sh/bash: source DOT_ROOT bin share dot2 setup sh. A.2 MSMS, APBS, Red
21. L DOT Parameter File (parm). Once all the molecule input files have been generated successfully, as indicated by the presence of the files above and any messages displayed, prepscript then creates sample DOT parm files that specify the DOT calculation to be performed. The DOT distribution includes a template for the DOT 2.0 parameter file in DOT_ROOT data dot_parm_template. The default values are used unless they are set explicitly in prepscript. The grid dimensions, grid spacings, the location of the four structure files, and clamping range are set by prepscript. When proposed docked structures penetrate this surface, the molecules may be thought to bump into each other. In fact, to allow for flexibility not captured by our rigid model, we may allow a specified number of bumps in our results. This number will be specified in the dot parm file. M Prepscript checks throughout processing. Although at this point the user has most assuredly provided appropriate input files, a little error checking couldn't hurt. Prepscript will check for the existence of water and a particular but common non-polar hydrogen. Since the structure should contain the polar hydrogens, we also check that the file contains at least some hydrogens. Thes
22. DOT 2.0 User Guide. Victoria A. Roberts, Michael E. Pique, Susan Lindsey, Martin S. Perez, Elaine E. Thompson, Lynn F. Ten Eyck. San Diego Supercomputer Center, University of California, San Diego, 9500 Gilman Drive, La Jolla, CA 92093. http www sdsc edu CCMS. Email: dot help sdsc edu. October 1, 2008. Contents. 1 Introduction: A What DOT does; B Key DOT Chapters; C Acknowledgements. 2 Try DOT, a Tutorial Example: A Setup user environment; B Copy the tutorial files to your directory; C Run DOT with prepared input files; D Prepare DOT input files using supplied PDB files; D.1 Make sure necessary programs are installed; D.2 Run prepscript. 3 DOT Quick Start Guide: A Setup user environment; A.1 DOT_ROOT; A.2 MSMS, APBS, Reduce; B Setup your molecular system; B.1 Create Working Directories; B.2 Select the stationary and moving molecules; B.3 Create prepscript for your molecular system; B.4 Run prepscript; B.5 Files created by prepscript
23. For our starting stationary molecule stat pdb, the coordinates used for the electrostatic calculation are the centered coordinates with polar hydrogen atoms, stat cen polh pdb. For a grid 128 A on a side (1 A spacing), with the calculation done at 150 mM ionic strength, the APBS input files created by dot2 prep potgrid apbs are the parameter file stat cen 128 150m apbs in and the atomic position, charge, radius file stat cen polh xyzqr. Output files are the electrostatic grid stat cen 128 150m dx and the log file stat cen 128 150m apbs log. Prepscript checks that the total charge calculated by APBS is an integer by looking at the APBS log file. If the total charge in the APBS log file is not an integer, prepscript will stop. Even if the total charge is an integer, it is a good idea to check that the charge is correct: look for Net charge in the APBS log file. This charge should be the same as prepscript calculated in the last line of the xyzq file. Note: As a stand-alone program, APBS calculates the electrostatic potential for any set of coordinates, taking X, Y, Z, charge, and radius as input. APBS does no internal checking of whether residues are complete. The wise user will check the expected charge on a molecule given the number of charged side chains (Lys, Arg, Asp, Glu), the charge state of the termini, the protonation state of His as determined by Reduce, and the charge due
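The hand check recommended at the end of item 23 (comparing the reported net charge with what the residue composition implies) can be sketched in Python. This is a bookkeeping aid under stated assumptions, not part of DOT: it assumes residues are supplied by their base three-letter names, counts +1 for Lys, Arg, and doubly protonated His (HIP) and -1 for Asp and Glu, and leaves anything else (cofactors, ions, nucleic acids) to the caller through the hypothetical `other` argument.

```python
from collections import Counter

# Formal side-chain charges at neutral pH; HIP is doubly protonated His.
SIDE_CHAIN_CHARGE = {"LYS": 1, "ARG": 1, "HIP": 1, "ASP": -1, "GLU": -1}


def expected_charge(residue_names, charged_n_termini=0,
                    charged_c_termini=0, other=0):
    """Estimate the formal charge of a protein from its composition.

    charged_n_termini and charged_c_termini are the numbers of charged
    chain ends (+1 and -1 each); other is any extra charge the caller
    knows about, such as cofactors or bound ions.
    """
    counts = Counter(residue_names)
    side_chains = sum(SIDE_CHAIN_CHARGE.get(name, 0) * n
                      for name, n in counts.items())
    return side_chains + charged_n_termini - charged_c_termini + other


if __name__ == "__main__":
    residues = ["ALA", "LYS", "ASP", "GLU", "HIP", "ARG", "GLY"]
    print(expected_charge(residues, charged_n_termini=1, charged_c_termini=1))
```

If this estimate disagrees with the net charge in the APBS log, the usual suspects are a terminus that was charged or left neutral unintentionally, or a His protonation state different from the one assumed.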
24. M default the following DOT input files are created. 1. Stationary molecule files in projdir coords stat: (a) stat 128 150m dx, the electrostatic potential of the stationary molecule, where the general name of the file is molname grid_dim ionic_strength dx; (b) stat cen noh xyzcrv, the shape potential of the stationary molecule, based on heavy atoms only; (c) stat cen noh pdb, the PDB file of the stationary molecule (heavy atoms only) for evaluation. 2. Moving molecule files in projdir coords mov: (a) mov cen polh xyzq, partial atomic charges for the moving molecule, including polar hydrogen atoms; (b) mov cen noh xyz, shape of the moving molecule, represented by the atomic centers of heavy atoms only; (c) mov cen noh pdb, the PDB file of the moving molecule (heavy atoms only) for evaluation. 3. Parameter files in projdir: (a) statmov zero rotation nb0 parm, single orientation for mov.pdb, no penetrations of mov into stat allowed; (b) statmov deg72 nb0 parm, 72° orientational search (60 orientations) for mov.pdb, no penetrations of mov into stat allowed; (c) statmov deg06 nb0 parm, 6° orientational search (54,000 orientations) for mov.pdb, no penetrations of mov into stat allowed; (d) statmov deg06 nb10 parm, 6° orientational search (54,000 orientations) for mov.pdb, up to 10 penetrations of mov into stat allowed. These are the files that DOT uses. Additional files are made in the molecule subdirectories, generated as prepsc
25. PBS (Adaptive Poisson-Boltzmann Solver) performs electrostatic potential calculations needed by prepscript. APBS is from the laboratory of Nathan Baker (baker biochem wustl edu) of the Department of Biochemistry and Molecular Biophysics, Center for Computational Biology, Washington University in St. Louis. APBS is free, open-source software available from http://sourceforge.net/projects/apbs. After you have downloaded the executable binary for your platforms of interest, you may install apbs anywhere you like, as long as it is in your path when you run prepscript. We found that the Mac OS X Darwin8 version (a universal binary for both PPC and i86 Macintoshes) needs administrator privileges to install; this appears to be a mistake in the installation program. The DOT project is currently using APBS version 1.0.0 (21 April 2008). E External libraries needed to run DOT. The DOT distribution includes two external libraries, FFTW and MPICH. MPICH is needed for all platforms on which your users will be running DOT in parallel, and you will need to configure and make it for them. E.1 MPICH. The MPICH software allows DOT calculations to be done in parallel on a local area network of computers. MPICH is from the Argonne National Laboratory of the United States Department of Energy and is free, open-source software. You only need MPICH if you wish to run a DOT job on more than one computer. In pri
26. T electrostatics then generate PDB files of the top 20: e6d_first 20000 statmov top20000 e6d ace coords stat stat cen pdb coords mov mov cen pdb e6d_sortby E_elec ACE6 e6d_first 20 pdbgen mov cen pdb mov. L Distance filtering. A distance filter is useful in applying known biochemical data, such as that a specific residue or set of residues lies in the interface. The basic tool is the DOT utility dotxyzfilter, whose arguments are a distance, a count, a stationary xyz file, a moving molecule xyz file, and an input e6d file. It passes through (i.e., filters) any e6d placements for which at least count stationary atoms are within the specified distance of one or more moving molecule atoms. Here is an example script that finds any placements where moving residue 228's CZ atom is within 6 Angstroms of stationary residue 96's CD atom. Note: all these examples assume they are run in the directory that holds your results, such as runs 20080704 statmov nb0. bin bash statmol stat cen noh pdb statres 96 statatom CD movmol mov cen noh pdb movres 228 movatom CZ create xyz files working file name eg stat 228 CZ xyz statxyz $statmol $statres $statatom xyz pdb_resnum $statres $statmol pdb_atomtype $statatom pdb_to_xyz > $statxyz movxyz $movmol $movres $movatom xyz working name pdb_resnum $movres $movmol pdb_atomtype $movatom
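The acceptance rule that the distance-filtering text attributes to dotxyzfilter can be illustrated with a short Python sketch. It is not the DOT implementation: the function name is hypothetical, "within" is taken to mean less than or equal to the cutoff, and the moving-molecule coordinates are assumed to be already transformed into the placement being tested.

```python
import math


def passes_distance_filter(stat_atoms, mov_atoms, distance, count):
    """Return True if at least `count` stationary atoms lie within
    `distance` of one or more moving-molecule atoms.

    stat_atoms, mov_atoms -- lists of (x, y, z) tuples; mov_atoms are
    assumed to be in their placed (rotated and translated) positions.
    """
    near = 0
    for s in stat_atoms:
        # A stationary atom counts once, however many moving atoms are close.
        if any(math.dist(s, m) <= distance for m in mov_atoms):
            near += 1
    return near >= count


if __name__ == "__main__":
    stationary = [(0.0, 0.0, 0.0), (10.0, 0.0, 0.0)]
    moving = [(3.0, 4.0, 0.0)]
    print(passes_distance_filter(stationary, moving, distance=6.0, count=1))
```

With a single stationary atom and a single moving atom, as in the residue 96 CD and residue 228 CZ example, a count of 1 reduces the test to a plain pairwise distance cutoff.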
27. _charge_file (filename): xyzq file; mov_vdw_file (filename): xyz file, default is mov_charge_file data; mov_pdb_file (filename): pdb file, passed to evaluate_dot_run; rot_file (filename): Euler angle file; out_base (string): base name for all output files; stat_pot_clamp_high (float, default 6): max electrostatic potential value used; stat_pot_clamp_low (float, default 6): min electrostatic potential value used; stat_pot_interior_scale (float, default 0): experts only, interior electrostatic scaling; stat_pot_interior_zero (boolean, default true): synonym for stat_pot_interior_scale 0.0; stat_vdw_interior (float, default 1000): experts only, sets forbidden value; vdw_weight (float, default 0.1): van der Waals energy term weighting; electrostatic_weight (float, default 1.0): electrostatic energy term weighting; mov_atoms_in_stat_interior_limit (integer, default 0): how many bumps to allow; do_partition_sum (boolean, default false): compute partition sum free energy; partition_sum_temp (float, default 300.0): Kelvin; do_energy (boolean, default false): retain grid of best energy per grid cell; do_bgrids (boolean, default false): automatically implies do_energy; do_histograms (boolean, default false); output_log_detail (integer, default 1): values 4 to 8 are reasonable; output_how_many_best_values (integer, default 200): how many globally best energies to retain; output_how_many_per_gridcell_best_values (integer, default 0): also called saved_best_values; output_how_many_partition_sum_best_values (integer, default unlimited): effective only if do_partition_sum is true; output_all_Ethreshold (float, default 1000): experts only, report every e
28. am checks triangle list vertices for data debugging; create triangles (compiled program): creates triangle list from MSMS output; expand triangles (compiled program): enlarges triangles by normals; matrix_to_eul (compiled program): converts rotation matrices to DOT euler angles; fill double hull (compiled program): fills region between polyhedra; fill hull (compiled program): fills region within polyhedron, used by prepscript; print half edges (compiled program); 6dtoxfm (compiled program): converts xyz euler to 4x3 matrix; e6d_closeness (compiled program): reports angle and translation between xyz euler values and specified target, see also e6d_closest and e6d_select_by_dist_angle; e6d_closest (compiled program): finds xyz euler values within specified tolerances of a specified target, see also e6d_closeness and e6d_select_by_dist_angle; e6dexpand (compiled program): fills cubical grid from e6d file for visualization; orient_survey (compiled program): checks euler files for completeness and non-redundancy; ACE script oodot2 (script): runs ACE evaluation with specified options; Analyze MSMS script (script); Expand tri script (script); Fill double hull script (script): runs fill double hull with specified options; Fill hull script (script): runs fill hull with specified options; MSMS exp script (script); acenames (script): converts PDB atom names to internal ACE codes; apbsgrd_lookup_xyz s
29. and description of the residue names, see DOT_ROOT data uhbd amber84 prot README. The library includes polar hydrogen atoms but not nonpolar hydrogen atoms. Partial atomic charges match those from AMBER for amino acid residues with polar hydrogen atoms only (Weiner et al., 1984, J. Amer. Chem. Soc. 106, 765-784). The library format is that used by the UHBD (University of Houston Brownian Dynamics) program (Gilson et al., 1993, J. Phys. Chem. 97, 3591). The library has two sections. The first, EQUIVALENCE, allows the user to equivalence an atom name in their PDB files to the atom name used in the library. For example, the main-chain amide H used to sometimes be called HN, so for ALA the line in the EQUIVALENCE section is ALA HN ALA H, which maps HN in ALA in the input PDB file to H in ALA in the residue library. These lines are less needed now, with the remediation and standardization of atom names in the PDB. The second section, AMBER, contains the residue name (resi), atom name (atom), charge (chrg), and radius (radi). The library currently includes 2 additional fields for UHBD Brownian dynamics (epsi and sigm), but these are not needed for DOT and should be set to 0.0 when new residues are added to the library. F.2 Additional residue libraries. The directory $DOT_ROOT data cofactors contains residue libraries for DNA, including 5' and 3' termini
30. arge number of solutions. Parameter output_how_many_per_gridcell_best_values: saves the top N placements taken from the list of the best-ranked solution at each grid point. In other words, if two solutions centered at the same grid point but with different orientations are highly ranked, only the one with the best energy is included in the list. Not available if DOT was compiled without the OLDSORT option. Chapter 6 Evaluation. A After DOT run. projdir> cd runs projdir runs> ls projdir runs> date time statmov nb of bumps rotations. There will be a list of DOT results directories, one for each simulation run; each will contain the following files: runmpidot hosts statmov dot2 parm date run kill_remote_processes sh statmov log statmov topN e6d statmov top200 ace d 6 0 eval e6d statmov all e6d log. B evaluate_dot_run. If DOT finishes without reporting errors, the rundot script automatically runs a second script, evaluate_dot_run, to create PDB files for an initial evaluation. You can also run evaluate_dot_run later, as many times as you want, by giving it the name of the directory to do the evaluation in, such as runs 20070927 2339 1jcg2rsa deg06 nb0. The evaluate_dot_run script provided in DOT_ROOT bin share will create new directories in the runs directory named top30pdb, top30ace6 pdb, and top30ace9 pdb. T
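The per-grid-cell selection described for output_how_many_per_gridcell_best_values (keep only the best-energy orientation at each grid point, then take the top N) can be sketched as follows. This is an illustration of the selection rule only, not DOT's code: the tuple layout is hypothetical, and the sketch assumes that a lower energy is a better energy.

```python
def best_per_gridcell(placements, n):
    """Keep the best placement at each grid point, then return the top n.

    placements -- iterable of (energy, grid_point, orientation_index),
                  where grid_point is an (ix, iy, iz) tuple.
    Assumes lower energy means a more favorable placement.
    """
    best = {}
    for placement in placements:
        energy, grid_point, _orientation = placement
        # Replace the stored entry only if this orientation scores better.
        if grid_point not in best or energy < best[grid_point][0]:
            best[grid_point] = placement
    return sorted(best.values(), key=lambda p: p[0])[:n]


if __name__ == "__main__":
    runs = [
        (-5.0, (1, 1, 1), 0),
        (-7.0, (1, 1, 1), 3),   # same grid point, better energy
        (-6.0, (2, 1, 1), 1),
        (-1.0, (3, 3, 3), 2),
    ]
    for entry in best_per_gridcell(runs, 2):
        print(entry)
```

Compared with a global best-N list, this keeps at most one placement per translation, so the saved list covers more distinct positions of the moving molecule.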
31. ate the stationary molecule. For our system, starting with stat.pdb and mov.pdb, to allow no penetrations (or bumps) by the moving molecule, type: rundot statmov deg06 nb0 parm h myhosts my 0 bump production run. This DOT run will do 54,000 evenly spaced orientations of the moving molecule, which should take a few hours on 5 to 10 processors. For a finer rotational search, examine your parm file. To do the same rotational search but allow up to 10 atoms of the moving molecule to penetrate the stationary molecule, do: rundot statmov deg06 nb10 parm h myhosts my 10 bump production run. Allowing penetrations (bumps) can be useful for unbound molecules, when some structural rearrangement occurs in the complex. C.3 Where DOT output goes. Each invocation of rundot in projdir creates a subdirectory in projdir runs in this format: projdir runs DATE TIME prefix from parm file. DOT output is put in this subdirectory. For example, for my 10 bump production run above, started on August 2, 2007 at 10:20 AM, output will be in projdir runs 20070802 1020 statmov deg06 nb10. The directory name is constructed from the date and time the run was started, the names of the molecules used, the number of orientations applied to the moving molecule, and the number of bumps allowed. The last three fields are taken directly from the name of the parm file, in this case statmov deg06 nb10 parm
32. ates dummy UHBD grids for software testing; hostlist_to_mpichhosts (script): converts list of computers into form needed by MPICH, used by prepscript; minmax (script): reports minimum and maximum values in a file, field by field; minmaxmean (script): reports minimum, maximum, and mean values in a file, field by field; pdb_atom (script): PDB file filter that passes only atoms, not headers; pdb_atomhetatm (script): PDB file filter that passes only atoms and hetatoms, not headers; pdb_ca (script): PDB file filter that passes only C-alpha atoms, used by evaluate_dot_run; pdb_cat_with_ter (script): concatenates specified PDB files, inserting a TER (chain termination) record in between each, used by evaluate_dot_run; pdb_dealtloc (script): PDB file filter that passes only the first of any alternate locations for an atom, used by prepscript; pdb_dehydrogen (script): PDB file filter that removes all hydrogen atoms, used by prepscript; pdb_dewater (script): PDB file filter that removes all water molecules, used by prepscript; pdb_make_centered (script): centers a PDB file by geometric bounding box, used by prepscript; pdb_rename_res_by_hydrogens (script): renames residues in a PDB file according to their polar hydrogen pattern, used by prepscript; pdb_replace_selenium (script): PDB file filter that renames selenium atoms to sulfurs; pdb_rmsd_matrix (script): prints square matrix of RMSD values between all pair
33. cript finds values in an apbs dx grid, used by prepscript; archosv (script): reports what computer platform it is run on, used by prepscript; bgrid_to_avsfield (script): converts seldom-used DOT binary grid for AVS visualization program; create_host_file (script): makes an MPICH p4pg file from simplified input, used by prepscript; create_p4pg entry (script): makes an MPICH p4pg entry for a specified host, used by prepscript; dot (script): currently out-of-date script to run DOT on supercomputers; dotOmatrix (script): converts obsolete DOTO euler angles to rotation matrix; dotmatrix (script): converts current DOT 2 euler angles to rotation matrix; dot_pdb_e6d_eval_rmsd (script): computes RMSD in angstroms between a target molecule and a moving molecule as positioned by xyz euler values; dotpause (script): puts a DOT run to sleep on the user's workstation; dotresume (script): resumes a dotpause'd DOT run; dot2 prep gridsize (script): computes size of grid for a run, used by prepscript; dot2 prep mol common (script): computes files needed for both moving and stationary molecules, used by prepscript; dot2 prep potgrid apbs (script): computes electrostatic potential grid using APBS, used by prepscript; dot2 prep potgrid uhbd (script): computes electrostatic potential grid using UHBD, used by prepscript; dot2 setup bash (script): sets user's execution path to include DOT programs, bash shell version; dot2 setup
34. csh (script): sets user's execution path to include DOT programs, C shell version; e6d_append_expression (script): adds metadata field to e6d header block; e6d_first (script): selects first N entries from an e6d file; e6d_nonsimilar (script): quickly eliminates similar placements from an e6d file; e6d_select_by (script): selects entries that match a criterion from an e6d file; e6d_select_by_dist_angle (script): quickly selects placements from an e6d file that are within a distance and angular tolerance from a specified target, see also e6d_closeness and e6d_closest; e6d_sort_by (script): sorts an e6d file by a specified field, intended to be used in a pipeline followed by e6d_first; evaluate_dot_run (script): basic post-DOT-run evaluation, creates PDB files of moving molecule as placed by DOT; expand_environment_variables (script): expands $VAR if VAR is an environment variable, used by rundot; eul_to_matrix (script): converts DOT euler angle to rotation matrix; gen_apbs_com (script): generates APBS input command file from template, used by prepscript; gen_uhbd_com (script): generates UHBD input command file from template, formerly used by prepscript; gen_dot_parm (script): generates a DOT parameter file from a template in DOT_ROOT data; gen_dot_prepscript (script): generates file prepscript customized to user's moving and stationary molecule names; gen_xyzcrvs (script): makes stationary molecule shape description file, used by prepscript; gentestuhbdgrd (script): cre
35. d utilities (Chapter 2, section A). • Have your path set up to access the programs REDUCE, MSMS, and APBS (Chapter 2 section). • Set up the directory structure with starting coordinates (Section E). • If needed, make a customized residue library for non-standard protein residues and other functional groups such as cofactors (Section F). For our example, with projdir as our project directory and the starting PDB files stat.pdb (stationary molecule) and mov.pdb (moving molecule), both PDB files are copied into projdir coords, as set up in E. The script gen_dot_prepscript will create prepscript for your system. Syntax is: gen_dot_prepscript -m movingpdb -s stationarypdb -d griddim -l residue_library. An appropriate grid dimension is calculated by prepscript if it is not specified (see section J). The default library assigns atomic charges and radii and is in DOT_ROOT data uhbd amber84 prot rlb. Example 1: Working in the project directory projdir, with the PDB files stat.pdb and mov.pdb in projdir coords, using the default library and letting prepscript calculate the grid size: cd projdir coords; gen_dot_prepscript -m mov.pdb -s stat.pdb. Example 2: Same coordinates, but we want to use a 128 grid with 1 Å spacing and the customized residue library with atomic charges, myreslib rlb, in projdir lib: gen_dot_prepscript -m mov.pdb -s stat.pdb -d 128 -l projd
36. directory, then run tar xvf tar gz. This will make about a dozen files, including documentation and release notes; the executable file is named msms platform versionnumber. Copy that file to DOT_ROOT bin $ARCHOSV msms. For example, if for our Mac we did tar xvf msms_MacOSX_2 6 1 tar, the resulting files include the file msms MacOSX 2 6 1, and we put a copy in the appropriate DOT_ROOT bin directory: cp msms MacOSX 2 6 1 DOT_ROOT bin i86Darwin8 msms. Note that the MSMS team is currently distributing a PowerPC executable (ppcDarwin8) but not an Intel-native one. We have tested the PowerPC msms on an Intel Macintosh (i86Darwin8) and it runs fine, because Mac OS X recognizes it and invisibly runs it inside Rosetta, which simulates a PowerPC on an Intel. So copying the same binary to both ppcDarwin8 and i86Darwin8 is OK. D.3 REDUCE. The Reduce program adds hydrogens to molecular models in an intelligent and flexible way (needed by prepscript). Reduce is from the laboratory of David and Jane Richardson at Duke University, written by J. Michael Word. Reduce is free, open-source software available from http://kinemage.biochem.duke.edu/software. The DOT project is currently using reduce 3 13 080428. Reduce needs to know where its heteroatom group dictionary is. Through the courtesy of the authors of Reduce, we distribute a copy of this dictionary in the DOT_ROOT data directory, file reduce_wwPDB_het_dict txt. D.4 APBS. The A
37. e Prepscript invokes DOT utilities to create the needed MSMS surfaces, determine the grid points inside and between the volumes, assign values, and create the stationary molecule shape potential volume file read by DOT. • Spheres with center, radius, and value that will be mapped onto the grid by DOT. The volume is specified by a list of spheres that DOT reads and fills in, in the order they appear in the file. • Excluded volume: all grid points inside the molecular surface. The shape potential is based on volumes inside and between molecular surfaces made with the MSMS program. Two surfaces are made: (1) the solvent-excluded (molecular, or Connolly) surface and (2) a molecular surface formed using atoms with their van der Waals radii expanded by 3 Angstroms. As above, only the nonhydrogen atoms of the coordinates are used. All grid points within these two surfaces are determined; those between the two surfaces are assigned a favorable value, and those inside the solvent-excluded surface are assigned an unfavorable value. The resulting list of grid points and values is the xyzcrv file, read by DOT to build the shape potential grid. These files typically have 50,000 to 100,000 lines. • Favorable volume: all grid points between the molecular surface and the surface made with 3 Å extended atomic radii. • Additional atom-related properties can be added. The molecular surface (solvent-excluded, or Connolly, surface) is used to define the excluded volume of the molecule, with all grid poi
38. e When M is centered on a grid point distant from S, M may see electrostatic potential from copies of S that distorts the electrostatic energy. In these cases, a grid size larger than S + 2M may be worth considering. The computational time is linearly dependent on the number of orientations applied to the moving molecule. For a given set of orientations, the rotational space of the moving molecule will be more finely sampled for a smaller molecule. For example, for two molecules, one with twice the diameter of the other, the larger molecule would require 8 times the orientations to give the same orientational sampling as the smaller molecule. D.2 Suitability of potentials. The potentials of S are calculated once, whereas the potentials of M must be recalculated for each orientation of M. Since the potentials of S are calculated only once, they can be quite detailed and computationally intensive. The shape potential of S requires calculation of molecular surfaces and determination of the volumes of the grid that represent the excluded and favorable regions. The electrostatic potential of S is calculated by Poisson-Boltzmann methods, which take into account ionic strength effects and the dielectric boundaries between solute and solvent. Both the shape and electrostatic potential calculations for the stationary molecule take several minutes. For computational feasibility, the shape and electrostatic properties of the moving molecule must be rapidly calculated
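The computation-time remarks above (run time linear in the number of orientations, and 8 times the orientations for a molecule of twice the diameter) can be turned into a small estimate. The sketch below is an extrapolation, not a formula stated in the guide: it assumes the 2x to 8x example reflects a cubic dependence on the diameter ratio, and both function names are hypothetical.

```python
def orientations_for_same_sampling(base_orientations, diameter_ratio):
    """Orientations needed by a molecule `diameter_ratio` times larger to
    match the sampling of the smaller one.

    Assumes a cubic dependence on diameter, which reproduces the
    "twice the diameter, 8 times the orientations" example.
    """
    return round(base_orientations * diameter_ratio ** 3)


def relative_run_time(orientations, reference_orientations):
    """Run time relative to a reference run, assuming time scales
    linearly with the number of orientations."""
    return orientations / reference_orientations


if __name__ == "__main__":
    needed = orientations_for_same_sampling(54000, 2)
    print(needed, relative_run_time(needed, 54000))
```

This is one practical reason to weigh molecule size when assigning the stationary and moving roles: the moving molecule's size sets how many orientations are needed for a given sampling density.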
39. e GNU Autoconf tools from http://www.gnu.org/software/autoconf; the DOT project is using version 2.61. • The GNU Automake tools from http://sources.redhat.com/automake; the DOT project is using version 1.10. F.2 FFTW. The FFTW Fourier transform library is needed to compile DOT, whether you want to run parallel or single-processor. FFTW is free software available from http://www.fftw.org; see http://www.fftw.org/fftw3_doc/Installation-on-Unix.html (Installation on Unix). The DOT project is currently using version 3.1.2. Download into DOT_ROOT fftw3; unpack as DOT_ROOT fftw3 fftw 3 1 2. For each platform ARCHOSV: cd $DOT_ROOT fftw3 $ARCHOSV; configure for_dot2 (this runs fftw 3 1 2 configure enable portable binary enable float prefix DOT_ROOT fftw3 ARCHOSV); make; make check (optional but recommended); make install. F.3 Recommended directory layout. Under DOT_ROOT: bin, src, data, mpich, fftw3. Under DOT_ROOT bin: share and individual platform directories. In DOT_ROOT bin share: non-compiled executable scripts DOT users need. In DOT_ROOT bin ARCHOSV: compiled executable programs DOT users need. Under DOT_ROOT src: dot, util, and share. In DOT_ROOT src dot: dot C source files. Under DOT_ROOT src dot ARCHOSV: individual platform build directories. In DOT_ROOT src util: dot utility C and C++ source files. Under DOT_ROOT src util ARCHOSV: individual platform build directories. In DOT_ROOT src share: distributi
40. e MSMS style file in DOT_ROOT data atmtypenumbers. Not currently used in DOT; pdb_to_xyzr (script): prints the x, y, z coordinate and radius of each atom in a PDB file. The radii come from the MSMS style file in DOT_ROOT data atmtypenumbers. Used by prepscript for the MSMS calculations of the stationary molecule volume; pdb_trans (script): translates a PDB file by specified translation vector, used by prepscript, see also pdb_trans and pdb_rottrans; pdbchecksubunit (script); pdbdiameter (script): reports largest radial dimension of a PDB file, used by prepscript; pdbfromdot (script): old version of pdbgen; pdbgen (script): makes a moving molecule PDB file for each placement in an e6d file, assigning each a name derived from its original ranking in the DOT run that made the e6d file; rundot (script): user-level script to run DOT in an automatically created new subdirectory, with logging of the run including user commentary; striph (script): PDB filter that removes non-polar hydrogens, retaining polar hydrogens as defined in DOT_ROOT data uhbd amber84 prot rlb unless a different library is specified; uhbdgrd limit (script): replaces out-of-bounds values in a UHBD electrostatic potential grid; uhbdgrd_lookup_xyz (script): finds values in a uhbd grd grid, formerly used by prepscript; uhbdlog_to_xyzq (script): reports atom-by-atom charges after the UHBD program has looked them up in its library. Useful for debugging AMBER-style libraries.
41. e of side chain amide orientation. J. Mol. Biol. 285, 1735-1747 (1999). • MSMS: Michel Sanner, Arthur J. Olson, Jean-Claude Spehner. Reduced Surface: an efficient way to compute molecular surfaces. Biopolymers 38, 305-320 (1996). Chapter 2 Try DOT: a Tutorial Example. We assume DOT and its utilities have been installed (see Chapter 8 for instructions). For this tutorial you will also need the programs APBS, Reduce, and MSMS installed on your computer. In this tutorial you will set up your environment for running DOT, run DOT with prepared input files to test that DOT is properly located and running, and prepare DOT input files to test that the DOT utilities and the programs APBS, Reduce, and MSMS are properly located and running. A Set up user environment. To find DOT, you must set the DOT_ROOT environment variable to where DOT is installed. For example, if DOT is installed in usr local dot2, to set your program search path to include DOT programs, do the following. For csh/tcsh: setenv DOT_ROOT usr local dot2 source DOT_ROOT bin share dot2 setup csh. If you put these two lines into your home directory cshrc file, you will not have to type them again. For sh/bash: export DOT_ROOT usr local dot2 source $DOT_ROOT bin share dot2 setup bash. If you put these two lines into your home directory login or bashrc file, you will not have to type them again.
42. e residue charge library. Here we assume the use of the included charge library. The residues and atoms are looked up in this library and written to an XYZQ file, which contains the coordinates and charges of the moving molecule. Verification: For the xyzq file (moving molecule), check that the xyzq file's last line agrees with your calculation. Check that there were no error messages in the xyzq file. Check the beginning of the file: are the coordinates exactly the same as in the corresponding PDB file? For this test of prepscript, I would also remove all the comment lines from the xyzq file (make a temporary file), check the center (just minmax tmpfile), which should be exactly the same as the centered PDB file, and check that the 4th column (charge) adds up properly. For example, awk '{t += $4} END {print t}' tmpfile.xyzq will sum the 4th column of the file tmpfile.xyzq. To remove all the extra comment lines in the xyzq file using vi, try 5g d. This removes all lines that start with M. M.9 Problem with determining clamping values uhbdgrd_lookup_xyz. Chapter 5 DOT. A Sample input file. The easiest way to start DOT is by invoking rundot, a script we provide. You always need a DOT parameter file and the four input files discussed above. If you are running in parallel mode on more than one computer, you need a hosts fi
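The verification step above (strip comment lines, then sum the 4th column of the xyzq file) can also be done in one pass in Python. This is a sketch under stated assumptions, not a DOT utility: the comment marker is passed in as a parameter because this text does not say which character the xyzq file uses, and the function names and the integer tolerance are hypothetical.

```python
def total_charge(lines, comment_prefix="#"):
    """Sum the 4th whitespace-separated column (charge) of xyzq-style lines.

    Blank lines and lines starting with `comment_prefix` are skipped.
    The default prefix is an assumption; pass the marker your files use.
    """
    total = 0.0
    for line in lines:
        stripped = line.strip()
        if not stripped or stripped.startswith(comment_prefix):
            continue
        total += float(stripped.split()[3])
    return total


def is_integral(charge, tolerance=1e-3):
    """True if the summed charge is an integer to within `tolerance`."""
    return abs(charge - round(charge)) <= tolerance


if __name__ == "__main__":
    sample = [
        "# hypothetical comment line",
        "1.0 2.0 3.0 -0.50",
        "0.0 1.0 2.0 0.25",
        "4.0 5.0 6.0 -0.75",
    ]
    q = total_charge(sample)
    print(q, is_integral(q))
```

For a real file, something like `total_charge(open(path))` would apply the same check without creating a temporary comment-free copy.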
43. e sum of electrostatic and van der Waals terms, which are efficiently evaluated as correlation functions. One molecule (S) is kept stationary and a second molecule (M) is moved about S. The electrostatic and shape properties of both molecules are mapped onto equal-sized grids. M is translated by being centered at each grid point of S, interaction energies are calculated, M is rotated, and the very efficient translational search is repeated. The calculation time depends on the size of the grid and the number of orientations applied to M. The output is a ranked list of placements of M about S, from which coordinate files in PDB format can be generated. The resulting configurations can be analyzed visually with computer graphics, filtered by biochemical or spectroscopic data, analyzed to find clustering, or subjected to methods that introduce flexibility. An important consideration is the size of the grid. The grid representing S is repeated in all directions (periodic boundary conditions). Therefore, the grid must be large enough that M does not see adjacent copies of S or their properties. The significantly enhanced new version of the DOT software package provides the following: • Automated setup of DOT input files, starting with protein coordinate files from the PDB. • Improvements in molecular potentials that have been described in the literature are now part of the automated setup. • Error checking during setup of input files to detect potential problems before t
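The translational search and the periodic-boundary caveat described above can be made concrete with a one-dimensional toy example. The sketch below computes the correlation by brute force in pure Python rather than with the fast Fourier transforms a production code would use, and the grids, values, and function name are all hypothetical.

```python
def circular_correlation(stat_grid, mov_grid):
    """Energy of the moving grid at every translation on a periodic grid.

    stat_grid -- potential of the stationary molecule S at each grid point
    mov_grid  -- property (e.g. charge) of the moving molecule M
    Returns a list whose entry `shift` is the interaction energy when M is
    translated by `shift` grid points; indices wrap around, which is what
    periodic boundary conditions mean on a grid.
    """
    n = len(stat_grid)
    if len(mov_grid) != n:
        raise ValueError("both grids must have the same size")
    return [
        sum(stat_grid[(i + shift) % n] * mov_grid[i] for i in range(n))
        for shift in range(n)
    ]


if __name__ == "__main__":
    # S has a single favorable point at index 0; M occupies indices 0 and 1.
    stationary = [-1, 0, 0, 0, 0, 0, 0, 0]
    moving = [1, 1, 0, 0, 0, 0, 0, 0]
    print(circular_correlation(stationary, moving))
```

In this toy case the translation by 7 scores as well as the translation by 0, because the second point of M wraps around and lands on the periodic copy of S. On a grid that is too small, the same wrap-around lets M interact with neighboring images of S, which is why the grid has to be large enough to keep them apart.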
44. e tests will only find the most common errors in the pdb files. M.1 Not able to find REDUCE, APBS, MSMS. M.2 Error reported by REDUCE. M.3 Problems removing nonpolar H atoms (striph). M.4 Molecular description files not made correctly (most common problem is that the file is empty). M.5 Non-integral molecular charge. M.6 Large positive or negative (20) molecular charge. M.7 Grid dimensions could not be calculated or too big. Since prepscript centers the coordinates with added polar H atoms and places the center of mass at the nearest grid point, the midpoint of these centered files should be off by no more than 1 Å from 0 0 0. The files without polar H atoms will vary slightly from 0 0 0. Note that if you end up recentering (say you discover later that you were missing some atoms, add them, and recenter), ALL of the DOT input files for those recentered coordinates must be remade. It is very important that the moving molecule is quite close to centered. How to check the midpoints: pdb_to_xyz <centered_pdbfile> | minmax > pdbfile minmax. Compare the first few lines of the centered PDB files with and without polar H atoms. The x, y, z coordinates of the heavy atoms should be exactly the same. M.8 APBS not running properly. 8a PDB to XYZQ. Typically, programs that are used to calculate the electrostatic potential around a molecule utilize a PDB file for the molecule and an appropriat
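The midpoint check described above can be sketched in Python for readers who want to see what the minmax output is being used for. This is an illustration, not the DOT utilities themselves: it assumes the midpoint is the center of the coordinate bounding box and that the 1 Å limit applies to each axis separately, and the function names are hypothetical.

```python
def bounding_box_midpoint(coords):
    """Midpoint of the axis-aligned bounding box of a list of (x, y, z)."""
    xs, ys, zs = zip(*coords)
    return tuple((min(axis) + max(axis)) / 2.0 for axis in (xs, ys, zs))


def is_centered(coords, tolerance=1.0):
    """True if every midpoint component is within `tolerance` (in the same
    units as the coordinates, here Angstroms) of the origin."""
    return all(abs(m) <= tolerance for m in bounding_box_midpoint(coords))


if __name__ == "__main__":
    atoms = [(-10.0, -4.0, -2.5), (10.6, 4.0, 3.5), (0.0, 1.0, 0.0)]
    print(bounding_box_midpoint(atoms), is_centered(atoms))
```

Running the same check on the files with and without polar hydrogens should give slightly different midpoints, as the text notes, while the heavy-atom coordinates themselves stay identical.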
45. e van der Waals regardless of previous value; S for sum: replace with sum of previous value and specified value (does not replace forbidden region). 3. The radius of the sphere in Angstroms. If smaller than the grid spacing, only the single nearest grid point will be filled in. 4. An optional value can be used. Normally this is omitted because it is determined by the Action Code; see dotio c for details and dotio h for the actual numeric values. Currently these are 1 for attractive (A) and 1000 for forbidden (P). Example xyzcrv file fragment: 24 1 13 A 0 1 24 2 4 A 0 1 24 2 3 A 0 1 24 202 A Ol 24 2 5 A 0 1 Wh 202 E0 E 2 A 10 17 20 F 0 1702 OL IT 0 E Ova. In this typical case, the spheres all have a radius of 0.1 and therefore, for the default grid spacing of 1 Å, include only a single grid point. These are the forbidden grid points that were determined to lie inside the solvent-excluded surface and the attractive grid points that were determined to lie between the solvent-excluded and expanded surfaces. The xyzcrv file typically has 50,000 to 100,000 lines, which are all the grid points assigned as forbidden or attractive. The grid points that are not assigned an Action Code or value in the xyzcrv file have the value 0. The construction of the xyzcrv file makes it easy to add additional atom-related properties. Example: if it is known that a region of the stationary molecule is not available for the inco
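The sphere-filling rule described above (a sphere smaller than the grid spacing fills only the single nearest grid point) can be sketched as follows. This is not DOT's reader: the field order (x, y, z, action code, radius, optional value) is inferred from the example fragment and the numbered field list, and both function names are hypothetical.

```python
import math
from itertools import product


def parse_xyzcrv_line(line):
    """Split one xyzcrv-style line into (center, code, radius, value).

    Assumed field order: x y z action_code radius [value].
    `value` is None when the optional last field is omitted.
    """
    fields = line.split()
    center = tuple(float(v) for v in fields[:3])
    code = fields[3]
    radius = float(fields[4])
    value = float(fields[5]) if len(fields) > 5 else None
    return center, code, radius, value


def grid_points_in_sphere(center, radius, spacing=1.0):
    """Grid indices covered by a sphere.

    If the radius is smaller than the grid spacing, only the single
    nearest grid point is returned, as described in the text.
    """
    if radius < spacing:
        return [tuple(round(c / spacing) for c in center)]
    low = [math.floor((c - radius) / spacing) for c in center]
    high = [math.ceil((c + radius) / spacing) for c in center]
    points = []
    for index in product(*(range(lo, hi + 1) for lo, hi in zip(low, high))):
        position = tuple(i * spacing for i in index)
        if math.dist(center, position) <= radius:
            points.append(index)
    return points


if __name__ == "__main__":
    center, code, radius, value = parse_xyzcrv_line("24 -1 13 A 0.1")
    print(code, grid_points_in_sphere(center, radius))
```

A larger radius is how a user-added sphere, for example one blocking a region of the stationary molecule, would cover many grid points with a single line in the file.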
46. B.6 Most likely problems; C Run DOT; C.1 Running DOT; C.2 Running DOT on multiple computers/processors; C.3 Where DOT output goes; C.4 DOT output and evaluation; C.5 Testing DOT on multiple computers/processors; 4 Prepscript: creating input files for DOT; A Outline of setup steps; B Accessing DOT scripts and utilities and auxiliary programs; C Examine and complete starting PDB files; C.1 Missing atoms in side chains; C.2 Missing loops and chain ends; D Assigning the stationary and moving molecules; D.1 Computation times; D.2 Suitability of potentials; D.3 Other; Set up directory structure with starting coordinates; Set up customized library for new functional groups; F.1 Standard residue library for proteins; F.2 Additional residue libraries; Create prepscript; Prepscripti
47. es for DOT run parm

B Accessing DOT scripts and utilities and auxiliary programs

It is assumed that you have set up your user environment to access:
• DOT scripts and utilities (see Chapter 2, section A)
• the programs APBS, Reduce, and MSMS (see Chapter 2, section D.1). Note: the DOT distribution does not include MSMS.

If you successfully ran the tutorials (Chapter 2), your user environment includes the environment variable DOT_ROOT, and the following directories are in your path:
• Platform-independent bash and csh scripts are in $DOT_ROOT/bin/share.
• Compiled C/C++ utilities are in $DOT_ROOT/bin/$ARCHOSV, where ARCHOSV is replaced by your platform/operating system name, which is determined by the script $DOT_ROOT/bin/share/archosv.
• If you are using the APBS and Reduce programs distributed with DOT, they are in $DOT_ROOT/bin/$ARCHOSV.
• If you followed the installation instructions for MSMS (Chapter 8), it will also be in $DOT_ROOT/bin/$ARCHOSV.

The directory $DOT_ROOT/data includes residue atomic charge libraries (.rlb), rotation sets that are applied to the moving molecule (.eul), auxiliary files for MSMS (calculation of molecular surfaces) and ACE (for evaluation), and sample files.

DOT 2 User Manual, CVS Id: du prep repairpdb tex, v 1.8 2007/11/21 22:19:41 mp Exp

C Examine and complete starting PDB files

Prepscript, the script that creates the DOT
48. f the side chain and rename the residue ALA. In this case, the advantage of building the full side chain is that the correct total charge is retained. The disadvantage is that the side chain was probably disordered and may have multiple similar-energy conformations. Generally, we build the full side chain. Building side chains is outside the scope of the DOT program suite, but proprietary programs such as Insight II (Accelrys) and open-source tools such as SWISS-PROT can be used to do this task. Since residues with disordered atoms are usually on the surface of the protein, they can affect the shape and electrostatic properties of interaction surfaces and hence should be built carefully. For example, added atoms should be checked for steric clashes with other atoms in the molecule. If the missing heavy atoms are not added, prepscript will find that the protein's total charge is not an integer, report the error, and stop.

C.2 Missing loops and chain ends

Missing residues are often indicated in the PDB file as REMARK 465 MISSING RESIDUES. In general, we do not build in missing residues for rigid-body docking. The user may decide that missing regions need to be built to make a complete model, but these regions usually are missing because they are disordered and probably have multiple conformations. If the missing loop or N- or C-terminal regions are not built, there are two considerations. First, the resulting termini should be uncharged so t
49. general
H.1 General Approach
H.2 Running prepscript
H.3 Assignment of filenames
H.4 The pause command: pausing during creation of DOT input files
I Prepscript: Steps common to both molecules (dot2 prep mol common)
I.1 Check on range for total molecular charge
I.2 Creating no-hydrogen files (noh files)
I.3 Centering molecules
I.4 Protonation with REDUCE (allh files)
I.5 Create files with heavy and polar H atoms (polh) and RES names for charge library
I.6 Moving molecule shape (xyz)
I.7 Moving molecule partial atomic charges (xyzq)
I.8 Creating new functional groups
J Determining grid size
J.1 Default method based on molecular diameters
J.2 Sizes efficient for the calculation
J.3 User selection of grid dimension
J.4 Scripts/utilities
K Prepscript: Stationary molecule
K.1 Calculate Electrostatic Potential for Stationary Molecule (dx)
K.2 Stationary Shape De
50. ghtly more favorable than protonation on ND1. Some molecular modeling programs, such as Insight (Accelrys), use this as the default protonation state at pH 7. So far, Reduce's method of basing the protonation state on the local environment has given the best docking results with DOT.

Cys can be either free (protonated on SG) or part of a disulfide (no hydrogen atom). Reduce determines the protonation state based on the local environment. Reduce also recognizes when Cys is ligated to a metal ion and adds hydrogen atoms appropriately.

4c. Reduce geometry check: HIS, ASN, and GLN side chains

At the resolution of crystallographic protein structures, the two possible orientations of the His imidazole ring and of the side-chain amides of Asn and Gln cannot be distinguished. Reduce checks the hydrogen-bonding environment of the crystallographic orientation and of the flipped position (180-degree rotation) and scores which one is best. Reduce then creates the flipped coordinates, if needed, and adds hydrogen atoms. In the flipped conformation of the His imidazole ring, the positions of CD2 and ND1 are exchanged and the positions of CE1 and NE2 are exchanged. For Asn and Gln, the N and O atoms of the side-chain amide are exchanged. In the Reduce mode used by prepscript, the side chains are flipped only when the environment indicates a clear preference for the flipped orientation. We have found that the defaults used by Reduce work well. The user may be interested
51. hat spurious charged groups, which can significantly affect the overall charge distribution, are not introduced. One way the user can approach this is to leave the termini as standard amino acids. This results in incomplete covalent bonding for the terminal main-chain atoms (N at the N-terminus and C at the C-terminus), which is not a problem for the computational docking. Alternatively, the user could use a molecular modeling program to build acetyl (ACE) groups onto N-termini or N-methyl groups (NME) onto C-termini, also leaving both ends neutral. Second, missing regions, especially those at the N- and C-termini, can indicate surface regions where an incoming molecule is unlikely to bind. Docked configurations found by DOT that contact these incomplete termini may be eliminated as likely complexes.

DOT 2 User Manual, CVS Id: du prep statmov tex, v 1.4 2007/10/30 06:06:19 vickie Exp

D Assigning the stationary and moving molecules

In the DOT docking calculation, one molecule is stationary and the other molecule is moved about it. There are several criteria for selecting which molecule will be stationary and which will be moved:

1. The computational time required for a translational and orientational search of a given fineness. The computational time depends on the size of the grid and the number of orientations applied to the moving molecule.

2. Suitability or convenience of the potentials used to describe the molecules, since the stationary and mo
52. he docking calculation is run.
• Faster: DOT now runs 33% faster.
• Portability: will run on Linux, Mac OS X, and Solaris.
• Reevaluation of top-ranked DOT protein-protein complexes with ACE (pairwise atomic contact energy), which takes into account desolvation energy.

B Key DOT Chapters

• Chapter 8: DOT Installation
• Chapter 2: DOT Tutorial (try out first)
• Chapter 3: Docking your own system, the basics
• For help: send email to the DOT help line, dot-help@sdsc.edu, or post to the DOT Users Forum, dot-users@sdsc.edu. You must be a subscriber to the mailing list to post. Please see http://lists.sdsc.edu/mailman/listinfo/dot-users to subscribe to the list.

C Acknowledgements

We thank the Department of Energy (DOE), the National Science Foundation (NSF), and the National Institutes of Health (NIH) for funding.

DOT setup depends upon the programs APBS, Reduce, and MSMS. We appreciate the work of the authors of these programs and strongly encourage you to acknowledge their efforts should you publish work aided by DOT. The appropriate acknowledgements are:

• APBS: Baker NA, Sept D, Joseph S, Holst MJ, McCammon JA. Electrostatics of nanosystems: application to microtubules and the ribosome. Proc. Natl. Acad. Sci. USA 98, 10037-10041 (2001).
• Reduce: J. Michael Word, Simon C. Lovell, Jane S. Richardson, David C. Richardson. Asparagine and Glutamine: Using hydrogen atom contacts in the choic
53. he top 30 raw DOT moving molecules will be in top30pdb, named mov 0001 pdb, mov 0002 pdb, etc., where mov is the name you gave to your moving molecule. These are the top 30 as reported by DOT's built-in electrostatic and quick van der Waals term.

The evaluate_dot_run script will also score the top 2000 DOT placements using the sum of the DOT electrostatic energy and an empirical ACE term based on atom types, with both a 6 Angstrom and a 9 Angstrom cutoff. The ACE-rescored top 30 moving molecules will be in top30aceDDD pdb, named mov NNNN pdb, where NNNN is their original DOT ranking and DDD is the cutoff distance (6 or 9 here). The evaluate_dot_run script also makes, in these two directories, a combo file containing the C-alpha backbone atoms only, separated by TER records, named top30 ca pdb and top30ace ca pdb. The 6 Angstrom cutoff might be better for evaluating dockings of bound structures or those with little conformational change, and the 9 Angstrom cutoff better for systems showing more change upon binding.

You must examine and compare these against the centered stationary molecule, not the original one you supplied to prepscript. You will find this file in your coords/stat directory as stat cen noh pdb, where stat is the name you gave to your stationary molecule. The rundot script also makes a copy of it, by that name, in the runs directory at the conclusion of each successful DOT run. You eventuall
54. http://www.mpich.org, needed for multi-processor parallel runs
2. the Fast Fourier Transform library FFTW, version 3.1.2 (http://www.fftw.org), needed for all DOT runs

and two of the three external programs required to generate DOT input:
1. APBS 1.0.0 (21 April 2008) from http://sourceforge.net/projects/apbs (supplied with the DOT distribution)
2. Reduce version 3.13.080428 from http://kinemage.biochem.duke.edu/software (supplied with the DOT distribution)
3. MSMS: download from http://mgltools.scripps.edu/downloads (NOT in the DOT distribution, see below)

DOT and DOT utilities are in $DOT_ROOT/bin. The bin subdirectory contains a share subdirectory, which contains platform-independent shell scripts. The bin subdirectory also contains platform-specific subdirectories, which each contain compiled DOT, DOT utilities, APBS, and Reduce. Executables are available for the following four platforms, with the indicated subdirectory names:
1. Intel Mac OS X: subdirectory i86Darwin8
2. PowerPC Mac OS X: subdirectory ppcDarwin8
3. Intel Linux: subdirectory i86Linux2
4. Sun SPARC Solaris 8: subdirectory sun4SunOS5

The directory names are generated by the script:

    $DOT_ROOT/bin/share/archosv

This script is run when you run the script to create DOT input or the script to run DOT. It queries the computer you are on and returns a string (ARCHOSV) accordingly, so that the correct executable is found for your machine architecture and operating
55. in using Reduce interactively through the Molprobity web site, http://molprobity.biochem.duke.edu, where the user can view the two orientations in the protein environment and select which should be used.

4d. Reduce geometry check: steric problems

Reduce uses the hydrogen atoms that it adds to check for steric problems and applies small motions to the side chains to relieve them.

4e. Residues other than standard amino acids

These groups include functionalized amino acids, cofactors, DNA residues, metal ions, etc. Reduce is likely to do a reasonable job, especially if the CONECT records from the PDB file are retained. We have tested Reduce on HEM and DNA residues, although with the recent changes in the PDB the names of residues in DNA and RNA are a mess. Reduce has a utility to add new functional groups (see the Reduce documentation). The user may want to try adding H atoms first to new functional groups and then make a customized version of dot2 prep mol common in which the call to pdb_dehydrogen, which removes hydrogen atoms in the starting PDB files, is commented out.

DOT 2 User Manual, CVS Id: du prep polh tex, v 1.7 2008/04/02 03:52:04 vickie Exp

I.5 Create files with heavy and polar H atoms (polh) and RES names for charge library

The script dot2 prep mol common, called by prepscript, takes the centered PDB files with all hydrogen atoms created by Reduce (allh files) and selects al
56. ir/lib/myreslib.rlb

Note that the full pathname should be specified for the library.

A customized library can be built by adding your new functional groups to a copy of the default library. For example, our researcher copied the default protein residue library DOT_ROOT data uhbd amber84 prot rlb into the directory projdir/lib and named this residue library file myreslib.rlb. The library assigns partial atomic charges and atomic radii to each atom of a residue. To add a new residue to the library, the user should assign radii consistent with those in the default library and obtain reasonable partial atomic charges consistent with those of the AMBER polar-hydrogen force field, following the format of the library.

DOT 2 User Manual, CVS Id: du prep general tex, v 1.5 2008/04/02 03:52:04 vickie Exp

H Prepscript: general

H.1 General Approach

Prepscript will work without intervention if the two molecules are proteins with standard residues, contain no cofactors that are missing from the current charge library, and both start with residue number 1, which should be treated as a positively charged N-terminus. If intervention is needed, the PAUSE command can be inserted with a comment to remind the user that a file needs to be adjusted before proceeding. It is important to check the log files for errors. They will be located in the stat and mov directories, where again stat and mov are example names used in this manual; normally you wil
57. kie Exp

E Set up directory structure with starting coordinates

First create a main directory for your project, then create a subdirectory coords. For example, if we decide the main project directory will be named projdir, we move to the directory where projdir is to be created and type:

    mkdir projdir
    cd projdir
    mkdir coords

This makes the directory structure projdir/coords. Put the prepared PDB files of the moving and stationary molecules in projdir/coords. For example, if our starting PDB files are stat.pdb (stationary molecule) and mov.pdb (moving molecule), we have:

    projdir/coords/stat.pdb
    projdir/coords/mov.pdb

DOT 2 User Manual, CVS Id: du prep library tex, v 1.1 2008/05/09 05:29:49 mp Exp

F Set up customized library for new functional groups

F.1 Standard residue library for proteins

The default residue library is DOT_ROOT data uhbd amber84 prot rlb. This residue library contains partial atomic charges and radii for:
• the 20 amino acid residues
• the 20 amino acid residues as N-terminal residues of a peptide chain
• the 20 amino acid residues as C-terminal residues of a peptide chain
• HIS protonated on NE2 alone and on ND1 alone (neutral), and protonated on both
• CYS, both free and as part of a disulfide
• ACE, the acetyl group that sometimes begins a peptide chain
• NME, the methyl amino group that sometimes ends a peptide chain

For a list
58. l DOT in your home directory or, if you have root access to your machine, you may install DOT in /usr/local/bin or an equally appropriate location. We recommend you unpack the distribution in a directory that is mounted on all computer platforms on which you expect your users to run DOT, such as a globally exported directory on an NFS file server. If you do this, your users will be able to run DOT on the platforms included as binaries in our distribution with very little more installation.

After unpacking the distribution, you will need to set the DOT_ROOT environment variable to the directory of the unpacked distribution. In the following example, we install DOT into a directory off our home directory and rename it dot2.0:

    tar xfz DOT2.0_beta0.tar.Z
    mv DOT2.0_beta0 dot2.0
    export DOT_ROOT=$HOME/dot2.0
    cd $DOT_ROOT

Please note: if you are going to compile from source, see the configure/make instructions at the end of this chapter. In any case, you will still need to download the MSMS program and, if you want to run DOT in parallel on multiple CPUs, to configure and make MPICH (see below for instructions).

C What is in the DOT distribution

The distribution contains the DOT executables for both single- and multi-processor runs, the shell scripts necessary to generate DOT input and analyze DOT output, two pre-compiled external libraries:
1. the Message Passing Interface library MPICH, version 1.2.7p1
59. l have called your stationary and moving molecules names like 1abc.pdb and 5gfd chainB pdb.

H.2 Running prepscript

For prepscript to work, you need to:
• Have your path set up to access DOT scripts and utilities (Chapter 2, section A)
• Have your path set up to access the programs Reduce, MSMS, and APBS (Chapter 2)
• Set up the directory structure with starting coordinates (Section E)

Prepscript is run in the coords subdirectory of your project. We highly recommend you make a log file when running prepscript. To run prepscript, type:

    For csh:   prepscript |& tee prepscript.log
    For bash:  prepscript 2>&1 | tee prepscript.log

Example: working in the project directory projdir, using csh:

    cd projdir/coords
    prepscript |& tee prepscript.log

H.3 Assignment of file names

Prepscript automatically adds to the file name of the original input PDB files to indicate what is in the file and the file type. Systematic names include:
    cen    centered
    noh    heavy atoms with no hydrogen atoms
    polh   heavy atoms with only polar hydrogen atoms
    allh   heavy atoms with all hydrogen atoms

Suffixes that indicate file types include:
    xyz     contains X, Y, and Z coordinates only
    xyzq    contains X, Y, and Z coordinates and partial atomic charge
    xyzqr   contains X, Y, and Z coordinates, partial atomic charge, and radius
    xyzqr xml   as
60. l the heavy atoms plus the polar hydrogen atoms (polh files), thereby removing all of the nonpolar hydrogen atoms. Polar hydrogen atoms are those attached to N, O, and S. For common variants of standard amino acids, unique residue names are assigned based on the protonation pattern. Currently, prepscript can figure out common variants of His and Cys residues and whether an amino acid is at the N- or C-terminus of a peptide chain.

For a starting molecule named mol.pdb, the input file is the centered PDB file with all hydrogen atoms, mol cen allh pdb, and the file created is the centered PDB file with polar hydrogen atoms and adjusted residue names, mol cen polh pdb.

Why? The current DOT energy terms and charge libraries are balanced for using atomic charge sets based on heavy atoms plus polar hydrogen atoms. Therefore, nonpolar hydrogen atoms must be stripped from the coordinate files. Since each variant of a residue has its own charge distribution, each must have a unique residue name that corresponds to its entry in the charge library so that partial atomic charges can be correctly assigned.

5a. His, Cys, and N-terminal amino acid residues

Prepscript automatically assigns unique residue names to common variants of His and Cys and to charged amino acid residues at the N- or C-terminus of a peptide chain. For His, the variants are recognized by the protonation patterns of the imidazole ring:
    HIS or HIE   Protonated on NE2
    HID          Protonated on ND1
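The His naming rule above can be illustrated with a short Python sketch. It is not the logic prepscript actually uses. It assumes the ring hydrogens carry the standard PDB atom names HD1 (on ND1) and HE2 (on NE2), and `his_variant` is a hypothetical helper. The name used for the doubly protonated form is not given in this excerpt, so the sketch returns `None` for it.

```python
from typing import Iterable, Optional


def his_variant(atom_names: Iterable[str]) -> Optional[str]:
    """Return a residue name for a His side chain from its protonation pattern.

    ASSUMPTION: hydrogens on the imidazole nitrogens are named HD1 (on ND1)
    and HE2 (on NE2), as in standard PDB files.
    """
    names = {name.strip() for name in atom_names}
    on_nd1 = "HD1" in names
    on_ne2 = "HE2" in names
    if on_ne2 and not on_nd1:
        return "HIE"  # protonated on NE2 only
    if on_nd1 and not on_ne2:
        return "HID"  # protonated on ND1 only
    return None       # both or neither: not covered by this excerpt
```

A caller would pass the atom names of a single residue, for example `his_variant(["N", "CA", "ND1", "NE2", "HE2"])`, and use the result to rename the residue before charges are assigned.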
61. le (see later in this section), rundot will examine the parameter file and use it to create a new directory (folder) for each run, named according to the date, time, and parameters used. These folders will be placed in the project's runs folder, which will be created if it does not already exist.

rundot options

B Most likely to change parameters: no. of orientations, number of allowed bumps

DOT's functions are controlled from a parameter file, which DOT reads at the beginning of a run. You will likely use several parameter files during a research project, with different numbers of allowed bumps (atom-atom clashes), fewer or more rotations to try, and saving different numbers of results. In general, start out with no bumps, just a few hundred rotations (or just one rotation), and save only the best 2000 results. After you have examined the output of the initial shake-down run, increase the number of rotations (smaller degree spacing implies more rotations) and save perhaps the best 20000 or more results.

Small moving molecules need fewer rotations to sample their orientation adequately; large molecules need more. The rotation-file table shows the rotation files provided in the DOT distribution. As a rule of thumb, choose a rotation step that is about the same as the 3-D diagonal of your grid, which is √3 × gridspacing. For example, if your moving molecule has a radius of 15 angstroms, 1 angstrom grid spacing (1.7 angstrom 3-D diagonal) would subte
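The rule of thumb above amounts to a small-angle calculation: the grid's 3-D diagonal divided by the moving molecule's radius gives an angle in radians. A minimal Python sketch follows. The function names are illustrative only, and the result is just a starting point for choosing a rotation spacing.

```python
import math


def grid_diagonal(grid_spacing: float) -> float:
    """3-D diagonal of one grid cell: sqrt(3) x gridspacing."""
    return math.sqrt(3.0) * grid_spacing


def subtended_angle_degrees(radius: float, grid_spacing: float = 1.0) -> float:
    """Angle (degrees) subtended by the grid diagonal at the given radius.

    Uses the small-angle approximation: angle in radians = arc length / radius.
    """
    return math.degrees(grid_diagonal(grid_spacing) / radius)


# Radius 15 Angstroms, 1 Angstrom grid spacing: between 6 and 7 degrees.
angle = subtended_angle_degrees(15.0, 1.0)
```

A larger moving molecule gives a smaller angle for the same grid spacing, which is why large molecules need a finer rotation set.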
62. lization input files
7 DOT Utility Programs
8 DOT Installation
A Introduction
B DOT Installation Quick Start Guide
C What is in the DOT distribution
D External programs needed to create DOT input files
D.1 General utilities: shell, awk, gawk
D.2 MSMS
D.3 REDUCE
D.4 APBS
E External libraries needed to run DOT
E.1 MPICH
F What else you need to be able to recompile DOT programs
F.1 GNU Autoconf/Automake
F.2 FFTW
F.3 Recommended directory layout

Chapter 1 Introduction

DOT 2 User Manual, CVS Id: du intro tex, v 1.8 2008/10/02 01:57:02 mp Exp

A What DOT does

DOT is a macromolecular docking program that carries out a complete, systematic, rigid-body search for two molecules. Intermolecular interaction energies are calculated as th
63. ll read the xyz file and then map each coordinate of the moving molecule (heavy atoms only) onto the nearest grid point.

I.7 Moving molecule partial atomic charges (xyzq)

The script dot2 prep mol common, called by prepscript, creates the xyzq file, which contains all heavy atoms and polar hydrogen atoms and their partial atomic charges. This DOT input file defines the charge distribution of the moving molecule. In the xyzq file, each line has the fields X, Y, Z (in Angstroms) and partial atomic charge.

Example xyzq file fragment:

    24.004  10.540  13.354  0.9
    14.847   1.450  26.429  0.8
    23.264  49.230  39.562  1.2

For example, for our starting moving molecule coordinates mov.pdb, the centered xyzq file is mov cen polh xyzq, which has the same number of atoms as mov cen polh pdb. This file is created in dot2 prep mol common:

    pdb_to_xyzq mov cen polh pdb > mov cen polh xyzq

DOT will read the xyzq file and then interpolate the partial atomic charges of the moving molecule (heavy and polar hydrogen atoms) over the eight nearest grid points.

I.8 Creating new functional groups

For new functional groups, the user will have to edit the output polh file to assign new residue names that identify the functional group uniquely. To do this, the user must:
1. Make a customized atomic charge library that contains the new functional group.
2. Make a customized version of dot2 prep
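The two grid mappings mentioned above (nearest grid point for shape, eight nearest grid points for charge) can be sketched in Python. The manual states only that charges are interpolated over the eight nearest grid points. The trilinear weighting used here, the grid origin at zero, and the function names are assumptions for illustration, not DOT's actual implementation.

```python
import math
from itertools import product
from typing import Dict, Sequence, Tuple

Index = Tuple[int, int, int]


def nearest_grid_point(coord: Sequence[float], grid_step: float = 1.0) -> Index:
    """Map a coordinate to the index of the nearest grid point (origin at 0)."""
    return tuple(int(math.floor(c / grid_step + 0.5)) for c in coord)


def spread_charge(coord: Sequence[float], charge: float,
                  grid_step: float = 1.0) -> Dict[Index, float]:
    """Distribute a point charge over the 8 corners of its grid cell.

    ASSUMPTION: trilinear weights; the weights always sum to 1, so the
    total charge placed on the grid equals the atom's partial charge.
    """
    base = [math.floor(c / grid_step) for c in coord]
    frac = [c / grid_step - b for c, b in zip(coord, base)]
    shares: Dict[Index, float] = {}
    for offsets in product((0, 1), repeat=3):
        weight = 1.0
        for f, o in zip(frac, offsets):
            weight *= f if o else (1.0 - f)
        index = tuple(b + o for b, o in zip(base, offsets))
        shares[index] = charge * weight
    return shares
```

Spreading charge this way conserves the total charge, whereas snapping each charge to a single grid point would shift it by up to half a grid diagonal.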
64. me

Example 1: I want to check the DOT input files for the moving molecule before I move on to processing the stationary molecule. I put a pause statement in prepscript after the moving molecule is processed:

    dot2 prep mol common p mov.pdb 1 reslib w moving
    exit_if_error dot2 prep mol common reported an error prepscript quitting
    popd
    echo Moving molecule files are complete
    pause check moving molecule files mov cen noh xyz and mov cen polh xyzq

Example 2: I have a CYS residue that is bound to a Cu atom in plastocyanin, my moving molecule. Since there is no hydrogen atom on the S atom, it will be automatically assigned as CYX, a neutral residue that is part of a disulfide. However, I have created a residue, CYM, that has the appropriate charge of -1 needed for CYS as a metal ligand. To make sure the residue gets assigned properly, I have to intervene after all hydrogen atoms have been added by Reduce but before atomic charges are assigned. This requires making a customized version of dot2 prep mol common for my molecule, mov.pdb. After all hydrogen atoms are added with Reduce, but before nonpolar hydrogens are removed, I put:

    pause edit mov cen allh pdb to replace Cu bound CYS 84 with CYM

I edit the file and then type enter or return to continue the processing of DOT input files.

DOT 2 User Manual, CVS Id: du prep steps tex, v 1.8 2008/04/02 03:52:04 vickie Exp

I Prepscript: Steps common to both molecules
65. min to equal 30 will allow prepscript to run without an error.

Within prepscript, the total molecular charge is checked for each molecule by the script dot2 prep mol common, and checked in the log file made by either APBS or UHBD when the electrostatic potential for the stationary molecule is calculated.

If the total charge for both the stationary and the moving molecule is in the range defined by qtotmin and qtotmax but is not an integer, prepscript will quit with an error message. This turns out to be an excellent check of whether atomic charges got assigned correctly. Residues (or associated residues) have integral charges in the residue library. A non-integral charge usually indicates missing atoms, either because atoms are missing in the original PDB files or because hydrogen atoms were not added correctly.

DOT 2 User Manual, CVS Id: du prep noh tex, v 1.4 2008/04/02 03:52:04 vickie Exp

I.2 Creating no-hydrogen files (noh files)

The script dot2 prep mol common, called by prepscript, removes water molecules, hydrogen atoms, and alternate positions for the same atom from the input PDB file. For a starting molecule named mol.pdb, the file created is:
• mol noh pdb, in PDB format

Why? Input PDB files can include small molecules that would not be considered to be part of the molecular shape presented to docking partners. These molecules include water molecules, ions from buffer, and some metal ions. Prepscript removes wa
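The total-charge check in section I.1 can be sketched as follows. The names qtotmin and qtotmax come from the manual. The function name, the tolerance `tol`, and the example limits are assumptions for illustration and are not taken from dot2 prep mol common.

```python
import math
from typing import Iterable, Tuple


def check_total_charge(charges: Iterable[float], qtotmin: float, qtotmax: float,
                       tol: float = 1e-3) -> Tuple[bool, str]:
    """Check that the summed partial charges are in range and integral.

    ASSUMPTION: `tol` is a hypothetical tolerance for rounding error in the
    library charges; the value prepscript uses is not given in this excerpt.
    """
    total = math.fsum(charges)
    if not (qtotmin <= total <= qtotmax):
        return False, f"total charge {total:.3f} outside [{qtotmin}, {qtotmax}]"
    if abs(total - round(total)) > tol:
        return False, f"total charge {total:.3f} is not an integer"
    return True, f"total charge {round(total)}"
```

A non-integral total, such as the 0.8 returned for `[0.5, 0.3]`, is the symptom the manual associates with missing heavy atoms or incorrectly added hydrogens.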
66. ming molecule, the automatically assigned attractive layer in this region can be overwritten by:
• Selecting the atoms or secondary structure elements that represent this surface
• Creating spheres of forbidden area centered at these atoms
• Appending these spheres to the xyzcrv file created by prepscript

DOT 2 User Manual, CVS Id: du prep clamping tex, v 1.2 2007/08/16 06:30:11 mp Exp

K.3 Electrostatic Clamping Values

The electrostatic clamping values are calculated by prepscript and inserted into the parameter file it generates.

Prepscript instructs DOT to perform a technique known as electrostatic clamping; the interested reader may consult Roberts et al. for a thorough explanation of the rationale. We use the van der Waals volume file for the steric part of the docking problem. However, there are well-documented problems with using the electrostatic potential at the van der Waals surface. We also define a solvent-accessible surface, which is an expansion of the first based on a solvent radius of 1.4 Angstroms (defined in genxyzcrvs). We calculate the electrostatic potential at the solvent-excluded surface. The range of values at this surface is then used as the clamp limits for the potential at the van der Waals surface. This has the effect of smearing out unrealistically concentrated charge at the surface.

DOT 2 User Manual, CVS Id: du prep parmfile tex, v 1.4 2008/05/09 04:57:5
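The clamping step described in section K.3 can be illustrated with a short Python sketch: take the minimum and maximum potential sampled at the reference surface, then limit every grid value to that range. The function names and the flat-list representation of the grid are assumptions for illustration. Prepscript computes the actual limits and writes them into the parameter file.

```python
from typing import List, Sequence, Tuple


def clamp_limits(surface_potentials: Sequence[float]) -> Tuple[float, float]:
    """Clamp limits: the range of the potential sampled at the reference surface."""
    return min(surface_potentials), max(surface_potentials)


def clamp_potential(grid_values: Sequence[float], low: float,
                    high: float) -> List[float]:
    """Limit every grid value to the closed interval [low, high]."""
    return [min(max(v, low), high) for v in grid_values]


# Values sampled at the surface set the limits; extreme grid values are capped.
low, high = clamp_limits([-3.0, 1.5, 4.0])
clamped = clamp_potential([-10.0, 0.0, 9.0], low, high)
```

Values already inside the limits pass through unchanged, so only the unrealistically large potentials close to atomic charges are affected.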
67. mol common that includes a pause command.
3. Edit prepscript to call the customized dot2 prep mol common (add the pause command section to the prepscript file so that prepscript pauses while the user edits the polh file).

For example, a PDB file contains a Cys that is a Cu ligand. The CYS residue name is in the original PDB file, and Reduce has CYS in its library and correctly protonates the residue. Reduce correctly adds no proton to the CYS SG because Reduce identifies the SG-Cu bond. Prepscript automatically identifies this CYS as CYX due to the protonation pattern. However, the user knows that CYS as a Cu ligand should have an overall charge of -1, rather than being neutral as it is in a disulfide. The user creates a new residue entry in the charge library, CYM, with appropriate partial atomic charges. The user then edits the output polh file to change the CYX residue to CYM so that the correct atomic charges are assigned.

8a. Other functional groups

If there are other functional groups or molecules, such as cofactors, present in the PDB file, the user should:
• Check to see if this group is in the charge library
• Check that the atom and residue names in the library and the PDB file match
• If needed, create a new entry in the charge library with a unique RES name, partial charges, and atomic radii
• Check that the total charge on the residue is as expected
• Check that Reduce adds hydrogen atoms appropriately
• Check that the correc
68. n elec pot of stationary mol
Elec pot grid is created by APBS.
Charge library used for both S and M has charges for heavy atoms plus polar H model.
Library currently has amino acids plus charged N and C termini for all.
Library currently has DNA nucleic acids plus 5' and 3' termini for all.
APBS command file generated by prepscript.

G DOT parameters (dot2 parm: DOT parameter file)

DOT's functions are controlled from a parameter file, which DOT reads at the beginning of a run.

G.1 Sample Parameter files

G.2 Parameter Reference

Table: Dot 2 parameters (Id: dot2 parameters tex, v 1.4 2007/12/07 01:17:03 mp Exp). Commonly used and required parameters are in bold.

    parameter       type      default   notes
    dot_version     string    none      Parameter file format identifier (ignored)
    fftw_plan       string    patient   Experts only; also estimate, measure
    do_logNmerge    boolean   false     Use for massively parallel runs (BlueGene)
    fussy           boolean   true      Halt if likely mistakes are detected
    dot_grid_size   int       128       X, Y, Z size of the grid, in grid points
    sizex           int       128       X size of the grid, in grid points
    sizey           int       128       Y size of the grid, in grid points
    sizez           int       128       Z size of the grid, in grid points
    dot_grid_step   float     1         Angstroms
    stat_pot_file   filename            APBS, UHBD, or DelPhi grid file
    stat_vdw_file   filename            xyzcrv shape file
    stat_pdb_file   filename            pdb file passed to evaluate_dot_run
    mov
69. nciple you can run DOT using MPICH without recompiling the MPI libraries, but we do not know a simple way to install only the user-level MPI scripts, so here is how to install all necessary MPI routines for running on a network of local workstations such as Intel Linux, PPC or Intel Macintosh, or SPARC Solaris.

To run MPI programs on your local network, your users must be able to use either ssh or rsh without typing passwords. This can be thorny if security is a severe concern; see http://www-unix.mcs.anl.gov/mpi/mpich1/docs/mpichman-chp4/node127.htm#Node127 and http://www-unix.mcs.anl.gov/mpi/mpich1/docs/mpichman-chp4/node128.htm#Node128. We found at TSRI that the MPICH installation configure step also appears to check for the ability of you, the installer, to run ssh without a password, and configurations done by people who could not do this were not usable by people who could. Our advice is that installers should make sure they can ssh localhost without a password.

The DOT project currently uses MPICH 1, version 1.2.7p1. DOT does not use MPICH 2 because MPICH 2 currently does not let users mix different kinds of computers in a single parallel run, which is important in a typical heterogeneous workstation network.

MPICH is available from http://www-unix.mcs.anl.gov/mpi/mpich1/download.html (about 16 MB). Download into $DOT_ROOT/mpich and unpack as $DOT_ROOT/mpich/mpich-1.2.7p1. Configure, build, and install for each platform $ARCHOSV
70. nd 1 7 60 degrees per radian 15 6 degrees so the deg06 eul file would be appropriate.

Table: Rotation files included in DOT distribution

Spacing (degrees)   Number of orientations   File name (in DOT_ROOT data)
4                   232020                   deg04 eul
4                   12869                    deg04 near180 eul (rotations in range 175 to 180 degrees)
6                   54000                    deg06 eul
8                   27000                    deg08 eul
10                  14400                    deg10 eul
12                  9000                     deg12 eul
20                  1800                     deg20 eul
72                  60                       deg72 eul
90                  24                       deg90 eul
360                 1                        zero rotation eul

It is best to make your parameter files with names meaningful to you. We find a pattern like udgugi deg06 nb10 parm to work well, indicating that the stationary molecule is udg, the moving molecule is ugi, the rotation spacing is 6 degrees, and the number of allowed bumps is 10. Note, however, that DOT ignores the file name and just looks at the content.

We find it easiest to make changes by copying to a new name and editing the copy. For example, prepscript's parm files save the 2000 best ranked placements. To save 20000 of them, copy the sample parameter file that prepscript will have made for you as statmov deg06 nb0 parm to a name such as statmov deg06 nb0 best20000 parm. Edit that copy and change the line

output_how_many_best_values 2000

to

output_how_many_best_values 20000

If your moving molecule is small, you might not need as fine a rotation spacing as the 6 degrees that prepscript's parm files use. Suppose you decide 10
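The copy-and-edit step described above can also be scripted. The following is a minimal sketch in Python, not part of the DOT distribution. It assumes each parameter sits on its own line as a name followed by whitespace (or an equals sign) and a value; that layout and the helper name set_parm_value are assumptions made for illustration.

```python
import re


def set_parm_value(parm_text: str, name: str, new_value: str) -> str:
    """Return a copy of parm_text with the value of parameter `name` replaced.

    Assumed layout (not confirmed by the manual): the parameter name starts
    the line and is separated from its value by whitespace or by '='.
    """
    pattern = re.compile(
        rf"^([ \t]*{re.escape(name)}(?:[ \t]*=[ \t]*|[ \t]+))(\S+)",
        re.MULTILINE,
    )
    # Keep the name and separator exactly as written; swap only the value.
    new_text, count = pattern.subn(lambda m: m.group(1) + new_value, parm_text)
    if count == 0:
        raise KeyError(f"parameter {name!r} not found")
    return new_text


if __name__ == "__main__":
    sample = "dot_grid_size 128\noutput_how_many_best_values 2000\n"
    print(set_parm_value(sample, "output_how_many_best_values", "20000"))
```

Working on the text of a copy, rather than on the file prepscript wrote, keeps the original sample parameter file available for comparison.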
71. nergy <this>.

Note: Dot accepts true, yes, on, or enabled to set booleans true. Dot interprets anything else as false.

Note: Dot source code generally uses variable names different from these parms.

G 3 Parameter Guide

Parameter mov_atoms_in_stat_interior_limit: Sets the maximum number of moving molecule atoms allowed to penetrate the stationary molecule's excluded volume. Informally called the number of bumps.

Parameter vdw_weight: Sets the weighting in the composite energy of the van der Waals count (the number of moving molecule atoms inside the stationary molecule's favorable layer). The default is 0.1, so that each moving molecule atom in the favorable layer contributes 0.1 kcal/mol to the van der Waals energy. We have found that this default provides a good balance with the electrostatic energy term.

Parameter output_how_many_best_values: Saves the top N solutions, including multiple solutions centered at the same grid point with different orientations. This better reveals clusters that may indicate correct solutions. Not available if DOT was compiled without HEAP option.

Parameter output_all_Ethreshold: Saves all solutions (including multiple solutions centered at the same grid point with different orientations) with energies more favorable than the specified threshold. Needs to be used carefully to avoid outputting a very l
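The boolean rule stated in the note above is easy to get wrong when writing parameter files by hand, so here is a small Python sketch of it. The function name is hypothetical, and the comparison is done literally on the four listed words; whether DOT itself ignores letter case is not stated in this guide, so the sketch does not assume it.

```python
# The four words the guide lists as setting a boolean parameter to true.
TRUE_WORDS = {"true", "yes", "on", "enabled"}


def parse_dot_boolean(value: str) -> bool:
    """Mimic the documented rule: listed words are true, anything else is false."""
    # Surrounding whitespace is stripped; case handling is left literal
    # because the guide does not say how DOT treats upper case.
    return value.strip() in TRUE_WORDS


if __name__ == "__main__":
    for word in ("true", "yes", "on", "enabled", "1", "false", "y"):
        print(word, parse_dot_boolean(word))
```

One consequence worth noticing is that a value such as 1 or y is read as false, not rejected.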
72. not properly protonated. This check is a KEY check that you are building your system properly.

Residues other than standard amino acids: Appropriate charges are also available for HEM and DNA residues; the libraries can be customized to handle new residues (see section 8a, page 29). The program Reduce can also handle DNA residues and may also add hydrogen atoms properly to unusual functional groups or modified amino acid residues (see section 4e, page 27).

N terminal residues of a protein: If the N terminal residue does not have a residue ID of 1 (see section 4a, page 24), you must decide if you want it neutral (beginning with a standard residue, which has a single hydrogen atom on the N atom) or positively charged. The most likely problem is a missing hydrogen atom at the beginning of a peptide chain when you intended the terminus to be neutral. If the total charge is off by 0.248, suspect a missing main chain amide hydrogen atom.

Prepscript checks that may not apply to your particular system: One example where checks done in prepscript may not be appropriate for your system is if the total charge on one molecule is very large, which is quite likely for DNA (see section M, page 34).

C Run DOT

Goal: To run DOT and examine DOT output.

C 1 Running DOT

To run the program DOT, first go to the main directory of your project. For our example, projdir:

cd projdir

The sample DOT parameter files (parm) made by prepscript are in this directory
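The charge check described above (an integral total charge, with a deviation of 0.248 pointing to a missing main chain amide hydrogen) can be sketched in a few lines of Python. This is an illustration of the reasoning, not prepscript's own code; the tolerance of 0.01 and the function name are assumptions.

```python
def diagnose_total_charge(total_charge: float,
                          tolerance: float = 0.01,
                          amide_h_offset: float = 0.248) -> str:
    """Classify a molecule's summed atomic charge.

    tolerance is an assumed rounding allowance; amide_h_offset is the
    0.248 deviation the guide associates with a missing amide hydrogen.
    """
    deviation = abs(total_charge - round(total_charge))
    if deviation <= tolerance:
        return "integral"
    if abs(deviation - amide_h_offset) <= tolerance:
        return "suspect missing main chain amide hydrogen"
    return "non-integral"


if __name__ == "__main__":
    for charge in (-3.001, 1.752, 0.4):
        print(charge, "->", diagnose_total_charge(charge))
```

A result of "non-integral" without the 0.248 signature suggests looking at terminus protonation or at residues missing from the charge library first.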
73. nts inside this volume defined as forbidden (F). An expanded molecular surface is made by adding 3 Å to the radius of each atom. Grid points inside this surface but outside the solvent excluded surface are defined as attractive (A).

NOTE: All surfaces are made using the heavy atoms only; no hydrogen atom positions are included in the calculation.

Example: For our starting stationary molecule, stat pdb, the input file for the molecular surfaces is stat cen noh pdb. The call to generate the xyzcrv file in prepscript is

gen_xyzcrvs ccp cen noh pdb xyzdim xyzdim xyzdim xyzstep 2>&1 | tee gen_xyzcrvs log

where xyzdim are the x, y and z dimensions of the grid and xyzstep is the grid spacing (prepscript assumes a default of 1 Å). The resulting xyzcrv file (in our example, stat cen noh xyzcrv) is a free format text file, each line of which has fields X, Y, Z, Action Code, radius, and an optional value to be filled into the sphere with that center and radius.

1. X, Y and Z are the coordinates for the sphere center in Angstroms, in the centered stationary molecule's coordinate system.
2. The Action code is one letter:
   - F for forbidden (the excluded interior volume); replaces any previous values in the sphere's volume
   - A for attractive (favorable van der Waals); does not replace forbidden region
   - R for replace with attractive favorabl
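To make the xyzcrv record layout concrete, here is a minimal Python sketch of a reader for one line. It assumes only what the description above states: whitespace-separated fields in the order X, Y, Z, action code, radius, and an optional fill value. The class and function names are hypothetical, and the action code is not validated because the list of codes is cut off in this excerpt.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class XyzcrvRecord:
    x: float                      # sphere center, Angstroms
    y: float
    z: float
    action: str                   # one-letter action code, e.g. F, A or R
    radius: float                 # sphere radius
    value: Optional[float] = None  # optional value filled into the sphere


def parse_xyzcrv_line(line: str) -> XyzcrvRecord:
    """Parse one free-format xyzcrv line into a record."""
    fields = line.split()
    if len(fields) not in (5, 6):
        raise ValueError(f"expected 5 or 6 fields, got {len(fields)}: {line!r}")
    x, y, z = (float(f) for f in fields[:3])
    value = float(fields[5]) if len(fields) == 6 else None
    return XyzcrvRecord(x, y, z, fields[3], float(fields[4]), value)


if __name__ == "__main__":
    print(parse_xyzcrv_line("1.5 -2.0 0.25 F 1.8"))
    print(parse_xyzcrv_line("1.5 -2.0 0.25 A 4.8 1"))
```

A reader like this is mainly useful for sanity checks, for example counting how many F and A spheres a shape file contains before starting a long run.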
74. on, called by prepscript, uses the program REDUCE to add hydrogen atoms, both polar and nonpolar, to the already centered coordinates. Within prepscript, given a starting PDB file, for example mol pdb, the PDB file created by Reduce will be named mol cen allh pdb, the centered PDB file with all hydrogen atoms. When Reduce protonates the macromolecule, it determines a reasonable protonation state for His and Cys side chains from the local environment.

For protonation, 3 problems need to be considered:
1. The appropriate state of the N terminal residue of a peptide chain
2. Protonation state of residues like His and Cys
3. Protonation state of unusual functional groups

Reduce automatically assigns N termini as positively charged IF the residue has residue number 1. Other cases require special consideration (see below). INCORRECT PROTONATION OF N TERMINI is the most likely reason for a nonintegral charge on a protein when the atomic point charges are assigned (see Section 4a).

For a true N terminal residue, the N terminus should be an amino group (NH3+), consisting of atoms N, H1, H2 and H3, that contributes an overall charge of +1 to the residue. For a true C terminal residue, the C terminus should be a carboxylate group (CO2-), consisting of atoms C, O and OXT, that contributes an overall charge of -1 to the residue. The termini of the coordinates often do not represent the true ends of the protein; there can be disordered regions at the ends
75. on copy of executable scripts. In DOT_ROOT/data: non compiled, non executed resources DOT utilities need. Under DOT_ROOT mpich ARCHOSV: distribution and individual platform directories. Under DOT_ROOT fftw3 ARCHOSV: distribution and individual platform directories.

Note that in four cases (src/util, src/dot, mpich, and fftw3) you do not run configure and make in those directories, but in a subdirectory that is named after a particular platform, $ARCHOSV. For example, on i86Linux2 you would first make fftw3 and mpich, then the utilities, then DOT itself:

cd DOT_ROOT/mpich/i86Linux2
configure_for_dot2
make
cd DOT_ROOT/fftw3/i86Linux2
configure_for_dot2
make
make install
cd DOT_ROOT/src/util/i86Linux2
configure
make
make install
cd DOT_ROOT/src/dot/i86Linux2
configure
make
make install
76. ount 5 distmax 6 dotxyzfilter r distmax count $statxyz $movxyz < statmov top2000 e6d > statmov filt generate PDB files of top 30 that pass filter e6d_first 30 statmov filter e6d pdbgen movmol filter

The output of one filter can be piped into another, since each writes to standard output and reads from standard input. You can also concatenate the output from filters; however, the incoming e6d files must have the same fields (a limitation of the e6d file format). The dotxyzfilter utility has options to count the moving instead of the stationary atoms, or to report only boolean results: run dotxyzfilter help.

M Creating AVS etc visualization input files

Chapter 7 DOT Utility Programs

DOT 2 User Manual CVS Id du utils tex v 1 2 2008 05 09 04 57 58 mp Exp

A Introduction

The DOT distribution includes a set of programs intended to help set up and evaluate DOT runs. About half of these are scripts written in a blend of csh, bash and awk, and about half are compiled programs written in C or C++. The scripts are installed in DOT_ROOT/bin/share and the programs in DOT_ROOT/bin/ARCHOSV, where ARCHOSV is one of the supported platforms for DOT, such as sun4SunOS5. As long as your PATH includes both of these, you will not have to worry about which is which; just type the name, or write it into your own scripts, and it will work. Most of these programs were written by the CCMS team, but we draw y
77. our attention to reduce, which is from the laboratory of David and Jane Richardson at Duke University, written by J. Michael Word. We encourage you to cite their work in any publications that result from your use of DOT, along with the work of the APBS, MSMS, FFTW, and MPICH teams (see Acknowledgements section of this manual).

The following is an admittedly telegraphic rundown of these programs. For most of them, you can type their name followed by help. The scripts often have more information in them, which you can look at with any editor, and the source for the compiled programs will be found in the DOT installation directory DOT_ROOT/src/util.

ace desolvation oo (compiled program): computes desolvation values
acevalues oo (compiled program)
dotxyzfilter (compiled program): evaluation distance filter
bgrid_info (compiled program): inspects seldom used DOT binary grid
bgrid_minmax (compiled program): inspects seldom used DOT binary grid
bgrid_sort (compiled program): sorts seldom used DOT binary grid
convert uhbd grid_to_bgrid (compiled program): creates seldom used DOT binary grid
bgrid_bestenergy (compiled program): inspects seldom used DOT binary grid
analyze triangles (compiled program): checks for degenerate triangles, used by prepscript
compare edges (compiled program): checks triangle list edges for data debugging
compare faces (compiled program): checks triangle list faces for data debugging
compare verts compiled progr
78. puters with more than 2 gigabytes of memory (not just swap space) to finish in a reasonable time (an hour). For a given grid size, APBS takes several fold less time than UHBD but has much larger memory requirements. For example, running APBS using a 256 cubed grid needs 3.6 gigabytes, whereas a 192 cubed grid needs only 1.6 gigabytes. Finally, when you use the options that prepscript supplies, grid dimensions that are multiples of 32 work best for APBS.

J 3 User selection of grid dimension

The user can override the grid size calculated by prepscript by using the -d parameter in the gen_dot_prepscript command line. For example,

gen_dot_prepscript -m moving pdb -s stat pdb -d 128

will create a prepscript with a grid size of 128. For a couple of test cases, we have found that the grid size nearest but smaller than the molecule diameter sum (above) gives almost identical results at a decreased computational cost.

J 4 Scripts utilities used

DOT 2 User Manual CVS Id du prep stat tex v 1 2 2007 11 13 07 07 45 vickie Exp

K Prepscript: Stationary molecule

The shape and electrostatic potentials describing the stationary molecule are more detailed and more time consuming to calculate than those of the moving molecule.

NOTE: It is essential that both potentials are calculated in the same coordinate space. Prepscript tries to ensure this by controlling the input file coordinates that are used. For example, for the starting stationary molecule sta
79. r) that will cause prepscript to halt.

Example 1: Coordinates of my stationary stat pdb are missing the N terminal peptide, with the chain beginning at residue 82, and missing the loop segment residues 100 to 120. So I want residues 82 and 121 each to have a neutral N terminus, with the N atom having a single proton. As in Case 3, I make a customized copy of dot2 prep mol common that includes a pause statement after the stat cen allh pdb:

cd projdir/coords
cp DOT_ROOT/bin/share/dot2 prep mol common dot2 prep mol stat

inserting the pause command

pause correct protonation state of residues 82 and 121

Then I edit prepscript to call the customized dot2 prep mol stat:

dot2 prep mol stat r Nterm121 p stat pdb 1 $reslib w stationary

Reduce will then add 3 hydrogen atoms to the N terminus of any amino acid with a residue number up to 121 that is not preceded by a C=O group. When prepscript is called, it will pause after the stat cen allh pdb file is created, and the user edits both residues 82 and 121, replacing the atom name H1 with H and deleting H2 and H3.

Example 2: Coordinates of my stationary stat pdb begin at residue 82, which is the true N terminus, and are also missing the loop segment residues 100 to 120. So I want residue 82 to have a positively charged N terminus and residue 121 to have a neutral N terminus with a single h
80. residue 82, which is preceded by a disordered N terminal peptide that is missing in the PDB coordinate file. Therefore we want residue 82 to have a neutral charge N terminus.

First, copy dot2 prep mol common from DOT_ROOT/bin/share to the coords directory of our project and give it a unique name:

cd projdir/coords
cp DOT_ROOT/bin/share/dot2 prep mol common dot2 prep mol stat

Second, edit dot2 prep mol stat by adding a pause command to allow editing of stat cen allh pdb, the PDB file created by REDUCE. The pause command should be inserted after Reduce is run (creating the allh file) but before non polar hydrogens are removed. For our case, the pause command would be

pause correct protonation state of residue 82

and this would be inserted in our customized dot2 prep mol stat to read as follows:

grep ERROR $reduce_log > /dev/null
exit_if matches found reduce reported an error pgm quitting see reduce_log
exit_if_file missing_or_empty cen_allh_pdb
pause correct protonation state of residue 82
echo Remove non polar hydrogens

Third, edit prepscript to call our customized dot2 prep mol stat:

dot2 prep mol stat r Nterm82 p stat pdb 1 reslib w stationary

Now run prepscript. Prepscript will pause after running Reduce with the message: correct protonation state of residue 82. Reduce will have added 3 hydrogen atoms, named H1, H2 and H3, to the N atom of residue 82. To make the neutral N
81. riately compiled DOT program. Each computer must also have access to your project directory. Furthermore, you must be able to either rsh or ssh to each computer. Getting ssh or rsh may require help from a systems administrator. See Chapter 5, Section D 2.

ALERT: If this is the first time you have used DOT with multiple computers or processors, see Section C 5 below before running a time consuming production run.

To run on multiple computers, create a text file in your project directory that is a list of the computers, one name per line. For example, for 3 workstations, I make a file called myhosts:

dopey
sleepy
grumpy

If I want to use multiple processors on a single computer, I put one line for each processor. For example, if I have an 8 processor computer dopey and I want to use 3 processors, my myhosts file would contain:

dopey
dopey
dopey

For our big production run, we want to sample the rotational search for the moving molecule with a fineness equivalent to that of the translational search. The number of orientations needed is dependent on the size of the moving molecule. We have found that the 6 degree search is appropriate for globular proteins with up to 120 residues (diameter of up to 40 Å). Prepscript creates two parm files that do a 6 degree rotational search: in one, no atoms of the moving molecule may penetrate the interior of the stationary molecule; in the other, up to 10 moving molecule atoms are allowed to penetr
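The hosts file convention described above (one line per computer, with a name repeated once per processor to be used) can be generated programmatically when the list is long. The Python sketch below is illustrative only; the helper name hosts_file_text is hypothetical, and the host names are the ones used in the guide's example.

```python
def hosts_file_text(processors_per_host: dict) -> str:
    """Build hosts-file text: each host name repeated once per processor.

    processors_per_host maps a host name to the number of processors
    to use on that host; insertion order is preserved.
    """
    lines = []
    for host, count in processors_per_host.items():
        lines.extend([host] * count)
    return "".join(line + "\n" for line in lines)


if __name__ == "__main__":
    # Three single-processor workstations.
    print(hosts_file_text({"dopey": 1, "sleepy": 1, "grumpy": 1}), end="")
    # Three processors on one multiprocessor machine.
    print(hosts_file_text({"dopey": 3}), end="")
```

Writing the returned text to a file such as myhosts in the project directory reproduces the two examples given in the text.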
82. ript proceeds (see Section G).

B 6 Most likely problems

The prepscript script should run with no intervention needed IF:
1. The two molecules are proteins with standard residues. The programs in prepscript can handle the various protonation states of His and Cys (free and disulfide).
2. Both molecules begin with residue ID 1 (any chain ID is okay) and you want each N terminus to be a positively charged amino group.
3. The checks done in prepscript are appropriate for your system; in particular, the total charge is not huge.

Why? These problems ALL revolve around the need to assign atomic charges and molecular radii for the stationary molecule (to do the electrostatic potential calculations) and to assign atomic charges for the moving molecule. The program Reduce, which adds hydrogen atoms depending on the local environment, handles standard amino acid residues. The default DOT residue library contains atomic charges and molecular radii for standard amino acids, including the typical protonation states of His, Cys (free or in a disulfide), and charged N and C termini. Prepscript assigns a unique residue name for each uniquely charged and protonated state of each standard amino acid. If a residue does not appear in the DOT residue library, prepscript will stop. Prepscript will also stop if the total charge on a molecule is not an integer. This is most likely to happen when a known residue type is
83. s in a specified set of PDB files, which must have the same number of atoms in the same order. Evaluation tool.

pdb_rot (script): rotates a PDB file by specified rotation matrix about the origin
pdb_rottrans (script): does pdb_rot followed by pdb_trans (q.v.)
pdb_to_boxcenter (script): computes bounding box of PDB file
pdb_to_acenames (script): converts PDB atom names to ACE internal codes
pdb_to_dot (script)
pdb_to_uhbd (script): renames PDB pre version 3 hydrogens to form acceptable to UHBD (essentially AMBER names); also clears chain id column
pdb_to_vol (script)
pdb_to_xyz (script): prints the x, y, z coordinates of each atom in a PDB file; used by prepscript for the moving molecule
pdb_to_xyzatomres (script): prints the x, y, z coordinates, atom types and residue types of each atom in a PDB file
pdb_to_xyzcrv (script): computes the DOT stationary molecule shape description, including forbidden interior and attractive vdw layer; used by prepscript
pdb_to_xyzq (script): prints the x, y, z coordinate, charge and optionally radius of each atom in a PDB file. The values come from the AMBER style rlb file in DOT_ROOT data uhbd amber84 prot rlb unless a different library is specified.
pdb_to_xyzqr (script): prints the x, y, z coordinate, charge and radius of each atom in a PDB file. The charges come from the AMBER style rlb file in DOT_ROOT data uhbd amber84 prot rlb unless a different library is specified. The radii come from th
84. s should print out after the which command. We do not supply a copy of MSMS, so you will need to download it if you do not already have it. See Chapter 8 of this manual for instructions.

Once you have installed MSMS (and APBS and Reduce, if you do not use the ones in the DOT distribution), you must add its location to your shell PATH. For example, if MSMS is installed in /usr/local/bin, type:

For csh, tcsh: set path = (/usr/local/bin $path)
For sh, bash: export PATH=/usr/local/bin:$PATH

D 2 Run prepscript

Go to the supplied prepscript test directory:

cd testdot test prepscript

Then go to the subdirectory coords:

cd coords
ls

In the coords directory there are two PDB files, udg pdb and ugi pdb, and the ready to run script prepscript. Now run prepscript, keeping a log file:

For csh: prepscript |& tee prepscript log
For bash: prepscript 2>&1 | tee prepscript log

This will take several minutes and generate a few pages of log file output, including a few warning messages you can disregard. The last few lines of the output should resemble:

Stationary potential clamp high will be 1 8275
Stationary potential clamp low will be 1 7861
Stationary molecule files are complete test prepscript coords
Generating sample DOT parm file udgugi zero rotation nb0 parm
Generating sample DOT parm file udgugi deg72 nb0 parm
Generating sample DOT parm file udgugi deg06 nb0 parm
Generating sample DOT parm file
85. scription (xyzcrv)
K 3 Electrostatic Clamping Values
DOT Parameter File (parm)
Prepscript checks throughout processing
M 1 Not able to find REDUCE, APBS, MSMS
M 2 Error reported by REDUCE
M 3 Problems removing nonpolar H atoms (striph)
M 4 Molecular description files not made correctly (most common problem is file is empty)
M 5 Non integral molecular charge
M 6 Large positive or negative (20) molecular charge
M 7 Grid dimensions could not be calculated or too big
M 8 APBS not running properly
M 9 Problem with determining clamping values (uhbdgrd_lookup_xyz)
Sample input file
Most likely to change parameters: no. of orientations, number of allowed bumps
Molecular description files needed by DOT
Running DOT
D 1 Single processor mode
D 2 Multiple processors, such as a workstation farm
D 3 Multiple processors, single supercomputer
DOT actions on input
E 1 Stationary molecule shape potential (xyzcrv)
E 2 Mo
86. so for a starting molecule named mol pdb, files created are:
- mol cen noh pdb: the centered PDB file with no hydrogen atoms
- mol centered xyz: the geometric center of the original PDB file. Check this.

Why? It is especially important that the moving molecule is centered. This ensures an even sampling by the set of orientations applied to the moving molecule, since the orientations are rotated about 0,0,0. If the moving molecule is significantly off center, large regions of rotational space may be missed. For the stationary molecule, centering positions the molecule in the center of the grid, providing the largest distance between the molecule and the edges of the grid for the moving molecule to fit into.

It may be convenient to move the stationary molecule from a centered position. For example, in dockings to a fragment of cytochrome c oxidase, the face of the fragment that is connected to the membrane bound portion was translated to the bottom of the grid, leaving available more space in the grid for the molecule being docked.

3a Scripts utilities used

$DOT_ROOT/bin/share/pdb_make_centered mol pdb > mol cen pdb

This finds the geometric center of the molecule and writes a new PDB file with the same atoms but moved so the molecule's center is at the point 0,0,0.

DOT 2 User Manual CVS $Id du prep allh tex v 1 14 2008 04 02 03 52 04 vickie Exp

I 4 Protonation with REDUCE (allh files)

The script dot2 prep mol comm
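The centering operation described above amounts to subtracting the molecule's geometric center from every atom position. The Python sketch below illustrates the arithmetic only; it is not the implementation of pdb_make_centered, it works on bare coordinate tuples instead of PDB records, and it takes "geometric center" to mean the unweighted mean of the atomic positions, which is an assumption.

```python
def center_coordinates(coords):
    """Return (center, centered_coords) for a list of (x, y, z) tuples.

    center is the unweighted mean position (assumed meaning of
    "geometric center"); centered_coords has that mean moved to 0,0,0.
    """
    if not coords:
        raise ValueError("no coordinates given")
    n = len(coords)
    cx = sum(p[0] for p in coords) / n
    cy = sum(p[1] for p in coords) / n
    cz = sum(p[2] for p in coords) / n
    centered = [(x - cx, y - cy, z - cz) for x, y, z in coords]
    return (cx, cy, cz), centered


if __name__ == "__main__":
    center, centered = center_coordinates([(1.0, 2.0, 3.0), (3.0, 4.0, 5.0)])
    print("center:", center)
    print("centered:", centered)
```

Keeping the returned center is useful for the same reason prepscript records it: it is the translation needed to map docked placements back to the original coordinate frame.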
87. stname to see what it is in it multiple times. For example, if your computer is named dopey, make a hosts file with two lines like this (but use dopey.local on a Macintosh):

dopey
dopey

Try four copies on a quad processor, and so on. The MPICH manual tells other ways to do this, but this is simple and works quite well for DOT's loose parallelism.

You can mix different computer platforms in your hosts file as long as DOT has been compiled and installed for that computer type. The DOT 2.0 release includes Intel Linux, Power PC and Intel Mac OS X Darwin, and Sun SPARC Solaris. For example, if your local network has an Intel Linux named dopey, a Power PC Macintosh named happy, and a Sun Solaris named bashful, just make a 3 line text file named dot hosts in your DOT project directory containing:

dopey
happy
bashful

We have had best results putting first the name of the computer you're starting DOT on, but please let us know how this works for you. Once you get going and you have many machines listed, you can comment out ones you don't want for a particular run by putting a # in front of those lines.

Be sure the DOT_ROOT directory as well as your DOT project directory are mounted on all the hosts. We use NFS (Sun's network file system) for this, but any local area shared file system should work; let us know of problems or successes, please. Finall
88. system. If you want to check that executables are available for your computer, type

$DOT_ROOT/bin/share/archosv

The resulting string should match one of the directory names in DOT_ROOT/bin listed above. Note that the Mac OS X executables have been compiled on Mac OS X 10.4 (Tiger, Darwin 8). However, they work also on 10.5 (Leopard, Darwin 9); therefore we have set archosv to report Darwin8 on Darwin 9 (Leopard). Also included in the distribution are the DOT_ROOT subdirectories src and data.

D External programs needed to create DOT input files

D 1 General utilities: shell, awk, gawk

DOT utilities are either compiled programs, which are installed in DOT_ROOT/bin/ARCHOSV, or are written as shell scripts, which are installed in DOT_ROOT/bin/share. Many of these shell scripts invoke standard Unix/Linux utilities, including awk, bash, csh, head, tail, sort, m4, sed, grep, rm, and date. None of these should be a problem, but if you find incompatibilities, let us know. The m4 program (used by gen_dot_prepscript) comes with Mac OS X, but only if you install the Developer Tools, so we supply m4 in the distribution in the appropriate DOT_ROOT bin Darwin subdirectory. Some of the scripts need a version of awk that has certain capabilities and is called awk on some platforms and nawk on others; the scripts first try to use nawk, and if nawk is not found they use awk. In all cases
89. t pdb, the coordinate file stat cen polh pdb is used to create the electrostatic potential grid, and the coordinate file stat cen noh pdb is used to create the shape potential. Besides necessary DOT utilities, creating the potential files requires the programs MSMS (for creating molecular surfaces) and APBS or UHBD (for calculating the electrostatic potential).

DOT 2 User Manual CVS Id du prep grid tex v 1 9 2008 05 09 04 57 58 mp Exp

K 1 Calculate Electrostatic Potential for Stationary Molecule (dx)

The electrostatic potential grid is currently calculated by APBS. Prepscript creates the parameter (or command) file and runs APBS. Prepscript can use UHBD instead: run prepscript uhbd. The following parameters for calculating the electrostatic potential are set up in prepscript and can be edited by the user:

pot_ionstr 150   Ionic strength, millimolar
pot_maxits 500   For UHBD only

The APBS calculation can take many minutes to run. The following line in prepscript runs the script dot2 prep potgrid apbs, which lists the default APBS parameters, sets up the APBS parameter file (in file), sets up the coordinate file read by APBS (which contains the atomic coordinates, partial charges and atomic radii for each atom, heavy and polar hydrogen atoms, in the stationary molecule), and runs APBS, creating a log file and the electrostatic potential grid (dx file). Example
90. t polar hydrogen atoms appear in the polh file.

Assignment of partial atomic charges can be straightforward from examination of the current charge library. For example, if I had the partial atomic charges assigned for CYM, a Cys ligated to Cu, but I need CYM with a charged C terminus, I would create CYMC, which added typical charges for the CO2 group. On the other hand, the atomic charges for CYM itself are more difficult and beyond the scope of the DOT software package. For a cofactor, a quantum mechanics program such as Gaussian could be used. Some cofactors, such as HEM, have published sets of point charges because parametrization of these functional groups has been done for molecular dynamics simulations. Complex functional groups, such as metal clusters, require complex calculations. The DOT team has charge libraries for DNA bases and HEM groups that we are happy to share with you; just email us.

DOT 2 User Manual CVS Id du prep gridsize tex v 1 7 2008 04 02 03 52 04 vickie Exp

J Determining grid size

Optimally, the moving molecule, in any orientation, should fit in the grid surrounding the stationary molecule when the two molecules are close. This ensures that the moving molecule, when close to the stationary molecule, will not be influenced by the shape or electrostatic properties of the stationary molecules in adjacent cells.

The default grid spacing is currently 1 Å. We have found that a grid spacing of 2 Å is too coarse to give good resul
91. ter molecules, but the user must decide if other molecules are an intrinsic part of the molecular structure and remove those that are not. For example, a metal ion may be an important cofactor (keep) or present between molecules in the crystal as a heavy atom derivative (remove).

Some crystallographic structures and most NMR structures include some or all hydrogen atoms. These atoms may have nonstandard names or geometries, depending on the method of refinement. This step of prepscript will remove them, and they will be replaced in later steps. If the user wants to retain hydrogen atoms from the PDB, skip this step.

Some PDB files have multiple locations for some atoms, say a mobile side chain. Prepscript selects only the first listed alternate location for an atom.

If multiple molecules are present in the asymmetric unit of a crystallographic file, the user must decide which molecule is to be used for docking and edit the input file accordingly to contain just one molecule.

DOT 2 User Manual CVS $Id du prep center tex v 1 5 2008 04 02 03 52 04 vickie Exp

I 3 Centering molecules

The script dot2 prep mol common, called by prepscript, finds the geometric center of each molecule and translates the coordinates so that the center lies at 0,0,0. Files of centered coordinates have cen added to their name. In prepscript the centering is applied to the molecule with no hydrogen atoms
92. terminus, atoms H1, H2 and H3 of residue 82 need to be replaced by a single H. Edit stat cen allh pdb to remove H2 and H3 and change the name of H1 to H, the standard backbone amide proton name. Make sure the H and the rest of the fields on the line are in the correct columns. Then resume prepscript by typing enter or return.

Case 4: There is a disordered loop in the middle of the protein chain, such that a peptide segment is missing. This creates a residue with no preceding covalently bonded C=O group in the middle of the peptide chain that should be built as a neutral residue, to avoid introducing inappropriate positive charge in the protein. The goal is to have a single hydrogen atom on the N atom for this residue.

As for Case 3, this case requires:
- Making a customized version of dot2 prep mol common for the molecule, in which a pause command is inserted after the allh PDB file is made by Reduce
- Editing prepscript to add the appropriate flag for Reduce and call the customized version of dot2 prep mol common
- Editing the PDB file allh pdb created by Reduce when the pause command pauses prepscript

If prepscript is run without these changes, Reduce will not add any hydrogen atoms to the N atom lacking the preceding covalently bonded C=O group, and the total charge for the residue will not be an integer (a warning in prepscript) AND the total charge for the peptide chain will not be an integer (an erro
93. tes subdirectory in runs, starting with a numerical time stamp and ending with udgugi deg72 nb0 (taken from the name of the parm file). For example, by doing the DOT run on October 1, 2007 at 7:32 PM, we got the directory name 20071001 1932 udgugi deg72 nb0. If you go there, you will find the directory top30pdb, which contains PDB files of the ugi inhibitor:

cd runs 20071001 1932 udgugi deg72 nb0 top30pdb

These PDB files of ugi are relative to the udg coordinates in the file udg cen noh pdb, one directory up (in 20071001 1932 udgugi deg72 nb0 in our case). Examine the file udg cen noh pdb and the ugi files in top30pdb with your favorite molecular visualization program to see the 30 top ranked complexes found by DOT.

D Prepare DOT input files using supplied PDB files

Next you will prepare the DOT input files, starting with two PDB files we supply in the distribution.

D 1 Make sure necessary programs are installed

You need three programs (MSMS, APBS, and Reduce) to prepare your molecules for DOT, although DOT itself runs without them. First we will check to see if these three programs are already available. Type:

which msms
which apbs
which reduce

We supply copies of APBS and Reduce in the distribution in DOT_ROOT/bin/ARCHOSV, so if your computer platform is one of the ones supported in the current DOT distribution, the path to these two program
94. to cofactors and compare that charge with those calculated by APBS and in the xyzq file. Prepscript controls the size of the grid used by APBS so that it is compatible with the DOT calculation. The grid dimensions in APBS are the DOT dimensions plus one. For example, for our DOT calculation done on a grid 128 on a side with 1 Å grid spacing for stat pdb, the APBS input file made by prepscript, stat cen 128 150m apbs in, has the parameters grid 1.0 (grid spacing) and dime 129 129 129 (grid dimensions in the x, y, and z directions). Other parameters for APBS are set to be similar to the UHBD parameters that were used to balance the electrostatic and shape terms in DOT. The user can alter these parameters, but we have found that different parameters influence the magnitude of the electrostatic potential over parts of the grid, particularly near the molecular surface. This could affect the balance of the electrostatic and van der Waals energy terms calculated by DOT. [DOT 2 User Manual, CVS Id: du prep xyzcrv tex, v 1.6, 2007/11/21 22:19:41, mp] K.2 Stationary Shape Description: xyzcrv. The shape of the stationary molecule is described by an excluded volume surrounded by a 3 Å favorable layer (the xyzcrv file). The program MSMS is used to create the surfaces that define these volumes. MSMS rolls a probe sphere (default radius of 1.4 Å) over the van der Waals radii to generate a continuous, smooth surface representation of the molecul
95. ts with macromolecules and also gives a poor approximation of the electrostatic potential of the stationary molecule. A grid spacing smaller than 1 Å is not computationally efficient for macromolecules. If the user is looking at small-molecule docking over a smaller region of space, a finer grid spacing may be needed. The rotation sets provided will give a rotational search for a small molecule comparable to the translational search with small grid spacing. The default is a cubic grid. J.1 Default method based on molecular diameters. To ensure that the moving molecule sits inside the grid surrounding the stationary molecule, prepscript calculates S + 2M, where S is the largest length of the stationary molecule in X, Y, or Z and M is the largest diameter of the moving molecule. Prepscript then selects the grid size that is closest to, but larger than, this sum. J.2 Sizes efficient for the calculation. Grid sizes of 64, 128, 160, 192, 224, and 256 are most efficient for the FFT calculation. The calculation time is proportional to N log N, where N is the number of grid points. A grid size of 128 takes about 9 times longer than a grid of 64, and a grid size of 160 takes about 2 times longer than a grid of 128. Larger grids also take longer for the electrostatic potential calculation using APBS or UHBD, and grids larger than about 200 cubed need to run the electrostatic calculation on com
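The grid-size rule and the N log N timing estimate in the excerpt above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not prepscript's actual code: the function names `choose_grid_size` and `relative_fft_cost` are hypothetical, the candidate sizes are simply the ones listed in J.2, the grid is assumed cubic, and the molecule dimensions in the example are made up.

```python
import math

# Grid sizes the guide lists as most efficient for the FFT (section J.2).
EFFICIENT_SIZES = (64, 128, 160, 192, 224, 256)


def choose_grid_size(s_extent, m_diameter, spacing=1.0, sizes=EFFICIENT_SIZES):
    """Return the smallest listed grid size strictly larger than (S + 2M) / spacing.

    s_extent   -- largest length of the stationary molecule along X, Y or Z (in Angstrom)
    m_diameter -- largest diameter of the moving molecule (in Angstrom)
    Hypothetical helper: prepscript's real implementation may differ.
    """
    needed = (s_extent + 2.0 * m_diameter) / spacing
    for size in sorted(sizes):
        if size > needed:
            return size
    raise ValueError("S + 2M needs %.1f grid points, more than any listed size" % needed)


def relative_fft_cost(size_a, size_b):
    """Ratio of N log N costs for two cubic grids, where N = size**3 grid points."""
    n_a, n_b = size_a ** 3, size_b ** 3
    return (n_a * math.log(n_a)) / (n_b * math.log(n_b))


if __name__ == "__main__":
    # Illustrative dimensions only: S = 60 Angstrom, M = 30 Angstrom, so S + 2M = 120.
    print(choose_grid_size(60.0, 30.0))
    print(round(relative_fft_cost(128, 64), 1))
    print(round(relative_fft_cost(160, 128), 1))
```

Under these assumptions the N log N ratios come out at roughly 9.3 and 2.0, in line with the "about 9 times" and "about 2 times" figures quoted in the excerpt.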
96. uce You need the three programs MSMS, APBS, and Reduce to prepare your molecules for DOT, although DOT itself runs without them. If you have successfully run the tutorial (Chapter 2), these programs are already in your shell PATH. To verify, type: which msms; which apbs; which reduce. Copies of APBS and Reduce are supplied in the DOT distribution in DOT_ROOT bin ARCHOSV, so if your computer platform is one of the ones supported in the current DOT distribution, the path to these two programs should print out after the which command. If MSMS is not installed, see Chapter 8 of this manual for instructions. After installing MSMS, its location must be added to your shell PATH. For example, if MSMS is installed in /usr/local/bin, type, for csh/tcsh: set path = (/usr/local/bin $path); for sh/bash: export PATH=/usr/local/bin:$PATH. B Set up your molecular system. Goal: In this section you create a directory structure for your project, then generate and run the scripts necessary to create the input files for DOT. B.1 Create Working Directories. To create the directory structure for your project, first create a main directory and under this a subdirectory coords. For example, if the project directory is named projdir, type: mkdir projdir; cd projdir; mkdir coords. B.2 Select the stationary and moving molecules. Copy the coordinates (in PDB format) of the two molecules to be docked to the
97. undot statmov deg72 nb0 parm h hosts test of multiple processors. This will test that all of the machines can access an appropriately compiled DOT program and its utilities, that all machines can access your project directory, and that you can reach each computer with either rsh or ssh. Bug warning: currently the number of computers must be no more than the number of orientations of the moving molecule. Since you probably have only a dozen or so computers available and usually have hundreds or thousands of orientations, this is not likely to be a problem except for the single-rotation test case. Chapter 4: Prepscript: creating input files for DOT. [DOT 2 User Manual, CVS Id: du prepscript tex, v 1.11, 2008/04/02 03:52:04, vickie] Goal: This chapter gives detailed descriptions of the steps needed to set up your molecular system for DOT docking runs. The scripts described in this section are: gen_dot_prepscript, which generates a prepscript customized for your system; prepscript, which creates molecular input files and parameter files for DOT; and the utilities and scripts called by these. [DOT 2 User Manual, CVS Id: du prep outline tex, v 1.10, 2008/04/02 03:52:04, vickie] A Outline of setup steps: 1. Accessing DOT scripts and utilities and auxiliary programs. 2. Examine the molecule structures for completeness. 3. Select stationary and moving molecules. 4. Set up directory structure. 5. Set up customized library
98. ve an NH3 group. Within prepscript, I edit the line that calls dot2 prep mol common to process the stationary molecule: dot2 prep mol common r Nterm82 p stat pdb 1 S reslib w stationary. With this command, all residues with residue numbers less than or equal to 82 will have 3 hydrogen atoms added to their N atom, but only if there is no preceding covalently bonded C=O in the coordinate file. Warning: If there are multiple chains in the stationary molecule, Reduce will add 3 hydrogen atoms to all the N-terminal residues with residue numbers 82 or less. Reduce does not consider chain IDs. Case 3: The N-terminal residue is not the true start of the chain, so it should be neutral to avoid introducing an inappropriate positive charge when atomic charges are assigned. The goal is to put a single hydrogen atom on the N atom of this residue. This case requires: making a customized version of dot2 prep mol common for the molecule; editing prepscript; and editing the PDB file created by Reduce. Currently Reduce adds no protons to a backbone N atom when there is no preceding C=O AND the residue number is not 1 AND the Nterm option is not specified. To achieve our goal, we must make Reduce add 3 hydrogen atoms to the backbone N atom, then edit the PDB file made by Reduce with all hydrogen atoms. Example: Our starting stationary molecule coordinates, stat pdb, begin with
99. ving molecule potentials are quite different from each other. 3. Other factors specific to the molecular system. D.1 Computation time. Computational time for a given resolution of the search is shortest when the larger molecule is stationary (S) and the smaller molecule is assigned as moving (M). The computational time depends on two factors: the number of points in the grid and the number of orientations applied to the moving molecule. The grid representing S is repeated in all directions (periodic boundary conditions). Given the default grid spacing used by DOT of 1 Å, grids that are 32, 64, 96, 128, 160, 192, and 224 Å on each side are the most efficient for the DOT algorithm. The goal is to select the minimum grid size where M does not see adjacent copies of S or their properties. A good rule of thumb is that when M contacts S, M lies entirely inside the grid. A generous measure of this is that the grid dimension should be larger than S + 2M, where S and M are the largest diameters of the two molecules. S + 2M will be smaller when the larger molecule is assigned as S. The properties of the molecules may also be a factor in choosing the size of the grid. For example, if S creates a strong electrostatic potential, there may be significant values at the edges of the grid. This is most likely when S is highly charged or has a large region with a high concentration of residues with similar charg
100. ving molecule shape: xyz
E.3 Stationary molecule electrostatic potential: apbsgrd
E.4 Moving molecule partial atomic charges: xyzq
F What DOT computes
F.1 van der Waals energy
F.2 Electrostatic energy
G DOT parameters
G.1 Sample Parameter files
G.2 Parameter Reference Table
G.3 Parameter Guide
6 Evaluation
A After DOT
B evaluate_dot_run
C Future files to evaluate DOT run
D logfiles
E e6d output files
F Quick comparison of center and orientation
G Creating PDB files
H RMSD values between PDB files
I Bump checking PDB files
J Residue-residue interactions
K Re-ranking by ACE scores
L Distance filtering
M Creating AVS etc visua
101. xyzqr but in format for APBS input; uhbdgrd: electrostatic potential grid created in UHBD format; dx: electrostatic potential grid created in APBS format; xyzcrv: shape file. If the preparation script succeeds, then the following files will be generated: projdir coords mov mov pdb; projdir coords mov mov center xyz; projdir coords mov mov cen polh xyzq; projdir coords mov mov cen polh pdb_to_xyzq; projdir coords mov mov cen polh pdb; projdir coords mov mov cen polh xyz; projdir coords stat stat pdb; projdir coords stat stat center xyz; projdir coords stat stat cen uhbd pdb; projdir coords stat stat cen polh pdb; projdir coords stat stat cen pdb; projdir coords stat stat cen noh pdb; projdir coords stat stat cen polh xyzqr xml; projdir coords stat stat cen polh apbs log; projdir coords stat stat cen 128 150 0m dx (or similar); projdir coords stat stat cen noh xyzcrv; projdir coords stat stat cen noh r 1 4 p 1 4 xyzcrv; projdir coords stat stat cen noh clamp minmax. H.4 The pause command: pausing during creation of DOT input files. If you insert pause 'reason' into prepscript or the scripts called by prepscript, such as dot2 prep mol common, the script will pause at this statement, with the reason reminding you of what you need to do. NOTE: Use the single quote to delineate the comment. Typing enter or return will make your script resu
102. y you must be able to run programs on all of these hosts from the start-up host without having to type your password. Be sure of this by typing, for example: ssh happy date; ssh bashful date. If you cannot do this using ssh, try again using rsh: rsh happy date; rsh bashful date. If rsh works but ssh does not, set the environment variable P4_RSHCOMMAND to rsh (sh/bash: export P4_RSHCOMMAND=rsh; csh/tcsh: setenv P4_RSHCOMMAND rsh) before trying to run DOT in parallel. To learn how to set up your ssh and rsh environments so that they do not require passwords, see the MPICH web documentation, especially http www unix mcs anl gov mpi mpich1 docs mpichman chp4 node127 htm Node127 and http www unix mcs anl gov mpi mpich1 docs mpichman chp4 node128 htm Node128. See also the discussion in the Installation chapter of this manual. If you are unable to get either ssh or rsh to work without passwords, see the MPICH chp4 manual to set up a secure server. The MPICH manual also describes several fancier and more flexible ways to run DOT in parallel; in particular, you can set up a machines file once, then run DOT using mpirun -np; see http www unix mcs anl gov mpi mpich1 docs mpichman chp4 mpichman chp4 htm. You can also run DOT on a Beowulf-style computer cluster or on a massively parallel supercomputer such as an IBM BlueGene; please email the DOT help line (dot help sdsc edu) for help. D.3 Multiple processors
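The password-free login check described in the excerpt above can also be scripted. The following Python sketch is an assumption-laden illustration and not part of DOT: the function names are hypothetical, the host names happy and bashful are just the guide's examples, and the `-o BatchMode=yes` option (which makes ssh fail instead of prompting for a password) assumes an OpenSSH client.

```python
import subprocess


def remote_check_command(host, shell_cmd="ssh"):
    """Build the argument list that runs 'date' on a remote host."""
    if shell_cmd == "ssh":
        # Assumes OpenSSH: BatchMode=yes makes ssh fail rather than prompt.
        return ["ssh", "-o", "BatchMode=yes", host, "date"]
    return [shell_cmd, host, "date"]


def can_run_without_password(host, shell_cmd="ssh", timeout=15):
    """Return True if 'date' runs on host with no password prompt."""
    try:
        result = subprocess.run(
            remote_check_command(host, shell_cmd),
            stdin=subprocess.DEVNULL,   # never let the command wait for typed input
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout,
        )
    except (OSError, subprocess.TimeoutExpired):
        # OSError covers a missing ssh/rsh executable.
        return False
    return result.returncode == 0


def pick_remote_shell(hosts):
    """Return 'ssh' or 'rsh' if it works for every host, else None (ssh is tried first)."""
    for shell_cmd in ("ssh", "rsh"):
        if all(can_run_without_password(h, shell_cmd) for h in hosts):
            return shell_cmd
    return None


if __name__ == "__main__":
    choice = pick_remote_shell(["happy", "bashful"])
    if choice == "rsh":
        print("rsh works but ssh does not: set P4_RSHCOMMAND to rsh")
    elif choice == "ssh":
        print("ssh works without a password")
    else:
        print("neither ssh nor rsh works without a password")
```

Trying ssh first and falling back to rsh mirrors the order of the manual checks in the excerpt; the timeout value is arbitrary.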
103. y will want to customize evaluate_dot_run. To do this, copy it from DOT_ROOT bin share into your project directory (cp DOT_ROOT bin share evaluate_dot_run), make it executable (chmod +x evaluate_dot_run), and modify it as you wish. When rundot finds an evaluate_dot_run there, rundot will run it in preference to the default version in DOT_ROOT bin share, or in fact in preference to any other found in your path. C Future files to evaluate DOT run. D logfiles. E e6d output files: name topNNN e6d. F Quick comparison of center and orientation: Distance of the center and orientation from a reference value; useful for comparisons with the correct solution or a preferred solution. Needs only the DOT output file of solutions (translations and orientations), not the full coordinates. G Creating PDB files: pdbgen mov cen noh pdb shortname file e6d. H RMSD values between PDB files: RMSD differences among a set of possible solutions. Useful for identifying similar solutions among the top few solutions. Requires making the PDB files with at least C-alpha positions. I Bump checking PDB files: Requires making the PDB files. J Residue-residue interactions: Useful for comparison with a known solution or a preferred solution. The method used to evaluate results in the CAPRI competition. Requires making the PDB files. K Re-ranking by ACE scores: Example: re-rank the top 20,000 DOT results by sum of ACE potential using default options and DO
104. ydrogen atom. I proceed as in Example 1, but when prepscript pauses I only need to edit residue 121, replacing the atom name H1 with H and deleting H2 and H3. 4b Protonation states of His and Cys. Reduce handles protonation of standard states of His and Cys without intervention. Depending on pH and environment, the imidazole ring of His can be protonated on either ND1 or NE2 (neutral) or on both ND1 and NE2 (positively charged). In addition, the two possible orientations of the imidazole ring (a 180-degree ring flip) should be examined, since at the typical resolution of crystallographic protein structures the two orientations cannot be distinguished from the electron density alone. Reduce considers both orientations of the imidazole ring when it determines the most likely protonation state. For DOT, we suggest making His residues positively charged only when there is a compelling reason for doing so, such as a His residue in which both ND1 and NE2 form hydrogen bonds with carboxylate groups such as those of Asp and Glu. Reduce takes this conservative approach, usually adding only one H atom to the two N atoms of the imidazole ring. Reduce also recognizes when His is ligated to a metal ion and adds hydrogen atoms appropriately. The user can use other methods to determine the His protonation state, including algorithms that attempt to determine the pKa. For an isolated neutral His residue at pH 7, protonation on NE2 is sli
