------------------------------------------- June 24, 2005 FMOutil version 2.0 Copyright (C) 2004-05 AIST coded by D.G.Fedorov, T.Ishida, K.Kitaura AIST, Tsukuba, Japan http://staff.aist.go.jp/d.g.fedorov/ ------------------------------------------- 1. About FMOutil FMOutil is a program to generate input data for FMO calculations with GAMESS. The functions of the program are, 1) generate input data for FMO calculations with GAMESS, 2) add H atoms to PDB data, 3) report geometrical parameters (bond lengths, bond angles, and hydrogen bonds, 4) report residues which are within a threshold distance from specified residue(s)/molecule(s). For the calculations of proteins/polypeptides, you will start with PDB data. The PDB data usually do not include hydrogen atoms. This program has a function to add hydrogen atoms. The quality of generated structure, however, may not be high enough because their positon is not optimized. We recommend optimisation of H atom positions using some modeling software and it may be better to add H atoms with the modeling software as well. However, not all modeling software is capable of adding H atoms physically reasonable. HyperChem 7, for example, attaches the HG hydrogen atom of Leu badly. One can use TINKER program package distributed together with GAMESS to prepare better geometries of bio-molecules by optimizing the positions of hydrogen atoms with force filed methods after the addition of the atoms. Whatever way you follow to prepare the initial structure, use some graphics software to look at it before commencing FMO calculations! Five minutes of careful looking can save weeks of wasted CPU time. The current version of FMOutil can handle proteins/polypeptides and their complexes with non-peptide molecules including water molecules. For the FMO calculations of DNA/RNA, you can use PEACH [1], a MD program developed by Komeiji, which has a function to generate FMO input data for GAMESS. This program also can generate input data for FMO calculations with ABINIT-MP developed by Nakano [2]. 2. Installation This program is simple enough to compile and link at a stroke. So, no installation tool is attached. The file, fmoutil.f (about 12,000 lines), includes all fortran source codes. With the f77 compiler on linux on personal computers, the compile and link will be completed by the following one command, f77 -o fmoutil fmoutil.f The program was not tested on other machines/OS's. If you encounter errors please report them to us along with your hardware/OS information. The parameter statement, Parameter (MaxAtm=20000,MaxRes=1000,MaxMol=10, MaxFrg=1000), determines the maximum number of atoms, residues, molecules, and fragments, respectively. The users will modify this statement for larger molecules. 3. How to use the program The program is designed to use interactively and minimum inputs are required. The users will easily understand how to use this program. First, the program prompts, Enter Job # : 1: Generate FMO input data for GAMESS 2: Add H atoms to PDB data 3: Report Geometrical Parameters 4: Report Distances of Residues 0 or Enter Key. Quit All Jobs require input and output files. Enter file names according to the following prompts, Enter Input PDB File(s) : Enter Output file : The former prompt accepts up to five file names separated by spaces. Job #1 reads PDB format data which should contain a complete coordinates of atoms including hydrogen atoms and writes out generated FMO input data on the output file. Job #2 reads PDB data which do not have hydrogen atoms and writes out the PDB format data with supplemented hydrogen atoms on the output file. JOB #3 and #4 read PDB format data and write out the geometrical information which is useful to see details of molecular geometries on the output file. 4. FMO input data generation (JOB #1) The prompts of JOB #1 are grouped into three categories. The first group is related to your computers, the second group to FMO calculations and the last group to partition of molecules. 1) Computer system 1> How many computer nodes do you want to use ? 1> How many CPUs does each node have ? 1> How much memory does each node have (in megabytes) ? FMO calculations use the generalized DDI (GDDI) [3] for efficient parallel processing. GDDI requires input data for grouping of the available CPUs (the $gddi name group). The information input here is also used to make the nintic data in $intgrl, the ngrfmo(1) data in $fmoprp, etc., along with the $gddi group data. 2) FMO calculations 1> Choose runtyp (1:energy, 2:gradient, 3:geometry optimization) For geometry optimizations, the esp and the separated dimer approximations will be set more strict than these for single point energy calculations; 1.0->2.0 for respap, 2.0->2.5 for resppc and 2.0->2.5 for resdim. By default the GAMESS standard optimizer is selected for molecules with less than 2000 atoms. The optimization method for larger molecules should be selected at the following prompt. 1> Choose geometry optimization method (1:conjugate gradient, 2:simple Hessian update, 3:GMS standard optimizer) 1> How many layers do you want to define (1-5) ? 1> Enter fragment numbers to be assigned to layer (2-nlayer). 1> Choose wavefunction type (1:RHF, 2:DFT, 3:MP2, 4:CC, 5:MCSCFF) for each layer. separated by blanks. For DFT, B3LYP is specified by default. If you want to use other exchange-correlation functional, you have to modify the output file by hand. Similarly for the CC method, CCSD(T) is spelled out by default. You can change it to the desired one by manual post-editing. Do not choose non size extensive renormalized CC methods except for multilayer runs with 1 fragment in the CC layer. When MCSCF is selected, the following prompts will appear, 1> Enter the MCSCF fragment number. 1> Enter the spin multiplicity of the MCSCF fragment. 1> Enter the number of core orbitals, the number of active orbitals and number of active electrons. For MCSCF, the initial orbitals should be provided before the execution. See the GAMESS manual, INPUT.DOC. 1> Enter basis set (1:STO-3G, 2:3-21G, 3:6-31G, 4:6-31G*, 5:6-311G*) for each layer. separated by blanks. The STO-3G basis set for the 2nd row elements in GAMESS is different from the original one. The 6-31G basis set for Si is also different. (see the GAMESS manual, REFS.DOC) FMOutil will ask to choose either one. 1> Do you add diffuse functions on COO- groups ? (1:yes, 2:no) We recommend to add diffuse functions on the COO- group atoms of GLU, ASP and C-terminus. 1> Enter the desired n-body expansion (2 or 3) ? The 3-body FMO method (FMO3) is available only with the RHF, DFT and CC wavefunctions. 1> Would you like to print Mulliken charges (1:yes, 2:no) Mulliken populations will not be printed when the 3-body FMO is selected. 1> Would you like to produce a cube file with the total electron density ? (1:no, 2:standard, 3:sparse) 1> Enter grid spacing in Angstrom. The cube file option become void in geometry optimization jobs. 3) Partition of molecules 1> Enter fragment size (1:1res-per-frg, 2:2res-per-frg) 1> are S-S bonded CYSs combined to one ? (1:yes, 2:no) 1> is GLY combined to the neighbor ? (1:yes, 2:no) We recommend to reply 2,1, and 1 for a production run. It is assumed that the "TER" keyword separates molecules; "molecule" means no bond between them. FMOutil can split proteins/polypeptides but can not split non-peptide molecules which are left as one fragment. If need arise, the users should fractionate non-peptide molecules by hand. 5. Addition of hydrogen atoms (JOB #2) 1) Polypeptides/proteins JOB #2 adds hydrogen atoms to polypeptides/proteins in PDB input file. In this job, the following prompts will appear. 2> Protonate N-terminus (1:yes, 2:no) 2> Protonate C-terminus (1:no, 2:yes) 2> Protonate ASP and GLU (1:no, 2:yes) 2> Protonate HIS (1:yes, 2:no) If 2(no) is entered for the above prompt, one extra prompt will appear, 2> Neutral HIS ... H at D1 or E2 (1:D1, 2:E2) 2) Water molecules Hydrogen atoms are added to water molecules automatically. 3) Non-peptide molecules For non-peptide molecules, it is not possible to determine positions where hydrogen atoms are attached based on only the chemical formula given in PDB files. You have to know the structural formula of the molecule which may be found in the original paper cited in the PDB file. Once you see where to add hydrogen atoms, you can use fmoutil for the purpose by adding extra data to the input PDB file. The data format is as follows, REMARK ADDH add-h-type atom1 atom2 atom3 [cis(0) or trans(1)] [H-X bond length] The parameters in [ ] are optional and the "cis or trans" option is defined only for "add-h-type" 13. See Appendix for the definition of "add-h-type" and an example. If your system has non-peptide molecules as well as peptide molecules, it may be easy to process to separate the PDB file into a peptide file and non-peptide molecule files. These files (maximum five files) will be used as the input PDB file(s). 6. Report on molecular geometry (JOB #3 and #4) 1) Bond lengths, bond angles and hydrogen bonds JOB #3 writes out geometrical parameters on the output file. For this job, the following prompts will appear. 3> Bond Lengths, Bond Angles, H-bonds (1:yes, 2:no) ex. 111 for all, 221 for H-bond only If you need all of these geometrical parameters, enter "111" (without spaces between the numbers). "222" does nothing. 2) Residue distance JOB #4 reports residue distances form specified residue(s)/molecule(s). This report may be useful to find out which residues are close to specific residue(s)/molecule(s), for instance, the distances of residues from a ligand in a protein-ligand complex. For the JOB, the following prompts will appear. 4> Enter Res/Mol numbers from which Distance are measured ex. 2 3 - 5 8 10 This example means distance between residues from the 2 3 4 5 8 and 10 residues are measured. The distance is given as that of closest contact atoms of residues. Another prompt looks like, 4> Enter Threshold Distance ex. 2.5 (vdw unit) or 5.0 A (Angstrom unit) The residues within this threshold distance are printed. The value can be given either in vdw unit (relative to a sum of the vdw radii of the closest contact atoms) or in Angstrom unit. 7. Sample Output Sample output files derived from PDB code "1crn" are attached. (The "1crn.pdb" file is not attached.) To reproduce these outputs, 1) download "1crn" from a PDB site and save as "1crn.pdb", 2) execute "fmoutil", 3) select JOB #2, 4) use "1crn.pdb" as the PDB input file, 5) the output file may be "1crnhadd.pdb", 6) reply 1 (yes) to the all prompts. The output file "1crnhadd.pdb" is compared with the attached sample output, "1crnhadd.pdb.sample". 7) execute "fmoutil" again, 8) select JOB #1, 9) use "1crnhadd.pdb" as the PDB input file, 10) output file may be "1crn1res.inp", 11) reply 8,2,2000 for the first part, 1,1,1,1,2,2,1,1 for the second part, and 1,1,1 for the third part. The output file is compared with the attached sample output, "1crn1res.inp.sample". This file can be used as the input data for the FMO calculation on a PC cluster composed of 8 nodes, each node has dual CPUs and 2GB memory. The output file of the FMO calculation is attached (1crn1res.out.sample). One stroke execution of Jobs #2 and #1 is possible. You reply "2 1" (with a space between the numbers) to the "Enter Job # :" prompt. In this case, only the FMO input data are created (H-added pdb file is not created). 8. Troubleshooting and manual post-editing 1) Memory problems Probably the most frequently occurring problem happens when a job aborts complaining that there is not enough memory. This often occurs in these cases: a) parallel MP2 jobs b) cube files c) DFT/MCSCF jobs The second and third types (b,c) are the easiest to fix. $intgrl nintic=-98000000 $end You want to shift integral memory (nintic) to the general pool. Try subtracting 10 mln words from nintic (or whatever amount seems appropriate): $intgrl nintic=-88000000 $end The first type (MP2) by far is the most troublesome. FMOutil tries to guess how much memory will be needed for MP2 but such a guess is nearly always wrong (it maybe wrong overestimating the amount, in which case performance may suffer or underestimating, in which case the job aborts). Here is what to do if a job aborts. First find the largest MP2 calculation in your job. Most likely, this is the largest SCF dimer (that is, dimer separated within RESDIM threshold). The largest roughly means with the largest number of AOs. If you have absolutely no idea what it is, look at you aborted job and find this line: Max AOs per frg: 150 Then you can guess the dimer size by doubling the number that is printed there. In this case it is 150*2=300. Next you need to guess how much memory you need for this size and how many nodes will hold it in GDDI. First, understand what was in your input. $system mwords=25 memddi=1498 $end $gddi ngroup=16 $end $fmoprp ! Probably you need to provide more nodes to have enough memory ngrfmo(1)=3,1,0,0,0, 0,0,0,0,0 $end This says that: 1498 megawords of DDI memory per 16 nodes (1498/16=94 mwords per node) Then during monomer calculation you asked for 3 groups, that is, 6+5+5 nodes. They will have 6*94, 5*94, 5*94 megawords of DDI memory to handle jobs up to 150 basis functions. If your job died during monomer calculation (find the last "Running RHF energy for monomer/dimer", remember to look into all output files, not just the grand master one), you should increase memory there by decreasing the number of GDDI groups (the first number in ngrfmo). e.g., try ngrfmo(1)=2,1,0,0,0, 0,0,0,0,0 In this case each of two groups will have 8*94 megawords. The same holds for dimers. If your job died during a dimer calculation, try decreasing the number of DDI groups (the second number in ngrfmo). In this case it is already 1, which means you need more nodes! If you have some experience or are an advanced imaginative user who can either look up literature to find the required memory amount based upon the number of AOs (and core orbitals) or run a small ab initio MP2, EXETYP=CHECK calculation with a system of similar size, then you can change ngrfmo before running your larhe FMO-MP2 job and save time. Finally, remember that MP2 gradient takes several times more memory than MP2 energy, so you may content yourself with getting only energy. Also, there are two independent MP2 codes in GAMESS, one is sequential and the other parallel. If your GDDI group has 1 node, then the sequential code is executed, which means all above trouble does not apply to you, since sequential jobs do not use DDI memory. FMOutil, however, will not take advantage of the fact. You may be able to get nice a speed-up by confiscating all memory from DDI (set memddi=0 and add memddi/ngroup to mwords: mwords=25 -> mwords=25+94=119), which you can do if all GDDI groups have one node. In some cases you can lessen memory burden by dividing a run into two. For example, if you intended to produce a cube file for an MP2 calculation, it is possible to run an RHF calculation with a cube file and then an MP2 calculation without it. The cube file for MP2 is produced with RHF density anyway. 2) Basis set problems If you have some "non-organic" atoms in your system, it can happen that GAMESS does not have the all-electron basis set for it that you requested. In this case, locate $data group: $data C1 U.1-1 92 n21 3 c.1-1 6 n21 3 Here you had a uranium atom for which you asked for 3-21G basis set that does not exist. In this case, find an all-electron basis set (not ECP!) you like, and explicitly provide it: $data C1 U.1-1 92 s 26 1 1000000 1 ... c.1-1 6 n21 3 9. Miscellaneous 1) If some error occur with FMOutil, you may be able to avoid the error by pre-editing the input PDB file. 2) A file named "fmoutil.prm" in the current directory will be read and the parameters (covalent radii, vdw radii etc.) given in the file will be used instead of the internally defined values. See a template file "fmoutil.p00" for details. 3) The input data for JOB #1 can be pre-determined in a file named "fmoutil.def". When this file is found in the current directory, the program reads in data from this file and skips the corresponding prompts. See a template file "fmoutil.d00" for details. 4) Latest version of FMOutil will be available at, http://staff.aist.go.jp/d.g.fedorov/ FMO related information is also available from this web site. Acknowledgements This work is partially supported by the NAREGI Nanoscience Project and by Grant-in-Aid for Scientific Research of Ministry of Education, Culture, Sports, Science and Technology, Japan. KK thanks to Prof. Isao Nakanishi (Kyoto University, Graduate School of Pharmacy) for his valuable comment. References [1] Y.Komeiji, PEACH, http://staff.aist.go.jp/y-komeiji/ [2] T.Nakano, ABINIT-MP, http://www.fsis.iis.u-tokyo.ac.jp/en/result /software/ [3] D.G.Fedorov, R.M.Olson, K.Kitaura, M.S.Gordon, S.Koseki, J.Comp.Chem.,25,872-880(2004). Appendix "add-h-type" and an example of input PDB file 1. The "add-h-type" code ------------------------------------------------------------------------------ add-h-type connection of comment code number reference atoms ------------------------------------------------------------------------------ |-atom2 11 H-atom1-atom3 H atom and atom1-4 form peudotetrahedron |-atom4 ex. H-CH3 of methane |-atom2 12 H-atom1 atom1 is sp2 and H and atom1-3 are in a plane |-atom3 ex. H-C of benzene 13 H-atom1-atom2-atom3 H-atom1-atom2-atom3 can be cis or trans ex. H-O-C-C of ethanol 21 H-| |-atom2 two H atoms are added at atom1 atom1 H2 and atom2,3 are twisted by 90 degrees H-| |-atom3 ex. H2 of CH4 22 H-| atom1-atom2-atom3 all atoms are in a plane H-| ex. H2N-C-N(H2)- of arginine 31 H-| H-atom1-atom2-atom3 three H atoms are added at atom1 H-| ex. H3N-C-C- of lysine ------------------------------------------------------------------------------ 2. PDB input data for hydrogen atom addition to non-peptide molecules The following is an example of Bombykol in PDB:1DQE. The structural formula of Bombykol is written as HO-(CH2)9-(CH=CH)2-(CH2)2-CH3. The last non-integer value in the first line is bond length which is optional (usually internally defined standard bond length is applied). REMARK ADDH 13 1 2 3 1 0.96 REMARK ADDH 21 2 1 3 REMARK ADDH 21 3 1 4 REMARK ADDH 21 4 3 5 REMARK ADDH 21 5 4 6 REMARK ADDH 21 6 5 7 REMARK ADDH 21 7 6 8 REMARK ADDH 21 8 7 9 REMARK ADDH 21 9 8 10 REMARK ADDH 21 10 9 11 REMARK ADDH 12 11 10 12 REMARK ADDH 12 12 11 13 REMARK ADDH 12 13 12 14 REMARK ADDH 12 14 13 15 REMARK ADDH 21 15 14 16 REMARK ADDH 21 16 15 17 REMARK ADDH 31 17 16 15 REMARK the following data are partially taken from PDB:1DQE ATOM 2141 O1 BOM A 300 -1.744 31.301 2.886 1.00 0.00 ATOM 2142 C2 BOM A 300 -1.235 32.329 2.020 1.00 0.00 ATOM 2143 C4 BOM A 300 0.295 32.341 2.005 1.00 0.00 ATOM 2144 C7 BOM A 300 0.782 30.905 1.752 1.00 0.00 ATOM 2145 C10 BOM A 300 1.795 30.549 0.677 1.00 0.00 ATOM 2146 C13 BOM A 300 2.684 29.358 1.190 1.00 0.00 ATOM 2147 C16 BOM A 300 2.164 28.022 0.717 1.00 0.00 ATOM 2148 C19 BOM A 300 3.280 26.961 0.709 1.00 0.00 ATOM 2149 C22 BOM A 300 3.069 25.811 1.683 1.00 0.00 ATOM 2150 C25 BOM A 300 4.130 24.716 1.517 1.00 0.00 ATOM 2151 C28 BOM A 300 5.296 24.975 2.447 1.00 0.00 ATOM 2152 C31 BOM A 300 5.620 26.199 2.824 1.00 0.00 ATOM 2153 C33 BOM A 300 6.695 26.586 3.691 1.00 0.00 ATOM 2154 C35 BOM A 300 7.030 27.833 4.015 1.00 0.00 ATOM 2155 C37 BOM A 300 6.532 29.197 3.724 1.00 0.00 ATOM 2156 C39 BOM A 300 7.457 30.292 4.277 1.00 0.00 ATOM 2157 C42 BOM A 300 6.834 31.693 4.189 1.00 0.00 END ------------------------------------------------------------------------ Copyright (C) FMOutil is free software; you can redistribute it and/or modify it under the terms of the GNU General Public License version 2 as published by the Free Software Foundation. This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details. FMOutil is coded by D.G.Fedorov, T.Ishida, K.Kitaura National Institute of Advanced Industrial Science and Technology (AIST) 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan http://staff.aist.go.jp/d.g.fedorov/ --------------------------------------------------------------------------