-------------------------------------------
June 24, 2005
           FMOutil version 2.0
        Copyright (C) 2004-05 AIST
 coded by D.G.Fedorov, T.Ishida, K.Kitaura
           AIST, Tsukuba, Japan 
   http://staff.aist.go.jp/d.g.fedorov/
-------------------------------------------

1. About FMOutil 
   FMOutil is a program to generate input data for FMO calculations with
   GAMESS. The functions of the program are,
     1) generate input data for FMO calculations with GAMESS,
     2) add H atoms to PDB data, 
     3) report geometrical parameters (bond lengths, bond angles, and 
        hydrogen bonds,
     4) report residues which are within a threshold distance from
        specified residue(s)/molecule(s).

   For the calculations of proteins/polypeptides, you will start
   with PDB data. The PDB data usually do not include hydrogen atoms.
   This program has a function to add hydrogen atoms. The quality of
   generated structure, however, may not be high enough because their positon
   is not optimized. We recommend optimisation of H atom positions using
   some modeling software and it may be better to add H atoms with the
   modeling software as well. However, not all modeling software is capable 
   of adding H atoms physically reasonable. HyperChem 7, for example, attaches
   the HG hydrogen atom of Leu badly. One can use TINKER program package 
   distributed together with GAMESS to prepare better geometries of 
   bio-molecules by optimizing the positions of hydrogen atoms with force 
   filed methods after the addition of the atoms. Whatever way you follow
   to prepare the initial structure, use some graphics software to look
   at it before commencing FMO calculations! Five minutes of careful looking
   can save weeks of wasted CPU time.
   The current version of FMOutil can handle proteins/polypeptides and their
   complexes with non-peptide molecules including water molecules. 
   For the FMO calculations of DNA/RNA, you can use PEACH [1], a MD program 
   developed by Komeiji, which has a function to generate FMO input data 
   for GAMESS. This program also can generate input data for FMO calculations 
   with ABINIT-MP developed by Nakano [2].

2. Installation
   This program is simple enough to compile and link at a stroke. So, no 
   installation tool is attached. The file, fmoutil.f (about 12,000 lines), 
   includes all fortran source codes. With the f77 compiler on linux on 
   personal computers, the compile and link will be completed by the 
   following one command,
          f77 -o fmoutil fmoutil.f 
   The program was not tested on other machines/OS's. If you encounter
   errors please report them to us along with your hardware/OS information.
   The parameter statement, Parameter (MaxAtm=20000,MaxRes=1000,MaxMol=10,
   MaxFrg=1000), determines the maximum number of atoms, residues, molecules, 
   and fragments, respectively. The users will modify this statement for 
   larger molecules.

3. How to use the program 
   The program is designed to use interactively and minimum inputs are 
   required. The users will easily understand how to use this program. 
   First, the program prompts,
     Enter Job # : 
       1: Generate FMO input data for GAMESS 
       2: Add H atoms to PDB data 
       3: Report Geometrical Parameters 
       4: Report Distances of Residues
       0 or Enter Key. Quit

   All Jobs require input and output files. Enter file names according to 
   the following prompts, 
       Enter Input PDB File(s) :
       Enter Output file : 
   The former prompt accepts up to five file names separated by spaces.

   Job #1 reads PDB format data which should contain a complete coordinates 
   of atoms including hydrogen atoms and writes out generated FMO input data
   on the output file. Job #2 reads PDB data which do not have hydrogen 
   atoms and writes out the PDB format data with supplemented hydrogen atoms
   on the output file. JOB #3 and #4 read PDB format data and write out the
   geometrical information which is useful to see details of molecular 
   geometries on the output file.

4. FMO input data generation (JOB #1)
   The prompts of JOB #1 are grouped into three categories. The first group 
   is related to your computers, the second group to FMO calculations and 
   the last group to partition of molecules.

  1) Computer system

     1> How many computer nodes do you want to use ?
     1> How many CPUs does each node have ?
     1> How much memory does each node have (in megabytes) ?

   FMO calculations use the generalized DDI (GDDI) [3] for efficient
   parallel processing. GDDI requires input data for grouping of the available
   CPUs (the $gddi name group).  The information input here is also used
   to make the nintic data in $intgrl, the ngrfmo(1) data in $fmoprp, etc.,
   along with the $gddi group data.

  2) FMO calculations

     1> Choose runtyp (1:energy, 2:gradient, 3:geometry optimization)

   For geometry optimizations, the esp and the separated dimer approximations
   will be set more strict than these for single point energy calculations;
   1.0->2.0 for respap, 2.0->2.5 for resppc and 2.0->2.5 for resdim.
   By default the GAMESS standard optimizer is selected for molecules with 
   less than 2000 atoms. The optimization method for larger molecules should 
   be selected at the following prompt.

     1> Choose geometry optimization method
        (1:conjugate gradient, 2:simple Hessian update, 3:GMS standard 
         optimizer)

     1> How many layers do you want to define (1-5) ?
     1> Enter fragment numbers to be assigned to layer (2-nlayer).
     1> Choose wavefunction type (1:RHF, 2:DFT, 3:MP2, 4:CC, 5:MCSCFF)
        for each layer. separated by blanks.

   For DFT, B3LYP is specified by default. If you want to use other
   exchange-correlation functional, you have to modify the output file 
   by hand. Similarly for the CC method, CCSD(T) is spelled out by default.
   You can change it to the desired one by manual post-editing. Do not
   choose non size extensive renormalized CC methods except for multilayer
   runs with 1 fragment in the CC layer.
   When MCSCF is selected, the following prompts will appear,

     1> Enter the MCSCF fragment number.
     1> Enter the spin multiplicity of the MCSCF fragment.
     1> Enter the number of core orbitals, the number of active orbitals
        and number of active electrons.

   For MCSCF, the initial orbitals should be provided before the execution. 
   See the GAMESS manual, INPUT.DOC.

     1> Enter basis set (1:STO-3G, 2:3-21G, 3:6-31G, 4:6-31G*, 5:6-311G*)
        for each layer. separated by blanks.

   The STO-3G basis set for the 2nd row elements in GAMESS is different 
   from the original one. The 6-31G basis set for Si is also different. 
   (see the GAMESS manual, REFS.DOC) FMOutil will ask to choose either one.

     1> Do you add diffuse functions on COO- groups ? (1:yes, 2:no)

   We recommend to add diffuse functions on the COO- group atoms of GLU,
   ASP and C-terminus.

     1> Enter the desired n-body expansion (2 or 3) ?

   The 3-body FMO method (FMO3) is available only with the RHF, DFT and CC 
   wavefunctions.

     1> Would you like to print Mulliken charges (1:yes, 2:no)
   
   Mulliken populations will not be printed when the 3-body FMO is selected.
   
     1> Would you like to produce a cube file with the total electron 
        density ? (1:no, 2:standard, 3:sparse)
     1> Enter grid spacing in Angstrom.

   The cube file option become void in geometry optimization jobs.

  3) Partition of molecules

     1> Enter fragment size (1:1res-per-frg, 2:2res-per-frg)
     1> are S-S bonded CYSs combined to one ? (1:yes, 2:no)
     1> is GLY combined to the neighbor ? (1:yes, 2:no)

   We recommend to reply 2,1, and 1 for a production run.

   It is assumed that the "TER" keyword separates molecules; "molecule" 
   means no bond between them. FMOutil can split proteins/polypeptides but 
   can not split non-peptide molecules which are left as one fragment. 
   If need arise, the users should fractionate non-peptide molecules by hand.

5. Addition of hydrogen atoms (JOB #2)

  1) Polypeptides/proteins
   JOB #2 adds hydrogen atoms to polypeptides/proteins in PDB input file. 
   In this job, the following prompts will appear.  

     2> Protonate N-terminus (1:yes, 2:no)
     2> Protonate C-terminus (1:no, 2:yes) 
     2> Protonate ASP and GLU (1:no, 2:yes) 
     2> Protonate HIS (1:yes, 2:no)

   If 2(no) is entered for the above prompt, one extra prompt will appear,

     2> Neutral HIS ... H at D1 or E2 (1:D1, 2:E2)

  2) Water molecules
   Hydrogen atoms are added to water molecules automatically.

  3) Non-peptide molecules
   For non-peptide molecules, it is not possible to determine positions 
   where hydrogen atoms are attached based on only the chemical formula 
   given in PDB files. You have to know the structural formula of the 
   molecule which may be found in the original paper cited in the 
   PDB file. Once you see where to add hydrogen atoms, you can use fmoutil
   for the purpose by adding extra data to the input PDB file.
    
   The data format is as follows,

    REMARK ADDH  add-h-type atom1 atom2 atom3 [cis(0) or trans(1)] 
                 [H-X bond length]

   The parameters in [ ] are optional and the "cis or trans" option is 
   defined only for "add-h-type" 13. See Appendix for the definition of
   "add-h-type" and an example.

   If your system has non-peptide molecules as well as peptide molecules,
   it may be easy to process to separate the PDB file into a peptide file
   and non-peptide molecule files. These files (maximum five files) will 
   be used as the input PDB file(s).

6. Report on molecular geometry (JOB #3 and #4)

  1) Bond lengths, bond angles and hydrogen bonds
   JOB #3 writes out geometrical parameters on the output file. 
   For this job, the following prompts will appear.  

     3> Bond Lengths, Bond Angles, H-bonds (1:yes, 2:no)
          ex. 111 for all, 221 for H-bond only

   If you need all of these geometrical parameters, enter "111" (without 
   spaces between the numbers). "222" does nothing.

  2) Residue distance
   JOB #4 reports residue distances form specified residue(s)/molecule(s). 
   This report may be useful to find out which residues are close to
   specific residue(s)/molecule(s), for instance, the distances of residues
   from a ligand in a protein-ligand complex. For the JOB, the following 
   prompts will appear.

     4> Enter Res/Mol numbers from which Distance are measured 
            ex. 2 3 - 5 8 10 

   This example means distance between residues from the 2 3 4 5 8 and 10 
   residues are measured. The distance is given as that of closest contact 
   atoms of residues. Another prompt looks like,

     4> Enter Threshold Distance ex. 2.5 (vdw unit)
          or 5.0 A (Angstrom unit) 

   The residues within this threshold distance are printed. The value can 
   be given either in vdw unit (relative to a sum of the vdw radii of the 
   closest contact atoms) or in Angstrom unit.

7. Sample Output 
   Sample output files derived from PDB code "1crn" are attached. 
   (The "1crn.pdb" file is not attached.) To reproduce these outputs,

   1) download "1crn" from a PDB site and save as "1crn.pdb", 
   2) execute "fmoutil",
   3) select JOB #2,
   4) use "1crn.pdb" as the PDB input file,
   5) the output file may be "1crnhadd.pdb",
   6) reply 1 (yes) to the all prompts.

   The output file "1crnhadd.pdb" is compared with the attached sample 
   output, "1crnhadd.pdb.sample".

   7) execute "fmoutil" again,
   8) select JOB #1,
   9) use "1crnhadd.pdb" as the PDB input file,
   10) output file may be "1crn1res.inp",
   11) reply 8,2,2000 for the first part, 1,1,1,1,2,2,1,1 for the second
       part, and 1,1,1 for the third part.

   The output file is compared with the attached sample output, 
   "1crn1res.inp.sample". This file can be used as the input data for 
   the FMO calculation on a PC cluster composed of 8 nodes, each node 
   has dual CPUs and 2GB memory. The output file of the FMO calculation
   is attached (1crn1res.out.sample).

   One stroke execution of Jobs #2 and #1 is possible. You reply "2 1"
   (with a space between the numbers) to the "Enter Job # :" prompt.
   In this case, only the FMO input data are created (H-added pdb file
   is not created).


8. Troubleshooting and manual post-editing

  1) Memory problems
   Probably the most frequently occurring problem happens when a job aborts
   complaining that there is not enough memory.
   This often occurs in these cases:
   a) parallel MP2 jobs
   b) cube files
   c) DFT/MCSCF jobs
   The second and third types (b,c) are the easiest to fix. 
    $intgrl nintic=-98000000 $end
   You want to shift integral memory (nintic) to the general pool. 
   Try subtracting 10 mln words from nintic (or whatever amount seems 
   appropriate):
    $intgrl nintic=-88000000 $end

   The first type (MP2) by far is the most troublesome. FMOutil tries to guess 
   how much memory will be needed for MP2 but such a guess is nearly always 
   wrong (it maybe wrong overestimating the amount, in which case performance 
   may suffer or underestimating, in which case the job aborts).
   Here is what to do if a job aborts. First find the largest MP2 calculation
   in your job. Most likely, this is the largest SCF dimer (that is, dimer
   separated within RESDIM threshold). The largest roughly means with the 
   largest number of AOs. If you have absolutely no idea what it is, look at 
   you aborted job and find this line:
    Max AOs per frg:       150
   Then you can guess the dimer size by doubling the number that is printed 
   there. In this case it is 150*2=300.
   Next you need to guess how much memory you need for this size and how many 
   nodes will hold it in GDDI.
   First, understand what was in your input.
   
    $system mwords=25 memddi=1498 $end
    $gddi ngroup=16 $end
    $fmoprp
   ! Probably you need to provide more nodes to have enough memory
      ngrfmo(1)=3,1,0,0,0,  0,0,0,0,0
    $end

   This says that:
   1498 megawords of DDI memory per 16 nodes (1498/16=94 mwords per node)
   Then during monomer calculation you asked for 3 groups, that is,
   6+5+5 nodes. They will have 6*94, 5*94, 5*94 megawords of DDI memory to
   handle jobs up to 150 basis functions. If your job died during monomer 
   calculation (find the last "Running RHF      energy for monomer/dimer",
   remember to look into all output files, not just the grand master one), 
   you should increase memory there by decreasing the number of GDDI groups
   (the first number in ngrfmo). e.g., try
      ngrfmo(1)=2,1,0,0,0,  0,0,0,0,0
   In this case each of two groups will have 8*94 megawords.
   The same holds for dimers. If your job died during a dimer calculation,
   try decreasing the number of DDI groups (the second number in ngrfmo). 
   In this case it is already 1, which means you need more nodes!

   If you have some experience or are an advanced imaginative user who can 
   either look up literature to find the required memory amount based upon 
   the number of AOs (and core orbitals) or run a small ab initio MP2, 
   EXETYP=CHECK calculation with a system of similar size, then you can 
   change ngrfmo before running your larhe FMO-MP2 job and save time.
   Finally, remember that MP2 gradient takes several times more memory than 
   MP2 energy, so you may content yourself with getting only energy.
   Also, there are two independent MP2 codes in GAMESS, one is sequential and 
   the other parallel. If your GDDI group has 1 node, then the sequential 
   code is executed, which means all above trouble does not apply to you, 
   since sequential jobs do not use DDI memory. FMOutil, however, will not 
   take advantage of the fact. You may be able to get nice a speed-up by 
   confiscating all memory from DDI (set memddi=0 and add memddi/ngroup to 
   mwords: mwords=25 -> mwords=25+94=119), which you can do if all GDDI groups 
   have one node. 
   
   In some cases you can lessen memory burden by dividing a run into two.
   For example, if you intended to produce a cube file for an MP2 calculation,
   it is possible to run an RHF calculation with a cube file and then an MP2
   calculation without it. The cube file for MP2 is produced with RHF density 
   anyway.

  2) Basis set problems
   If you have some "non-organic" atoms in your system, it can happen that 
   GAMESS does not have the all-electron basis set for it that you requested.
   In this case, locate $data group:

    $data
    <Enter your job title here>
    C1 
    U.1-1    92  
          n21 3 
       
    c.1-1    6  
          n21 3 

   Here you had a uranium atom for which you asked for 3-21G basis set that 
   does not exist. In this case, find an all-electron basis set (not ECP!) 
   you like, and explicitly provide it:

    $data
    <Enter your job title here>
    C1 
    U.1-1    92  
    s 26
    1 1000000 1
    ...
      
    c.1-1    6  
          n21 3 


9. Miscellaneous

  1) If some error occur with FMOutil, you may be able
   to avoid the error by pre-editing the input PDB file.
  
  2) A file named "fmoutil.prm" in the current directory will be read
   and the parameters (covalent radii, vdw radii etc.) given in the 
   file will be used instead of the internally defined values. See
   a template file "fmoutil.p00" for details.

  3) The input data for JOB #1 can be pre-determined in a file named 
   "fmoutil.def". When this file is found in the current directory, the
   program reads in data from this file and skips the corresponding prompts.
   See a template file "fmoutil.d00" for details.

  4) Latest version of FMOutil will be available at,
      http://staff.aist.go.jp/d.g.fedorov/
   FMO related information is also available from this web site.


Acknowledgements 
   This work is partially supported by the NAREGI Nanoscience Project and 
   by Grant-in-Aid for Scientific Research of Ministry of Education,
   Culture, Sports, Science and Technology, Japan. KK thanks to Prof. Isao
   Nakanishi (Kyoto University, Graduate School of Pharmacy) for his
   valuable comment.


References
   [1] Y.Komeiji, PEACH, http://staff.aist.go.jp/y-komeiji/ 
   [2] T.Nakano, ABINIT-MP, http://www.fsis.iis.u-tokyo.ac.jp/en/result
       /software/
   [3] D.G.Fedorov, R.M.Olson, K.Kitaura, M.S.Gordon, S.Koseki,
       J.Comp.Chem.,25,872-880(2004).


Appendix   "add-h-type" and an example of input PDB file

  1. The "add-h-type" code
------------------------------------------------------------------------------
 add-h-type     connection of                  comment
 code number    reference atoms
------------------------------------------------------------------------------
                    |-atom2
    11         H-atom1-atom3     H atom and atom1-4 form peudotetrahedron
                    |-atom4      ex. H-CH3 of methane
                   
                    |-atom2
    12         H-atom1           atom1 is sp2 and H and atom1-3 are in a plane
                    |-atom3      ex. H-C of benzene

    13         H-atom1-atom2-atom3
                                 H-atom1-atom2-atom3 can be cis or trans
                                 ex. H-O-C-C of ethanol

    21         H-|  |-atom2      two H atoms are added at atom1
                 atom1           H2 and atom2,3 are twisted by 90 degrees
               H-|  |-atom3      ex. H2 of CH4
                                 
    22         H-|
                 atom1-atom2-atom3  all atoms are in a plane
               H-|                  ex. H2N-C-N(H2)- of arginine

    31         H-|
               H-atom1-atom2-atom3  three H atoms are added at atom1
               H-|                  ex. H3N-C-C- of lysine
------------------------------------------------------------------------------


  2. PDB input data for hydrogen atom addition to non-peptide molecules
   The following is an example of Bombykol in PDB:1DQE. The structural 
   formula of Bombykol is written as HO-(CH2)9-(CH=CH)2-(CH2)2-CH3.
   The last non-integer value in the first line is bond length which is
   optional (usually internally defined standard bond length is applied).

REMARK ADDH  13     1  2 3 1 0.96
REMARK ADDH  21     2  1 3
REMARK ADDH  21     3  1 4
REMARK ADDH  21     4  3 5
REMARK ADDH  21     5  4 6
REMARK ADDH  21     6  5 7
REMARK ADDH  21     7  6 8
REMARK ADDH  21     8  7 9
REMARK ADDH  21     9  8 10
REMARK ADDH  21    10  9 11
REMARK ADDH  12    11 10 12
REMARK ADDH  12    12 11 13
REMARK ADDH  12    13 12 14
REMARK ADDH  12    14 13 15
REMARK ADDH  21    15 14 16
REMARK ADDH  21    16 15 17
REMARK ADDH  31    17 16 15
REMARK  the following data are partially taken from PDB:1DQE
ATOM   2141  O1  BOM A 300      -1.744  31.301   2.886  1.00  0.00
ATOM   2142  C2  BOM A 300      -1.235  32.329   2.020  1.00  0.00
ATOM   2143  C4  BOM A 300       0.295  32.341   2.005  1.00  0.00
ATOM   2144  C7  BOM A 300       0.782  30.905   1.752  1.00  0.00
ATOM   2145  C10 BOM A 300       1.795  30.549   0.677  1.00  0.00
ATOM   2146  C13 BOM A 300       2.684  29.358   1.190  1.00  0.00
ATOM   2147  C16 BOM A 300       2.164  28.022   0.717  1.00  0.00
ATOM   2148  C19 BOM A 300       3.280  26.961   0.709  1.00  0.00
ATOM   2149  C22 BOM A 300       3.069  25.811   1.683  1.00  0.00
ATOM   2150  C25 BOM A 300       4.130  24.716   1.517  1.00  0.00
ATOM   2151  C28 BOM A 300       5.296  24.975   2.447  1.00  0.00
ATOM   2152  C31 BOM A 300       5.620  26.199   2.824  1.00  0.00
ATOM   2153  C33 BOM A 300       6.695  26.586   3.691  1.00  0.00
ATOM   2154  C35 BOM A 300       7.030  27.833   4.015  1.00  0.00
ATOM   2155  C37 BOM A 300       6.532  29.197   3.724  1.00  0.00
ATOM   2156  C39 BOM A 300       7.457  30.292   4.277  1.00  0.00
ATOM   2157  C42 BOM A 300       6.834  31.693   4.189  1.00  0.00
END


------------------------------------------------------------------------
    Copyright (C)

    FMOutil is free software; you can redistribute it and/or modify
    it under the terms of the GNU General Public License version 2
    as published by the Free Software Foundation.

    This program is distributed in the hope that it will
    be useful, but WITHOUT ANY WARRANTY; without even the implied warranty
    of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE.  See the GNU
    General Public License for more details.

    FMOutil is coded by D.G.Fedorov, T.Ishida, K.Kitaura
    National Institute of Advanced Industrial Science and Technology
    (AIST) 1-1-1 Umezono, Tsukuba, Ibaraki 305-8568, Japan
    http://staff.aist.go.jp/d.g.fedorov/
--------------------------------------------------------------------------