NAMD Trajectory Conversion

From UC_Chemistry

(Difference between revisions)
Jump to: navigation, search
Revision as of 04:04, 20 July 2008
Songhk (Talk | contribs)

← Previous diff
Current revision
David Rogers (Talk | contribs)
(added VMD section)
Line 9: Line 9:
* pdb to dcd conversion * pdb to dcd conversion
catdcd -o tip3p.dcd -pdb tip3p.pdb catdcd -o tip3p.dcd -pdb tip3p.pdb
 +
 +== Manually Using VMD ==
 +If you're not tied to the command line, you can also use VMD to do this by loading and then writing the molecule. Of course, you also have to load the psf for your dcd to write a pdb. This came in handy for some of my latest projects converting Amber NetCDF files to gromacs trr files. These trr files are handy for QCT analysis, since '''g_mindist''' just needs a trr and an index file (no tpr necessary) to output r_min for each frame.
 +Now, if you didn't save the box info, you can type the following into the vmd console before writing your file to get that right (replacing 150x150x98 with your own dimensions).
 +<pre>
 +set sel [atomselect top all]
 +set ns 0
 +while { $ns < [molinfo top get numframes] } {
 + molinfo top set frame $ns
 + molinfo top set { a b c } { 150.0 150.0 98.0 }
 + molinfo top set { alpha beta gamma } { 90.0 90.0 90.0 }
 + incr ns
 + }
 +</pre>
== Direct NAMD Binary == == Direct NAMD Binary ==
Line 36: Line 50:
__EOF __EOF
</pre> </pre>
 +
 +You can even avoid the '''od''' maneuver if you are using python:
 +<pre>
 +from numpy import *
 +f = open("$i.bout", 'rb+')
 +f.seek(4)
 +x = fromfile(f, float64, 3*62, '')
 +x = reshape(x, (-1,3))
 +print sum(x,0)/len(x)
 +
 +f.seek(4+8*3*61)
 +x[371].tofile(f, '')
 +</pre>
 +Where I have illustrated changing the coordinate of the last atom on the fly for possible subsequent analysis.
 +
== Fixing atom name spacing == == Fixing atom name spacing ==

Current revision

I googled the web for NAMD conversion of dcd to pdb, since all my in-house analysis programs read pdbs and the dcd file format is a too bit cryptic and hacky for me to invest a lot of time writing an i/o interface for. The results were less-than-spectacular, since the major contribution assumes you are already familiar running VMD scripts -- in which case you wouldn't really need help doing this would you? After looking through some crazy tcl scripts plugging into vmd / namd as an interpreter, I realized there was a program called catdcd to get this done.

Enlightened process:

  • download and run catdcd, a program from the makers of NAMD designed to solve these sorts of problems.
  • My command was as follows, but your mileage may vary.
./catdcd -o traj.pdb -otype pdb -first 5000 -last 10000 -stride 1000 -s protein.pdb traj.dcd
  • pdb to dcd conversion
catdcd -o tip3p.dcd -pdb tip3p.pdb 

Contents

Manually Using VMD

If you're not tied to the command line, you can also use VMD to do this by loading and then writing the molecule. Of course, you also have to load the psf for your dcd to write a pdb. This came in handy for some of my latest projects converting Amber NetCDF files to gromacs trr files. These trr files are handy for QCT analysis, since g_mindist just needs a trr and an index file (no tpr necessary) to output r_min for each frame. Now, if you didn't save the box info, you can type the following into the vmd console before writing your file to get that right (replacing 150x150x98 with your own dimensions).

set sel [atomselect top all]
set ns 0
while { $ns < [molinfo top get numframes] } {
  molinfo top set frame $ns
  molinfo top set { a b c } { 150.0 150.0 98.0 }
  molinfo top set { alpha beta gamma } { 90.0 90.0 90.0 }
  incr ns
  }

Direct NAMD Binary

Some of my recent analyses have used the binary trajectory directly. This was a simple matter of

1. Converting a frame from the .dcd file into a NAMD binary format

(I still haven't found an easy way to access dcd-s directly)
i=10
catdcd -o $i.bcoor -first $i -last $i -otype namdbin traj.dcd
od -N4 -td4 $i.bcoor
od -w24 -j4 -tf8 $i.bcoor | head
The od command reveals the true form of the bcoor file -- a 4-byte int giving the number of atoms (skipped with -j4) followed by a list of 8-byte real numbers.

2. Scam the binary 8-byte floats into their own file.

dd if=$i.bcoor of=$i.bout bs=4 skip=1 count=372
Here I have extracted only atom numbers 1-62 by skipping the first block (4-byte int) and outputting only 372=62*3 coord.s * 2 blocks/coord.

3. Read using your favorite program.

python <<__EOF
from numpy import *
x = fromfile("$i.bout")
x = reshape(x, (-1,3))
print sum(x,0)/len(x)
__EOF

You can even avoid the od maneuver if you are using python:

from numpy import *
f = open("$i.bout", 'rb+')
f.seek(4)
x = fromfile(f, float64, 3*62, '')
x = reshape(x, (-1,3))
print sum(x,0)/len(x)

f.seek(4+8*3*61)
x[371].tofile(f, '')

Where I have illustrated changing the coordinate of the last atom on the fly for possible subsequent analysis.


Fixing atom name spacing

  • This program does not maintain the proper spacing for the atom names, which can mess up some downstream processing (my own python pdb reader in particular). Fix using the following sed script, fixnames.sed:
s/    \([N,C,O]\) /  \1   /;
s/   \([H,C,N,S]\)\([1-4,A-Z]\) /  \1\2  /;
  • invocation: sed -f fixnames.sed traj.pdb >traj_out.pdb
    • Make sure you choose a different output name, since sed will delete your output file before it begins processing!

Fixing the MODEL tags and atom name spacing

Here is another script you can use if you want to add the MODEL tags back in to the file.

  • fixnames2.sed:
s/    \([N,C,O]\) /  \1   /;
s/   \([H,C,N,S,O]\)\([1-4,A-Z]\) /  \1\2  /;
s/END/END\
ENDMDL\
MODEL/;
  • invocation:
    echo MODEL >traj_out.pdb; sed -f fixnames2.sed traj.pdb | sed -e '$d' >>traj_out.pdb
  • note that pymol only requires ENDMDL tags be present to read multi-conformer pdbs (so you can just switch END -> ENDMDL if you are lazy)

Deleting Stuff From the PDB

Your PDB file may be too large to process with programs written by mere mortals. Try some/all of the following to make the file smaller.

Deleting unwanted residues

Since the residue names are listed for each line in the file, it is enough to run a simple grep -v on the pdb to get rid of the annoying residues.

grep -v ' POPC \| HOH \| SOL \| TIP3 ' bigger.pdb >smaller.pdb

Deleting nonpolar hydrogen atoms

Deleting all lines matching a pattern is easily accomplished with grep -v

  • Here is the list of nonpolar hydrogens in my system (20 amino acids with CHARMM22 atom names) skip_lines:
HB[1-3] *ALA
HB[1-2] *CYS
H[B,G,E][1-3] *MET
HB[1-2] *ASP
HB[1-2] *ASN
H[B,G,D][1-2] *ARG
H[B,G,D][1-2] *PRO
H[B,G][1-2] *GLU
HB[1-2] *GLN
HG[1-2] *GLN
HG[1-2][1-3] *VAL
HB *VAL
HB[1-2] *TRP
H[E,H,Z][1-3] *TRP
HB[1-2] *LEU
HD[1-2][1-3] *LEU
HG *LEU
HG[1-2][1-3] *ILE
HD[1-3] *ILE
HB *ILE
H[B,G,D,E][1-2] *LYS
H[B,D,E][1-2] *PHE
HZ *PHE
H[B,D,E][1-2] *TYR
HG2[1-3] *THR
HB *THR
HB[1-2] *SER
HB[1-2] *HIS
HA *ASP
HA *CYS
HA *ALA
HA *MET
HA *ASN
HA *ARG
HA *PRO
HA *GLU
HA *GLN
HA *VAL
HA *TRP
HA *LEU
HA *ILE
HA *LYS
HA *PHE
HA *THR
HA *SER
HA[1-2] *GLY
HA *TYR
HA *HIS
  • invoke with grep -v -f skip_names traj_out.pdb >traj_ua.pdb
  • be aware of any issues with further processing, because the standard atom masses are now incorrect
  • as an alternative, you also could use pdb2gmx with a ua gromos forcefield to accomplish the above
Personal tools