Scripts:get stats.py

From UC_Chemistry

(Difference between revisions)
Jump to: navigation, search
Revision as of 16:32, 1 September 2009
David Rogers (Talk | contribs)

← Previous diff
Current revision
David Rogers (Talk | contribs)

Line 1: Line 1:
-Note: Make sure kT and h are correct for your particular system!+The main work of this script is to take four different data files and find the set of lambda-s which are contained in all four (hence the set language in match_lines). To do that, it converts all the reported lambda-s to integer bin numbers via division by h -- so all your bin widths must be a multiple of h to work correctly.
 + 
 +Note: Make sure kT is correct for your particular system and h is less than or equal to the one you used!
 + 
 +Output Datafile Columns:
 + 
 +1. lambda (input length units)
 + 
 +2. Upper bound+err -- collected from upper bounds column of neg. dat.
 + 
 +4. Upper Est.+err -- average minus beta times half variance from neg. dat.
 + 
 +6. Lower Est.+err -- average plus beta times half variance from pos. dat.
 + 
 +8. Lower Bound+err. -- collected from lower bounds column of pos. dat.
 + 
 +10. LR Avg. Est.+err -- (<DU|lambda>_0 + <DU|lambda>_1)/2
 + 
 +12. HS+err -- collected from p0.expot, sign reversed and units converted by multiplication with kT
 + 
 +14. IS+err -- collected from x0.expot, units converted by multiplication with kT
 + 
 +16. total+err -- 10+12+14
 + 
 +18. Half the difference between 4. and 6. -- may indicate the validity of the Gaussian LR assumption.
 + 
 +All energies are reported in input units (but only if kT is correct for your system!). All of the dependent variables have one standard deviation error estimates written in the next column.
== Script Source == == Script Source ==
Line 82: Line 108:
main(sys.argv) main(sys.argv)
</nowiki></pre> </nowiki></pre>
 +
 +== Gnuplot Script ==
 +<pre><nowiki>
 +#!/usr/bin/env gnuplot
 +
 +set size 0.7, 0.7
 +#set pointsize 1.5
 +set encoding iso_8859_1
 +
 +set title "Chloride Ion Calculation"
 +set xlabel "{/Symbol l} (\305)"
 +set ylabel "Free Energy (kcal/mol)"
 +
 +set xrange [2.9:3.5]
 +set yrange [-140:20]
 +
 +
 +set key off
 +#set key outside samplen 0.5
 +#set key 3.94, -420 samplen 2.2 spacing 0.67
 +
 +plot -84.6 title "{/Times-Roman=16 Grossfield et. al., 2003}" lt 1
 +replot 'contrib.dat' u 1:($2+$3) w l title "{/Times-Roman=16 LR Bounds}" lt 4
 +replot 'contrib.dat' u 1:($8-$9) w l notitle lt 4
 +replot 'contrib.dat' u 1:($10):11 w errorlines title "{/Times-Roman=16 Mean Field LR}" lt 4 pt 3
 +replot 'contrib.dat' u 1:12:13 w errorlines title "{/Times-Roman=16 Packing}" lt 3 pt 1
 +replot 'contrib.dat' u 1:14:15 w errorlines title "{/Times-Roman=16 Inner Shell}" lt 2 pt 2
 +replot 'contrib.dat' u 1:($16):17 w errorlines title "{/Times-Roman=16 Total}" lt 6 pt 8
 +#replot 'contrib.dat' u 1:16:($17+$18) w errorlines notitle lt 6 pt 8
 +
 +set term postscript enhanced lw 2 mono dashed "Times-Roman" 24
 +set out 'Cl.eps'
 +replot
 +</pre></nowiki>

Current revision

The main work of this script is to take four different data files and find the set of lambda-s which are contained in all four (hence the set language in match_lines). To do that, it converts all the reported lambda-s to integer bin numbers via division by h -- so all your bin widths must be a multiple of h to work correctly.

Note: Make sure kT is correct for your particular system and h is less than or equal to the one you used!

Output Datafile Columns:

1. lambda (input length units)

2. Upper bound+err -- collected from upper bounds column of neg. dat.

4. Upper Est.+err -- average minus beta times half variance from neg. dat.

6. Lower Est.+err -- average plus beta times half variance from pos. dat.

8. Lower Bound+err. -- collected from lower bounds column of pos. dat.

10. LR Avg. Est.+err -- (<DU|lambda>_0 + <DU|lambda>_1)/2

12. HS+err -- collected from p0.expot, sign reversed and units converted by multiplication with kT

14. IS+err -- collected from x0.expot, units converted by multiplication with kT

16. total+err -- 10+12+14

18. Half the difference between 4. and 6. -- may indicate the validity of the Gaussian LR assumption.

All energies are reported in input units (but only if kT is correct for your system!). All of the dependent variables have one standard deviation error estimates written in the next column.

Script Source

#!/usr/bin/python

import sys
from ucgrad import *

kT = Boltzmann*298.15/4.184
h = 0.01

def main(argv):
        if len(argv) < 6:
                print "Usage: %s <p0.expot> <x0.expot> <neg dat> <pos dat> <out>"%(argv[0])
                return 1

        # lambda, beta mu, beta err
        p0 = read_matrix(argv[1])
        x0 = read_matrix(argv[2])

        # lambda, S, mu_DU, var_DU, bound, err, nb
        du0 = read_matrix(argv[3])
        du1 = read_matrix(argv[4])

        p0, x0, du0, du1 = match_lines(p0, x0, du0, du1)
        if len(p0) < 1:
                print "Unable to find any matching radii!"
                return -1

        # lambda, Upper bound+err, Upper Est.+err, Lower Est.+err, Lower Bound+err., LR Avg. Est.+err, HS+err, IS+err, total+err, err2
        M = zeros((len(p0),18))
        M[:,0] = p0[:,0]
        M[:,1:3] = du0[:,4:6]
        M[:,3] = du0[:,2]-0.5*du0[:,3]/kT
        M[:,4] = sqrt( (du0[:,3]+0.5*(du0[:,3]/kT)**2)/(du0[:,1]-1.0) )
        M[:,5] = du1[:,2]+0.5*du1[:,3]/kT
        M[:,6] = sqrt( (du1[:,3]+0.5*(du1[:,3]/kT)**2)/(du1[:,1]-1.0) )
        M[:,7:9] = du1[:,4:6]
        M[:,9] = 0.5*(du0[:,2]+du1[:,2])
        M[:,10] = 0.5*sqrt(du0[:,3]/(du0[:,1]-1.0)+du1[:,3]/(du1[:,1]-1.0))
        M[:,11:13] = p0[:,1:3]*kT
        M[:,11] *= -1.0 # should be pos., not neg.
        M[:,13:15] = x0[:,1:3]*kT
        M[:,15] = M[:,9]+M[:,11]+M[:,13]
        M[:,16] = sqrt(M[:,10]**2+M[:,12]**2+M[:,14]**2)
        M[:,17] = 0.5*abs(M[:,5]-M[:,3]) # Approx. to Gaussian approx. error
                        # being the difference between the 2-moment est.s.
        write_matrix(argv[5], M)

        return 0

def match_lines(*tbl):
        # round to int rep. using pre-set h
        rlist = []
        for t in tbl:
                rlist.append((t[:,0]/h+0.5).astype(int).tolist())
        inrad = set(rlist[0])
        for r in rlist[1:]:
                inrad = inrad & set(r)

        clip = []
        for rl, t in zip(rlist, tbl):
                t = t.tolist()
                for i in range(len(t)-1, -1, -1): # Check names with "in" list
                        if rl[i] not in inrad: # and be a good bouncer.
                                del t[i]
                sort(t, 0) # Numerically sort. 'em.
                clip.append(array(t))

        return tuple(clip)

def pfee(x):
        m, v, me, ve = stats(exp(x/kT))
        return kT*log(m), kT*me/m

def nfee(x):
        m, v, me, ve = stats(exp(-x/kT))
        return -kT*log(m), kT*me/m

if __name__=="__main__":
        main(sys.argv)

Gnuplot Script

<nowiki>
#!/usr/bin/env gnuplot

set size 0.7, 0.7
#set pointsize 1.5
set encoding iso_8859_1

set title "Chloride Ion Calculation"
set xlabel "{/Symbol l} (\305)"
set ylabel "Free Energy (kcal/mol)"

set xrange [2.9:3.5]
set yrange [-140:20]


set key off
#set key outside samplen 0.5
#set key 3.94, -420 samplen 2.2 spacing 0.67

plot -84.6 title "{/Times-Roman=16 Grossfield et. al., 2003}" lt 1
replot 'contrib.dat' u 1:($2+$3) w l title "{/Times-Roman=16 LR Bounds}" lt 4
replot 'contrib.dat' u 1:($8-$9) w l notitle lt 4
replot 'contrib.dat' u 1:($10):11 w errorlines title "{/Times-Roman=16 Mean Field LR}" lt 4 pt 3
replot 'contrib.dat' u 1:12:13 w errorlines title "{/Times-Roman=16 Packing}" lt 3 pt 1
replot 'contrib.dat' u 1:14:15 w errorlines title "{/Times-Roman=16 Inner Shell}" lt 2 pt 2
replot 'contrib.dat' u 1:($16):17 w errorlines title "{/Times-Roman=16 Total}" lt 6 pt 8
#replot 'contrib.dat' u 1:16:($17+$18) w errorlines notitle lt 6 pt 8

set term postscript enhanced lw 2 mono dashed "Times-Roman" 24
set out 'Cl.eps'
replot
</nowiki>
Personal tools