Materials Modeling at the Molecular Level

Lecture 2: Efficiency and Error Estimation

1. Review of Lecture 1 and Lab 1

2. More efficient neighbor finding

3. Estimating errors

Laura Frink

Sandia National Laboratories

Computational Materials Science Dept.

lfrink@cs.sandia.gov

Outline

lecture 1: An introduction to molecular modeling

lab 1: Getting to know MD and MC Codes

lecture 2: How to write efficient MC and MD codes

lab 2: Implementation of routines for improved efficiency and error estimation

lecture 3: What to do with an MD or MC Code ????

lab 3: Computing materials properties

Last Time

1. Overview of Materials Modeling in general;

and Molecular simulation in particular.

2. Introduction to Monte Carlo and Molecular Dynamics algorithms.

3. Got a feel for the components of MD and MC codes.

4. Computed pressures along two isotherms

(one sub-critical, one super-critical)

5. Found that the programs run slowly ... especially the MC code.

This Time

1. Improving code performance

2. Estimating errors

Force (MD) /Energy (MC) Computation

90% of the Effort

Simplest Approach

Loop over all particles in simulation volume

MD -- O(N2) -- have moved all N particles at once.

MC -- O(N) -- have only moved one particle at a time.

loop over i (1,N)

loop over j (1,N)[

if (i¹ j) calculate f_ij

]

loop over j (1,N)[

if (i¹ j) calculate u_ij

]

Better Approach: a factor of 2 !!

MD: Apply Newton’s 3rd Law :

for every action there is an equal and opposite reaction. Or f_ij =- f_ji.

loop over i (1,N-1)

loop over j (i+1,N)[

calculate f_ij

set f_ji =-f_ij

]

MC: ???

Best Approaches:

Reduce the size of the loop over particles by keeping track of particles that are within the cut-off distance of interest.

Neighbor lists

Linked Cells

Hybrid Approach - construct neighbor list from linked cells !!

Goal: reduce scaling to O(N) for MD and O(1) for MC
the different methods will yield different prefactors, c, with time = cN

Verlet Neighbor Lists

r_c = cutoff distance

d = ‘skin’ thickness

r_n = r_c + d = neighbor cutoff distance

every M steps, create a list for each particle that contains all the particles within r_n.

in between steps, use this reduced list to find atoms inside r_c

neighbor list creation - O(N²) operation

MD: force calculation - O(N) operation

MC: energy calculation - O(1) operation

How to set up neighbor lists - MD

2 data structures

nlist one long list of neighbors
nlistpnt pointers (integer offsets) into the list

nlist = [2, 3, 4, 3, 4, 4, 7, 7]

nlistpnt = [1, 4, 6, 7, 7, 8, 9, 9]

particle 1 2 3 4 5 6 7

sample ‘neighbor’ routine

do I=1, N-1 (N = # of atoms)

do j=i+1, N

compute r² (with periodic boundary conditions)

if (r² < r_n²) then

add "j" to list of atom "i"

endif

enddo

What to add to the code

(1) Main routine :

call neighbor

call force

call thermo

(2) Integrate routine :

update v (1/2 a timestep)

update x (full step)

if (Mth step) call neighbor

call force

(3) Force and energy routines

(old)

do i=1,natoms-1

do j=i+1, natoms

compute distance between atoms i,j

(new)

do i=1,natoms-1

do k=nlistpnt(i), nlistpnt(i+1)-1

j=nlist(k)

compute distance between atoms i,j

Things to think about

How do we choose M (reneighboring frequency) and d (skin size) ?

What is "correct" reneighboring criterion ?

If you run with and without neighbor lists, will you get the same answer ?

Cautions:

do not overflow nlist

consider the case where the number of neighbors is zero !!

List Radius	Update interval	Time (N=256)	Time (N=500)

no list		3.33	10.0
2.60	6	2.24	4.93
2.70	13	2.17	4.55
2.90	26	2.28	4.51
3.10	43	2.47	4.79
3.43	83	2.89
3.50	100		5.86

Time = CPU time per step in seconds, (Thompson, 1983)

Neighbor Lists in MC

2 data structures

nlist (N,Nlist_max) 2D array of neighbors

nlist_length (N) number of neighbors for each particle

need to know all information for all the particles because we operate on particles one at a time

Changes to program:

(1) Main

call neighbor

call config (initial energy)

in main loop:

every M steps -> call neighbor

(2) Energy and Config:

(old) do j=1, N

(new) do ilist=1,nlist_length (i)

j=nlist(i,ilist)

Linked Cells

hash atoms into bins every timestep

regular grid of bins ibin_x = Nbins_x*x(i)/Lx

search only surrounding bins to find atoms within r_c

data structure : 2D array of bins, each bin containing list of atoms (ibin, atoms_bin)

Things to think about

What does the bin size need to be ?

How many bins to be searched in 2D ? 3D?

What is the scaling ?

More or less efficient than neighbor lists ?

Linked Cells and neighbor lists

hash atoms into bins every timestep

regular grid of bins ibin_x = Nbins_x*x(i)/Lx

use surrounding bins to set up a neighbor list

use neighbor list to find atoms within r_c

# of ij pair distance calculations

verlet neighbor list : n_v=4prr_n³/3

linked cells : n_l=27rr_c³

n_l/n_v = 6.4(r_c/r_n)³

combined method eliminates N² search for neighbors

Statistical Errors

Two problems

averages over finite time
successive configurations may be highly correlated

Want to find the RMS deviation in some average property <A>

For a simulation of infinite length ...

For a simulation of finite length, t_B with correlated samples characterized by a correlation length of

Need to establish ...in the limit t_B gets large

Strategy : Break a simulation up into blocks and take block averages

For a block of length t_B, where we can calculate

Plot vs t_B, and find the long time limit

MC program

T*=0.7, rs³=0.8, r_c=2.5

uncorrelated configurations occur ever 2650 steps !!

Lab 2 : Computational Efficiency and Error Estimation.

A new MC code will be provided which has neighbor lists already implmemented. This code also has the basic coding for block averages for estimating errors in the potential energy.

to implement neighbor lists in your molecular dynamics program.
to study the scaling behavior of the codes with and without neighbor lists
to implement block averages and error estimation in the MC code for the pressure
to study the dependence of the errors on the state point of interest and the number of MC moves.

1. First let’s see what we can expect from the neighbor lists. In the new MC code provided, set up the following parameters:

use: n_moves=5000 n_eq=1000 n_subavg= 100

nac = 1 ncx=ncy=20 xa(1)=ya(1) = 0.5

pshift = 0.1

rs² = 0.8 kT_initial/e =1.0 r_c /s= 2.5

A) tabulate the time in the energy routine and the total run time as a function of the parameter Nneigh (this is the number of moves between updates of the neighbor list). Compare your results with the old program where neighbor lists were not used.

What value of Nneigh gives the minimum run time ?

What is happening at high and low values of Nneigh to get the minimum you observe?

B) Using the value of Nneigh at the minimum run time in part A, compute the ratio Nneigh/N. Use this ratio to determine Nneigh, and tabulate the time in the energy routine and the total run time as a function of the number of particles, N. Include in your table the run times for the old program without the neighbor lists.

In the old and new programs, what is the expected scaling behavior of the energy routine, and the total run time as N increases ?

Do you observe these trends? At what values of N ?

If the limiting scaling is taken to be time = A N^b, what are the constants A and b for the scaling of energy routines and total run time for the old and new programs ?

2. Implement neighbor lists in the molecular dynamics program. This will entail 1) writing the subroutine ‘neighbor’ discussed in class, and modifying the loops in the force, energy, and pressure routines. All the other necessary code will be provided.

3. The MC code provided calculates the statistical inefficiency as a function of the length of block averages in the program.

Test the sensitivity of the standard deviation of the energy (i.e. the error in the average), and the statistical inefficiency (the number of steps between uncorrelated moves) to the number of MC moves and the state point.

start with

use: n_moves=1,100,000 n_eq=100,000 n_subavg= 10000

nac = 1 ncx=ncy=10 xa(1)=ya(1) = 0.5

pshift = 0.1

rs² = 0.8 kT_initial/e =0.7 r_c /s= 2.5

and vary the density, temperature, number of moves, and equilibration

A few more fun things to do ...

1. implement linked cells and/or the hybrid linked cell/neighbor list method.

2. implement error calculation in the MD code.

3. implement error calculation for pressure.

4. investigate ‘equilibrium’ with both MC and MD codes.