Installing Chaste on Archer

General

Important: New users are encouraged to skim the Archer documentation first, in particular the getting started guide. Be aware that you have both a /home and a /work partition, and of the differences between them.

In this document it is assumed that the dependencies will live in /work/.../chaste-libs and the Chaste code will live in /work/.../Chaste, where in both cases the "..." are something like "e10/e10/louiecn".

I/O

Serious consideration should be given to input/output (I/O) on a system like Archer. Curious readers are advised to look at these slides, and search around for Lustre tips such as this page.

Lustre striping

/work is a Lustre filesystem, where files can be distributed and broken up ("striped") over a large number of hard disks ("OSTs") to improve parallel performance, but it's somewhat up to the user to make sure things are working at their best. You basically have control over two parameters for every file and directory you own: the number of stripes, and the size of these stripes.

For parallel access, slide 24 in the above contains a good rule of thumb:

If #files > #OSTs: set stripe_count=1. You will reduce Lustre contention and OST file locking this way, and gain performance.
If #files == 1: set stripe_count=#OSTs, assuming you have more than one I/O client.
If #files < #OSTs: select stripe_count so that you use all OSTs. Example: you have 8 OSTs and write 4 files at the same time, so select stripe_count=2.

(There are 56 OSTs on Archer.)
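
The rule of thumb above can be written as a small shell helper. This is only a sketch: the function name suggest_stripe_count is made up for illustration, the default of 56 OSTs is the Archer figure quoted above, and the integer division rounds down, so a few OSTs may be left unused when the number of files does not divide the number of OSTs.

```shell
# Suggest a Lustre stripe count from the slide's rule of thumb.
# Usage: suggest_stripe_count NUM_FILES [NUM_OSTS]
# (hypothetical helper; NUM_OSTS defaults to 56, the figure quoted for Archer)
suggest_stripe_count() {
    local files="${1:-0}"
    local osts="${2:-56}"
    if [ "$files" -lt 1 ] || [ "$osts" -lt 1 ]; then
        echo "usage: suggest_stripe_count NUM_FILES [NUM_OSTS]" >&2
        return 1
    fi
    if [ "$files" -ge "$osts" ]; then
        # At least as many files as OSTs: one stripe per file.
        echo 1
    else
        # Fewer files than OSTs: share the OSTs between the files.
        # A single file therefore gets every OST.
        echo $(( osts / files ))
    fi
}
```

For the slide's example, suggest_stripe_count 4 8 prints 2, and suggest_stripe_count 1 prints 56.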

Other good rules of thumb are:

Use a stripe count of 1 for directories with many small files.
Increase the stripe_count for parallel writes to the same file - approximately 1 stripe per GB file size.
Set the stripe count to a factor of the number of parallel processes.
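
One possible way to combine the last two rules for a single shared file is sketched below. It is an assumption, not something the slides prescribe: start from roughly one stripe per GB, cap that at the number of OSTs, then step down to the nearest value that divides the number of processes. The function name and its arguments are hypothetical.

```shell
# Suggest a stripe count for one file written in parallel.
# Usage: stripe_count_for_shared_file SIZE_IN_GB NUM_PROCESSES [NUM_OSTS]
# (hypothetical helper; NUM_OSTS defaults to 56, the figure quoted for Archer)
stripe_count_for_shared_file() {
    local size_gb="$1"
    local nprocs="$2"
    local osts="${3:-56}"
    # Roughly one stripe per GB, but never more than the OSTs available
    # and never fewer than one.
    local count="$size_gb"
    if [ "$count" -gt "$osts" ]; then
        count="$osts"
    fi
    if [ "$count" -lt 1 ]; then
        count=1
    fi
    # Step down until the count is a factor of the number of processes.
    while [ $(( nprocs % count )) -ne 0 ]; do
        count=$(( count - 1 ))
    done
    echo "$count"
}
```

For example, a 7 GB file written by 240 processes gives 6, since 7 does not divide 240 but 6 does.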

Suggestions for Chaste

With these in mind, a good way to get started is to do the following from your /work/.../ directory to set 1 stripe of 1 MB:

louiecn@eslogin005:/work/e10/e10/louiecn> lfs setstripe --stripe-size 1M --stripe-count 1 --stripe-index -1 .

noting the "." at the end. These settings will be inherited by every new directory and file, and will make sense most of the time. Note that we're also setting the stripe-index here just in case - it should ALWAYS be -1 as this allows the system to load balance. Note also that the stripe-size makes no difference if stripe-count=1, it's just a good default.

So these settings work well for small files (stripe-count = 1) and directories with more files than OSTs (including our postprocessing directories in most cases). The only 2 cases I can think of where they aren't great are for 1. large input files like meshes, and 2. the large output HDF5 file. In both cases, according to the above we should use all the OSTs (or a nice factor of them) for reading/writing to a single file.

For large input files, the easiest solution is to change the directory settings before copying the data over (using scp or whatever), then change them back after. E.g.

louiecn@eslogin005:/work/e10/e10/louiecn/Chaste/projects/louiecn/test/data> lfs setstripe -S 1M -c -1 .

uses the same 1 MB chunks, but stripes any new files over every OST (-1 means use all). Then copy your data over, and run lfs getstripe ... to check the stripe_count.
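
The change, copy, check, and change-back sequence could be wrapped up as follows. This is a sketch under assumptions: the file is taken to be already on Archer (so cp stands in for scp), the helper names maybe_run and stage_large_input are invented here, and the restore step reuses the default settings suggested earlier. Setting DRY_RUN=1 prints the commands instead of running them, which is useful for checking the paths first.

```shell
# Run a command, or just print it when DRY_RUN=1 (hypothetical helper).
maybe_run() {
    if [ "${DRY_RUN:-0}" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Usage: stage_large_input SOURCE_FILE DEST_DIR
stage_large_input() {
    local src="$1"
    local dest_dir="$2"
    # Stripe any new files in DEST_DIR over every OST, in 1 MB stripes.
    maybe_run lfs setstripe -S 1M -c -1 "$dest_dir"
    # The copy is a new file, so it inherits the directory striping.
    maybe_run cp "$src" "$dest_dir/"
    # Put the directory back to the single-stripe default.
    maybe_run lfs setstripe --stripe-size 1M --stripe-count 1 --stripe-index -1 "$dest_dir"
    # Inspect the result; look for the stripe_count in the output.
    maybe_run lfs getstripe "$dest_dir/$(basename "$src")"
}
```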

The HDF5 file is trickier as it gets made by the program, so my solution involves doing three things to your code:

Uncomment the H5Pset_alignment call to ensure that each HDF5 chunk fits neatly into a Lustre stripe, reducing contention.
Uncomment the Lustre-specific commands in Hdf5DataWriter.cpp. These set the output directory to use all OSTs just before creating the H5 file, so that it inherits the striping, then set the directory back to the defaults, so that other new files and directories get the right settings.
Change the chunk size parameter `target_size_in_bytes` to 1 MB (i.e. 1024*1024).

By doing these things, the HDF5 chunks will be 1 MB, aligned to 1 MB blocks, and striped in 1 MB stripes. Perfect!

If you have data with "bad" stripe settings

If you've already got data on the system and it's got sub-optimal settings (check with lfs getstripe ...), use the following template:

mv dir old-dir
mkdir dir
lfs setstripe --stripe-index -1 --stripe-size 1M --stripe-count 1 dir
cp -a old-dir/. dir/

This moves the old directory somewhere safe, creates a new directory, sets the stripe properties, and copies the backed-up contents to the new directory, where it inherits the striping. You can then delete old-dir.

Dependencies and environment variables

Adapt the following and put it at the end of your ~/.bashrc file to set things up automatically every time you log in:

export WORK=/work/e10/e10/louiecn
export PBS_O_WORKDIR=/work/e10/e10/louiecn/pbs_tests
alias cdchaste='cd /work/e10/e10/louiecn/Chaste'

# Only currently working with GNU. Intel-compiled modules are in the works.
module swap PrgEnv-cray PrgEnv-gnu
module load cray-petsc cray-hdf5-parallel vtk boost xerces-c cray-tpsl svn

export CHASTE_LIBS=/work/e10/e10/louiecn/chaste-libs
export CHASTE_LOAD_ENV=1
export CHASTE_TEST_OUTPUT=/work/e10/e10/louiecn/testoutput

export PATH=$CHASTE_LIBS/bin:$PATH
export PYTHONPATH=$CHASTE_LIBS/lib/python:$PYTHONPATH

# For dynamic linking (not currently working with some modules, but usually desirable)
# export CRAYPE_LINK_TYPE=dynamic

# Convenient alias for scons, call it whatever you like, and invoke from Chaste directory like this:
# Sco global/test/TestChasteBuildInfo.hpp
function Sco {
    scons -j16 b=CrayGcc_ndebug co=1 br=1 do_inf_tests=0 "$@"
}

Note that this loads the default versions, which change over time and may not be the recommended versions. You can see which versions of things are installed using

module avail [modulefile]

and load them specifically, e.g. module load cray-petsc/3.4.2.0.

A useful list of libraries and their versions, and upcoming changes, can be found here.
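
After logging in again, a quick check that the ~/.bashrc settings took effect and that the Chaste directories point at /work rather than /home can save confusion later. The following is a minimal sketch; the function name check_chaste_env is hypothetical, and it only looks at the three path variables exported in the snippet above.

```shell
# Report whether the Chaste path variables are set and live under /work.
# Returns non-zero if any variable is missing or points elsewhere.
check_chaste_env() {
    local rc=0
    local name value
    for name in WORK CHASTE_LIBS CHASTE_TEST_OUTPUT; do
        # Look up the variable whose name is stored in $name.
        eval "value=\${$name:-}"
        case "$value" in
            /work/*) echo "ok: $name=$value" ;;
            "")      echo "missing: $name is not set"; rc=1 ;;
            *)       echo "warning: $name=$value is not under /work"; rc=1 ;;
        esac
    done
    return "$rc"
}
```

Running check_chaste_env prints one line per variable, so a typo in a path shows up immediately.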

Installation

SCons

From $CHASTE_LIBS:

wget http://downloads.sourceforge.net/project/scons/scons/2.1.0/scons-2.1.0.tar.gz
tar zxf scons-2.1.0.tar.gz
cd scons-2.1.0
python setup.py install --prefix=$CHASTE_LIBS
cd ..
rm -rf scons-2.1.0.tar.gz scons-2.1.0

PyCml dependencies

See InstallPyCml for more explanation.

You will need to do the following to make easy_install work:

Create ~/.pydistutils.cfg with the following content (replacing with your path to chaste-libs):

[install]
install_lib = /work/.../chaste-libs/lib/python
install_scripts = /work/.../chaste-libs/bin
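
If CHASTE_LIBS is already exported, the file can be generated instead of typed by hand. This is just a sketch: write_pydistutils_cfg is a made-up helper name, the second argument exists only to allow writing somewhere other than ~/.pydistutils.cfg, and any existing file at the target is overwritten.

```shell
# Write a distutils config that installs into the given chaste-libs directory.
# Usage: write_pydistutils_cfg CHASTE_LIBS_DIR [TARGET_FILE]
# (hypothetical helper; overwrites TARGET_FILE, default ~/.pydistutils.cfg)
write_pydistutils_cfg() {
    local libs="$1"
    local target="${2:-$HOME/.pydistutils.cfg}"
    cat > "$target" <<EOF
[install]
install_lib = $libs/lib/python
install_scripts = $libs/bin
EOF
}
```

A typical call would be write_pydistutils_cfg "$CHASTE_LIBS".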

Then, again from $CHASTE_LIBS:

wget http://peak.telecommunity.com/dist/ez_setup.py
python ez_setup.py
easy_install "python-dateutil==1.5"
easy_install "Amara==1.2.0.2"
easy_install rdflib

(We don't need to install lxml as it's already installed.)

XSD

For XSD we get the binary. Again, from $CHASTE_LIBS:

wget http://www.codesynthesis.com/download/xsd/3.3/linux-gnu/x86_64/xsd-3.3.0-x86_64-linux-gnu.tar.bz2
tar -xjf xsd-3.3.0-x86_64-linux-gnu.tar.bz2
ln -s $CHASTE_LIBS/xsd-3.3.0-x86_64-linux-gnu/bin/xsd $CHASTE_LIBS/bin/xsd
rm -f xsd-3.3.0-x86_64-linux-gnu.tar.bz2

As documented elsewhere, there is a bug with GCC and XSD, fixed by modifying libxsd/xsd/cxx/zc-istream.txx. Simply change line 35 of zc-istream.txx to read

this->setg(

instead of

setg(
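
The same one-line change can be made with sed rather than an editor. This is a sketch that assumes GNU sed (for the -i option) and that line 35 of your copy really is the bare setg( call described above; the helper name is hypothetical and the path to zc-istream.txx inside your XSD directory is passed in as the argument.

```shell
# Prefix the setg( call on line 35 with this-> (hypothetical helper).
# Usage: patch_zc_istream PATH_TO/zc-istream.txx
patch_zc_istream() {
    local file="$1"
    # Only line 35 is touched, and only if it starts (after indentation)
    # with a bare setg( call, so running this twice changes nothing more.
    sed -i '35s/^\([[:space:]]*\)setg(/\1this->setg(/' "$file"
}
```

Afterwards, sed -n '35p' on the same file shows the line so the change can be confirmed by eye.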

Building Chaste

From /work/.../ check out a working copy of the source code using:

svn co https://chaste.cs.ox.ac.uk/svn/chaste/trunk Chaste --username [your Chaste username]

If the profile has loaded correctly then svn and scons should be in your PATH. Once the code is checked out, compile with:

scons build=CrayGcc co=1 ...

You can't run anything on the login nodes, so it's important to use compile_only=1 (or co=1) to stop the tests from running right away. Parallel tests need to be run through the queue.

Performance improves slightly by turning off assertions using build=CrayGcc_ndebug.

Running Chaste

Read this to learn about submitting jobs.

A job script for Archer might look like this:

#!/bin/bash --login
#PBS -l select=10
#PBS -N [job name]
#PBS -A [credit quota]
#PBS -l walltime=1:23:0
#PBS -m abe
#PBS -M [your email address]

# Switch to current working directory
export PBS_O_WORKDIR=$(readlink -f "$PBS_O_WORKDIR")
cd "$PBS_O_WORKDIR"

# Run the parallel program
aprun -n 240 -N 24 -d 1 -S 12 -j 1 /work/.../Chaste/global/build/craygcc/TestChasteBuildInfoRunner >& stdout.txt

This script asks for 10 nodes (-l), for a total of 240 processes (-n). It assigns them 24 per node (-N) and 12 per NUMA region (-S), without hyperthreading (-j) or OpenMP threading (-d). You should have set the environment variable PBS_O_WORKDIR in ~/.bashrc to wherever you want the output to live.
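
When changing the number of nodes, the aprun options have to be kept consistent with the select line. The arithmetic can be sketched as below, assuming fully populated nodes with 24 processes each and 2 NUMA regions per node (the values implied by the script above); the function name aprun_layout is invented for illustration.

```shell
# Print aprun layout options for a given number of nodes.
# Usage: aprun_layout NUM_NODES [PROCS_PER_NODE] [NUMA_REGIONS_PER_NODE]
# (hypothetical helper; defaults of 24 and 2 are taken from the job script)
aprun_layout() {
    local nodes="$1"
    local per_node="${2:-24}"
    local numa_regions="${3:-2}"
    # Total processes = nodes * processes per node;
    # processes per NUMA region = processes per node / NUMA regions.
    echo "-n $(( nodes * per_node )) -N $per_node -d 1 -S $(( per_node / numa_regions )) -j 1"
}
```

With 10 nodes, aprun_layout 10 prints -n 240 -N 24 -d 1 -S 12 -j 1, matching the script above.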

The script may then be added to the job queue by typing

qsub [script name]

By appending >& stdout.txt you'll get some output in the path from which qsub was invoked. Without this, you get output in a file named after the job name.

Happy supercomputing!

You can probably ignore the information below. It has been hanging around since this was a HECToR install guide, just in case it becomes useful again.

CrayPat

CrayPat is the Cray profiling suite. There is a section in the user manual above about automatic profile generation. It is not always successful.

The process is "compile, use pat_build to instrument executable, run, use pat_report to examine profiling data".

Load the CrayPat module before compiling.

If the automatic profiling process doesn't work, there are alternatives.

Sampling

The simplest profiling to perform is sampling:

module load xt-craypat
scons build=... 
pat_build \
   notforrelease/build/craygcc_ndebug/TestChasteBenchmarksForPreDiCTRunner \
   TestChasteBenchmarksForPreDiCTRunner+pat

The above example creates an instrumented executable called "TestChasteBenchmarksForPreDiCTRunner+pat" from the original executable. Run this instrumented executable as normal and then use pat_report to analyze either the .xf file (for a small number of processes) or the directory produced.

pat_report -O profile TestChasteBenchmarksForPreDiCTRunner+pat+21007-12441sdt

This will give time spent in individual functions and the computational imbalance in those functions. It will not give information on the calltree.

Several other pat_report options are available, including:

pat_report -T -O profile # report all functions, not just most important
pat_report -s pe=ALL # report values for all processes

See the man page and the CrayPat documentation for more information.

MPI profiling

CrayPat has a series of tracegroups that can be profiled, one of which is MPI.

pat_build -g mpi executable-name instrumented-executable-name

To get a calltree of where the MPI time is spent:

pat_report -O calltree filename.xf

Other -O options include load_balance and callers (plus others). See the man page for more details.

Other profiling

CrayPat has other tracegroups (io, hdf5, lustre, system, blas, math, ...) which work in the same fashion as the mpi tracegroup.

GUI

Apprentice2 is a GUI for looking at the results (packages are also available to download for local use with profiling results; see the user manual).

module load apprentice2
app2