Checkpointing and restarting
File related to this tutorial
Checkpointing
Checkpointing involves writing every aspect of a simulation to disk, midway through the simulation, so that it can later be restarted (possibly with new parameter values). In order to write a checkpoint you have to put the following line in the simulation block in the xml parameters file
The attached files can be run as an example.
A checkpointed simulation produces the following output directory structure:
For example, if the output directory is ChasteResults
, the checkpoints will then be found in the directory $CHASTE_TEST_OUTPUT/ChasteResults_checkpoints
.
Restarting
First run the attached simulation. The following steps guide you through the process of restarting the simulation using the final checkpoint.
First, copy the checkpoint directory to a new directory (not technically required but advisable) and change directory into it.
The ResumeParameters.xml
file is the new parameters file to be given to the executable, and has to be edited before being run. Open the ResumeParameters.xml
file and change SimulationDuration
to 10ms.
If you wish to append results to the original .h5 file, you need to either move the ChasteResults
folder into your test output folder, or do
so that the original results file saved with the checkpoint can be located.
Finally, run the simulation again with ResumeParameters.xml
as the input xml file. The new results folder is ./ChasteResults
, or, from the original directory testoutput/ChasteResults_checkpoints/5ms_extended/ChasteResults
. The output files in here include the simulation results from before and after the checkpoint.
Notes:
- The timestep specifies how frequently to checkpoint, and must be a multiple of the printing timestep.
- You can optionally specify the maximum number of checkpoints to keep on disk; when this limit is reached, the oldest checkpoint will be deleted before a new checkpoint is made.
- Note that making a checkpoint does add a significant overhead at present, in particular because the mesh is written out to disk at each checkpoint. This is to ensure that each checkpoint directory contains everything needed to resume the simulation. In particular, the mesh written out will be in permuted form if it was partitioned for a parallel simulation.
- Meshes written in checkpoints use a binary form of the Triangle/Tetgen mesh format. This makes checkpoints significantly smaller but will cause portability problems if checkpoints are moved between little-endian systems (e.g. x86) and big-endian systems (e.g. PowerPC).
- Checkpoints may be resumed on any number of processes — you are not restricted to the number on which it was saved. However, the mesh will not be re-partitioned if loaded on a different number of processes, so the parallel efficiency of the simulation may be significantly reduced in this case.
- Resuming a checkpoint will attempt to extend the original results h5 file, if present, so that the file contains the complete simulation results. If this file does not exist, a new file will be created to contain just the results from the resume point.
- Various parameters can be altered when resuming from a checkpoint, by specifying new settings in the
ResumeParameters.xml
file. The file source:trunk/heart/test/data/xml/ChasteParametersResumeSimulationFullFormat.xml (also shipped with the standalone executable) explains what can be changed. - Note that the
progress_status.txt
file only reports the percentage of time to go to the next checkpoint, not the end of the simulation, so will not be especially useful for watching progress in a simulation with checkpoints. However, the existence of checkpoint output folders (1ms
,2ms
, etc) provide this information instead.
Using checkpoints from previous Chaste releases
We endeavour to ensure that new versions of Chaste will always be able to load checkpoint files created by the previous release. Sometimes, however, some manual conversion of the checkpoint will be required. This is the case in order to upgrade from release 2.0 to 2.1. A script archive_convert.sed
is provided in both the source and executable releases of Chaste to assist with this process. It must be run as follows: