Cardiac Checkpointing And Restarting
This tutorial is automatically generated from TestCardiacCheckpointingAndRestartingTutorial.hpp at revision 1dfba06b4d82. Note that the code is given in full at the bottom of the page.
Checkpointing and restarting cardiac simulations
In this tutorial we show how to save and reload cardiac simulations. CardiacSimulationArchiver is the main class that takes care of checkpointing cardiac simulations.
First, the checkpointing test.
We set up exactly the same simulation as in the Another Bidomain Simulation tutorial.
To save the entire simulation, use the CardiacSimulationArchiver class, as shown in the following. Note the BidomainProblem<2> as the template parameter. The output directory is relative to CHASTE_TEST_OUTPUT.
Next, the restarting test.
To restart from the saved simulation directory we use the CardiacSimulationArchiver class, as shown in the following. Note the BidomainProblem<2> as the template parameter again. The dimension (2) must match the one given in the saved archive directory. The output directory is again relative to CHASTE_TEST_OUTPUT.
The simulation duration has to be amended.
Note that the duration is always given with respect to the origin of the first solve.
This means that we are running from t=5 ms (the end of the previous simulation) to t=10 ms. The output files are concatenated so that they appear to be made by a single simulation running from t=0 ms to t=10 ms.
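The timing rule above can be written out as a small piece of arithmetic. The following is a minimal standalone sketch, not Chaste code: the names SolveWindow and ResumedWindow are hypothetical, and the choice to throw when the new duration does not exceed the checkpoint time is an assumption of this sketch rather than documented Chaste behaviour.

```cpp
#include <cassert>
#include <stdexcept>

// Hypothetical helper type (not part of Chaste): the time window covered by one solve.
struct SolveWindow
{
    double start_ms; // time at which the resumed solve begins
    double end_ms;   // absolute end time, measured from the origin of the first solve
};

// Given the time reached by the checkpoint and the new absolute duration,
// return the window that the resumed solve covers.
SolveWindow ResumedWindow(double checkpoint_time_ms, double new_duration_ms)
{
    assert(checkpoint_time_ms >= 0.0);
    if (new_duration_ms <= checkpoint_time_ms)
    {
        // Assumption of this sketch: a duration that does not exceed the
        // checkpoint time leaves nothing to solve, so we reject it.
        throw std::invalid_argument("new duration must exceed the checkpoint time");
    }
    return SolveWindow{checkpoint_time_ms, new_duration_ms};
}
```

With a checkpoint at 5 ms and a new duration of 10 ms, ResumedWindow(5.0, 10.0) describes a resumed solve of 5 ms, not 10 ms, because the duration is absolute rather than incremental.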
Note: loading an archive also loads HeartConfig options, so HeartConfig calls such as the one that amends the simulation duration must appear after CardiacSimulationArchiver::Load().
One reason to checkpoint and restart is that we may want to change something partway through an experiment. Here we change the conductivity.
Note that the pointer p_bidomain_problem exists in the scope of this test and that the unarchived object it points to was created on the CardiacSimulationArchiver::Load() line above. We are therefore responsible for deleting it.
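One common C++ way to meet this kind of ownership obligation is to hand the raw pointer to a std::unique_ptr as soon as it is returned. The sketch below is standalone and uses hypothetical stand-ins (StandInProblem, LoadStandIn, LoadOwned) in place of the real problem class and loader; it only illustrates the idiom and does not claim to reproduce the tutorial's own code, which deletes the pointer explicitly.

```cpp
#include <cassert>
#include <memory>

// Hypothetical stand-in for an unarchived problem object; counts live instances
// so that we can observe when the object is destroyed.
struct StandInProblem
{
    static int live_count;
    StandInProblem() { ++live_count; }
    ~StandInProblem() { --live_count; }
};
int StandInProblem::live_count = 0;

// Hypothetical stand-in for a loader that returns an owning raw pointer,
// mirroring the ownership contract described above.
StandInProblem* LoadStandIn()
{
    return new StandInProblem;
}

// Wrap the raw pointer so that it is deleted automatically at scope exit.
std::unique_ptr<StandInProblem> LoadOwned()
{
    StandInProblem* p_raw = LoadStandIn();
    assert(p_raw != nullptr);
    return std::unique_ptr<StandInProblem>(p_raw);
}
```

When the unique_ptr returned by LoadOwned() goes out of scope, the object is destroyed, including when an exception leaves the scope early, so no explicit delete is needed.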
Notes
Making a checkpoint does add a significant overhead at present, in particular because the mesh is written out to disk at each checkpoint. This is to ensure that each checkpoint directory contains everything needed to resume the simulation. The mesh written out will be in permuted form if it was partitioned for a parallel simulation.
To make this process slightly more efficient, Chaste will copy the original mesh files if the mesh was loaded from disk and hasn’t been modified (e.g. by permuting). Because of this, if you modify the mesh in memory, e.g. by setting element attributes as in the bidomain-with-bath tutorial, then you need to inform Chaste by calling mesh.SetMeshHasChangedSinceLoading(), so your modifications aren’t lost.
Meshes written in checkpoints use a binary form of the Triangle/Tetgen mesh format. This makes checkpoints significantly smaller but will cause portability problems if checkpoints are moved between little-endian systems (e.g. x86) and big-endian systems (e.g. PowerPC).
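To see why raw binary data are not portable across byte orders, consider the standalone sketch below. It is not Chaste's mesh reader or writer; the function names IsLittleEndianHost, SwapBytes32 and ReadRaw32 are hypothetical and only illustrate that the same four bytes decode to different integers on hosts with different endianness unless they are swapped.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Report whether the host stores the least significant byte first.
bool IsLittleEndianHost()
{
    const std::uint32_t probe = 1u;
    unsigned char first_byte = 0;
    std::memcpy(&first_byte, &probe, 1); // inspect the byte at the lowest address
    return first_byte == 1;
}

// Reverse the byte order of a 32-bit value.
std::uint32_t SwapBytes32(std::uint32_t value)
{
    return ((value & 0x000000FFu) << 24) |
           ((value & 0x0000FF00u) << 8)  |
           ((value & 0x00FF0000u) >> 8)  |
           ((value & 0xFF000000u) >> 24);
}

// Interpret four raw bytes exactly as this host would read them, with no conversion.
std::uint32_t ReadRaw32(const unsigned char* bytes)
{
    assert(bytes != nullptr);
    std::uint32_t value = 0;
    std::memcpy(&value, bytes, sizeof(value));
    return value;
}
```

For example, the bytes 2A 00 00 00 written by a little-endian machine read back as 42 on another little-endian host, but a big-endian host would need SwapBytes32 to recover the same value.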
Checkpoints may be resumed on any number of processes; you are not restricted to the number on which they were saved. However, the mesh will not be re-partitioned if loaded on a different number of processes, so the parallel efficiency of the simulation may be significantly reduced in this case.
Resuming a checkpoint will attempt to extend the original results HDF5 file if it exists (specified by HeartConfig::SetOutputDirectory and HeartConfig::SetOutputFilenamePrefix), so that the file contains the complete simulation results. If this file does not exist, a new file will be created containing just the results from the resume time.
When checkpointing, the progress_status.txt file only reports the percentage of time to go until the next checkpoint, not until the end of the simulation. This makes it slightly less useful; however, the presence of the checkpoint directories (1ms, 2ms, etc.) provides overall progress information instead.
On all Boost versions, if you are saving and loading a simulation in different source or test files, ensure that you have the same list of includes in both cases, or the serialization code may not know about all classes when loading, and give an unregistered class error.
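The note above on using checkpoint directories as a progress indicator can be turned into a rough estimate. The following standalone sketch assumes that checkpoint directories are named as a number followed by "ms" (as in 1ms, 2ms) and that the caller has already listed the directory names; CheckpointTimeMs and OverallProgressPercent are hypothetical helpers, not part of Chaste.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdlib>
#include <string>
#include <vector>

// Parse a directory name of the assumed form "<number>ms".
// Returns a negative value if the name does not match that form.
double CheckpointTimeMs(const std::string& dir_name)
{
    const std::string suffix = "ms";
    if (dir_name.size() <= suffix.size() ||
        dir_name.compare(dir_name.size() - suffix.size(), suffix.size(), suffix) != 0)
    {
        return -1.0;
    }
    const std::string number = dir_name.substr(0, dir_name.size() - suffix.size());
    char* p_end = nullptr;
    const double value = std::strtod(number.c_str(), &p_end);
    // Reject names whose prefix is not entirely a finite, non-negative number.
    if (p_end == number.c_str() || *p_end != '\0' || !std::isfinite(value) || value < 0.0)
    {
        return -1.0;
    }
    return value;
}

// Estimate overall progress from the latest checkpoint time found among the names.
double OverallProgressPercent(const std::vector<std::string>& dir_names,
                              double total_duration_ms)
{
    assert(total_duration_ms > 0.0);
    double latest_ms = 0.0;
    for (const std::string& name : dir_names)
    {
        latest_ms = std::max(latest_ms, CheckpointTimeMs(name));
    }
    return 100.0 * std::min(latest_ms, total_duration_ms) / total_duration_ms;
}
```

Given the names 1ms and 2ms and a total duration of 10 ms, this reports 20 percent; names that do not match the assumed pattern are ignored.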