Fixing bad commits

Once code is committed to the repository, a sequence of automated builds will kick off on the BuildBot. The waterfall runs sequentially through the following builders:

  • Compiler builders - verify that all of Chaste compiles, with GCC, Clang, and Intel
  • Continuous builders - core test suites, infrastructure tests, and parallel tests
  • Nightly builders - longer tests, utilities including memory testing and profiling, and the continuous test pack run in parallel
  • Portability builders - continuous and nightly test packs, run against multiple combinations of supported dependencies
  • Weekly builders - much longer tests, compiled with an optimised Intel build

See also ChasteGuides/BestPracticeGuide.

1. Is the "bad commit" my fault?

If any builders fail, they will go red on the Waterfall. Clicking through to a specific numbered build will give you:

  • a list of Responsible Users, containing the names of everyone who has committed code since that builder last passed;
  • a list of All Changes, showing every commit since the last passing build.

What went wrong?

For a given red builder on the Buildbot, it will be clear whether the step that failed was Configure, Compile, or Test. First, click through to see the output from the failed build step. If it doesn't look like anything you did, you can always "force" a new build, ticking the "fresh build directory" box, just to be sure it's a genuine error. Then, if the error is at

  • Configure: Unless you have edited any CMake infrastructure, the configure step is unlikely to fail. If it does, try forcing a fresh build.
  • Compile: Check the trace. It should be clear if the compile error relates to what you have committed.
  • Test: Check the trace. See which tests failed, and how. If the failure appears to be in code that you have not been working on, check the other Responsible Users.

2. Nightly tests

Here we list some common causes of nightly test failures (with specific examples) and how to fix them.

Memory testing

A common cause of memory leaks is forgetting to delete an object allocated with new before the end of a test. We now provide an example of this type of memory leak and how to fix it. The memory testing nightly build for r13190 failed 1 out of 256 test suites. Memory leaks were found in the TestForces test suite. Part of the output for this test suite is shown below:

160 bytes in 1 blocks are definitely lost in loss record 3 of 6
   at 0x4C23809: operator new(unsigned long) (vg_replace_malloc.c:230)
   by 0x7FB662: TestForces::TestRepulsionForceMethods() (TestForces.hpp:678)
   by 0x7FCD57: TestDescription_TestForces_TestRepulsionForceMethods::runTest() (TestForcesRunner.cpp:73)
   by 0x718E39: CxxTest::RealTestDescription::run() (RealDescriptions.cpp:96)
   by 0x742FDB: CxxTest::TestRunner::runTest(CxxTest::TestDescription&) (TestRunner.h:74)
   by 0x7430B7: CxxTest::TestRunner::runSuite(CxxTest::SuiteDescription&) (TestRunner.h:61)
   by 0x7C6704: CxxTest::TestRunner::runWorld() (TestRunner.h:46)
   by 0x7C67C0: CxxTest::TestRunner::runAllTests(CxxTest::TestListener&) (TestRunner.h:23)
   by 0x7C6846: CxxTest::ErrorFormatter::run() (ErrorFormatter.h:47)
   by 0x719135: main (TestForcesRunner.cpp:19)

From this output we can see that a memory leak occurred because a new pointer was created within the test TestRepulsionForceMethods(), but was not deleted at the end of this test. This memory leak was fixed in r13191.
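The fix is simply to pair every new in the test with a matching delete before the test returns. Here is a minimal sketch of the leaky pattern and its fix; ExampleForce is a hypothetical stand-in, not the real class used in TestForces.hpp:

```cpp
#include <cassert>

// Hypothetical stand-in for a Chaste force class; the real test
// allocated a force object with `new` in TestRepulsionForceMethods().
struct ExampleForce
{
    double mStrength;
    explicit ExampleForce(double strength) : mStrength(strength) {}
    double GetStrength() const { return mStrength; }
};

// Leaky version: the object allocated with `new` is never deleted,
// so Valgrind reports it as "definitely lost" when the test exits.
double LeakyTest()
{
    ExampleForce* p_force = new ExampleForce(1.5);
    return p_force->GetStrength();
    // Missing: delete p_force;
}

// Fixed version (the r13191-style fix): delete the object at the
// end of the test, before returning.
double FixedTest()
{
    ExampleForce* p_force = new ExampleForce(1.5);
    double strength = p_force->GetStrength();
    delete p_force;   // releases the allocation Valgrind complained about
    return strength;
}
```

Both versions behave identically at run time; only Valgrind sees the difference, which is why these leaks tend to slip through the continuous builds and only surface in the memory testing nightly.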

See also FixingMemoryTesting.

Coverage

A common cause of coverage failures is forgetting to add tests for all possible cases in an if/else or switch statement, or to test that any EXCEPTIONs are thrown under the correct circumstances. We now provide an example of this type of coverage failure and how to fix it. The coverage nightly build for r12897 failed 1 out of 604 test suites. A coverage failure was found in the file AbstractFeCableObjectAssembler.hpp. Part of the output for this file is shown below:

        1:  389:    if (mAssembleMatrix)
        -:  390:    {
        1:  391:        assemble_event = HeartEventHandler::ASSEMBLE_SYSTEM;
        -:  392:    }
        -:  393:    else
        -:  394:    {
    #####:  395:        assemble_event = HeartEventHandler::ASSEMBLE_RHS;
        -:  396:    }
        -:  397:
        1:  398:    if (mAssembleMatrix && mMatrixToAssemble==NULL)
        -:  399:    {
    #####:  400:        EXCEPTION("Matrix to be assembled has not been set");
        -:  401:    }
        1:  402:    if (mAssembleVector && mVectorToAssemble==NULL)
        -:  403:    {
    #####:  404:        EXCEPTION("Vector to be assembled has not been set");
        -:  405:    }

From this output we can see that further tests are required to cover the lines marked #####. This coverage failure was fixed in r12914.
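Restoring coverage means adding tests that exercise each ##### line: one that takes the else branch, and one per EXCEPTION that checks it is thrown with the right message (in Chaste test suites this is typically done with the CxxTest throw-assertion macros). A minimal self-contained sketch of the logic, using std::runtime_error in place of Chaste's EXCEPTION macro:

```cpp
#include <stdexcept>
#include <string>

// Toy stand-in for the assembler logic shown above.
std::string DoAssemble(bool assembleMatrix, bool assembleVector,
                       bool matrixSet, bool vectorSet)
{
    // Mirrors lines 389-395: the else branch (line 395) was uncovered
    std::string assemble_event = assembleMatrix ? "ASSEMBLE_SYSTEM"
                                                : "ASSEMBLE_RHS";
    // Mirrors the two uncovered EXCEPTION lines (400 and 404)
    if (assembleMatrix && !matrixSet)
    {
        throw std::runtime_error("Matrix to be assembled has not been set");
    }
    if (assembleVector && !vectorSet)
    {
        throw std::runtime_error("Vector to be assembled has not been set");
    }
    return assemble_event;
}

// Helper so a test can check that a given input combination throws.
bool ThrowsRuntimeError(bool assembleMatrix, bool assembleVector,
                        bool matrixSet, bool vectorSet)
{
    try
    {
        DoAssemble(assembleMatrix, assembleVector, matrixSet, vectorSet);
        return false;
    }
    catch (const std::runtime_error&)
    {
        return true;
    }
}
```

A covering test would then call DoAssemble with assembleMatrix false (to hit the else branch) and with each of the matrix/vector unset combinations (to hit both EXCEPTION lines).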

Doxygen coverage

A common cause of Doxygen coverage failures is forgetting to correctly document input arguments for a method. We now provide an example of this type of Doxygen coverage failure and how to fix it. The Doxygen coverage nightly build for r13261 failed 1 out of 722 test suites. A Doxygen coverage failure was found in the file SolidMechanicsProblemDefinition.hpp. Part of the output for this file is shown below:

/home/bob/eclipse/workspace/trunk-13261-2011-07-26-01_39_49/pde/src/problem/SolidMechanicsProblemDefinition.hpp:179: 
  Warning: argument `X' of command @param is not found in the argument list of 
  SolidMechanicsProblemDefinition< DIM >::EvaluateBodyForceFunction(c_vector< double, DIM > &rX, double t)
/home/bob/eclipse/workspace/trunk-13261-2011-07-26-01_39_49/pde/src/problem/SolidMechanicsProblemDefinition.hpp:179: 
  Warning: The following parameters of SolidMechanicsProblemDefinition::EvaluateBodyForceFunction(c_vector< double, DIM > &rX, double t) are not documented:
  parameter rX

From this output we can see that the input argument rX for the method SolidMechanicsProblemDefinition::EvaluateBodyForceFunction() was incorrectly documented as X. This Doxygen coverage failure was fixed in r13262.
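The fix is to make the @param name match the argument name in the signature exactly. A hypothetical simplified (non-templated) version of such a method, documented correctly:

```cpp
/**
 * Evaluate an example body-force function (a hypothetical, simplified
 * stand-in for the real templated method in
 * SolidMechanicsProblemDefinition.hpp).
 *
 * Note that the @param name must match the argument name exactly:
 * writing "@param X" here would trigger both Doxygen warnings shown
 * above ("argument `X' ... not found" and "parameter rX ... not
 * documented").
 *
 * @param rX  the spatial position (named rX, not X, in the signature)
 * @param t   the current time
 * @return an example scalar body force
 */
double EvaluateBodyForceFunction(const double& rX, double t)
{
    return rX * t;
}
```

Chaste's naming convention prefixes reference arguments with r, so a mismatch like X versus rX is an easy mistake to make when documenting after the fact.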

Parallel

One cause of parallel build failures is forgetting to add the macro EXIT_IF_PARALLEL to tests that are only intended to be run sequentially. We now provide an example of this type of parallel build failure and how to fix it. The parallel nightly build for r13407 failed 4 out of 264 test suites. Failures were found in the test suites TestCellBasedSimulationWithBuskeForces, TestForcesNotForRelease, TestWritingPdeSolversTutorial and TestPeriodicForces. Part of the output for the test suite TestCellBasedSimulationWithBuskeForces is shown below:

 ***** TestCellBasedSimulationWithBuskeForces.hpp *****

Entering TestSimpleMonolayerWithBuskeAdhesiveForce
Entering TestSimpleMonolayerWithBuskeAdhesiveForce
TestCellBasedSimulationWithBuskeForcesRunner: cell_based/src/mesh/HoneycombMeshGenerator.cpp:41: HoneycombMeshGenerator::HoneycombMeshGenerator(unsigned int, unsigned int, unsigned int, double): Assertion `PetscTools::IsSequential()' failed.
TestCellBasedSimulationWithBuskeForcesRunner: cell_based/src/mesh/HoneycombMeshGenerator.cpp:41: HoneycombMeshGenerator::HoneycombMeshGenerator(unsigned int, unsigned int, unsigned int, double): Assertion `PetscTools::IsSequential()' failed.
TestCellBasedSimulationWithBuskeForcesRunner: cell_based/src/mesh/HoneycombMeshGenerator.cpp:41: HoneycombMeshGenerator::HoneycombMeshGenerator(unsigned int, unsigned int, unsigned int, double): Assertion `PetscTools::IsSequential()' failed.
/home/bob/mpi/bin/mpirun.ch_shmem: line 91: 12384 Aborted                 /home/bob/eclipse/workspace/trunk-13407-2011-08-10-04_06_44/notforrelease_cell_based/build/debug_fpe/simulation/TestCellBasedSimulationWithBuskeForcesRunner

From this output we can see that the class HoneycombMeshGenerator is not intended to be used in parallel, and the EXIT_IF_PARALLEL macro (defined in the header file PetscTools.hpp) should be added to the start of TestSimpleMonolayerWithBuskeAdhesiveForce. This parallel build failure was fixed in r13416.
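The pattern is simply to bail out of the test on every process except when running sequentially. The following sketch fakes the relevant pieces (the real PetscTools::IsParallel() and EXIT_IF_PARALLEL live in PetscTools.hpp and query MPI; the stubs and the string return value here are purely illustrative):

```cpp
#include <string>

// Hypothetical stand-ins for PetscTools: a flag plays the role of the
// MPI process count check.
static bool g_runningInParallel = false;
inline bool IsParallelStub() { return g_runningInParallel; }

// Sketch of EXIT_IF_PARALLEL: leave the test immediately when run on
// more than one process, instead of hitting the sequential-only
// assertion inside HoneycombMeshGenerator. (The real macro returns
// void; we return a string here only so the behaviour is observable.)
#define EXIT_IF_PARALLEL_STUB if (IsParallelStub()) { return "skipped"; }

std::string TestSimpleMonolayerSketch()
{
    EXIT_IF_PARALLEL_STUB;  // first line of a sequential-only test
    // ... the rest of the test would construct a HoneycombMeshGenerator
    // and run the simulation here ...
    return "ran";
}
```

With the macro in place, each parallel process returns from the test harmlessly, and the suite passes instead of aborting on the PetscTools::IsSequential() assertion.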

Profiling

When running a profiling build (to identify areas for potential optimisation) the compiler performs extra code analysis, and hence can spot further potential errors in your code. These typically consist of variables that it thinks might sometimes be used before being assigned to, for example in this case:

heart/src/odes/AbstractRushLarsenCardiacCell.cpp: In member function 'virtual OdeSolution AbstractRushLarsenCardiacCell::Compute(double, double, double)':
heart/src/odes/AbstractRushLarsenCardiacCell.cpp:81: warning: 'curr_time' may be used uninitialized in this function
scons: *** [heart/build/profile_ndebug/src/odes/AbstractRushLarsenCardiacCell.o] Error 1

These can be fixed by altering the code logic, most commonly by initialising the variable to a dummy value when it is declared.
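A minimal sketch of the pattern behind such warnings (the variable name curr_time is taken from the log above, but the surrounding logic is hypothetical):

```cpp
#include <vector>

// A variable assigned only inside a loop looks "maybe uninitialized"
// to the optimiser: if the loop body never runs, the final read would
// use an unset value.
double LastSampleTime(const std::vector<double>& rTimes)
{
    double curr_time = 0.0;  // dummy initial value silences the warning
    for (std::size_t i = 0; i < rTimes.size(); ++i)
    {
        curr_time = rTimes[i];
    }
    return curr_time;
}
```

Initialising to a dummy value is the simple fix; where a meaningful default exists, prefer that, since a silently-used dummy can mask a genuine logic error.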

Another particularly common case is when a variable is only used within an assert. Since the profile builds define NDEBUG, assertions are turned off, and hence such variables will trigger an 'unused variable' error. For example here:

notforrelease_cell_based/src/population/mechanics/GeneralisedPeriodicLinearSpringForce.cpp: In member function 'boost::numeric::ublas::c_vector<double, DIM> GeneralisedPeriodicLinearSpringForce<DIM>::CalculateForceBetweenNodes(unsigned int, unsigned int, AbstractCellPopulation<U>&) [with unsigned int DIM = 1u]':
notforrelease_cell_based/src/population/mechanics/GeneralisedPeriodicLinearSpringForce.cpp:192:   instantiated from here
notforrelease_cell_based/src/population/mechanics/GeneralisedPeriodicLinearSpringForce.cpp:110: warning: unused variable 'ageA'
notforrelease_cell_based/src/population/mechanics/GeneralisedPeriodicLinearSpringForce.cpp:111: warning: unused variable 'ageB'

Either don't store the value tested in a variable, or assign it to itself (e.g. ageA=ageA;) to circumvent the error.
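A minimal sketch of both workarounds (the function and variable names echo the log above but the logic is hypothetical):

```cpp
#include <cassert>

// With NDEBUG defined (as in profile builds), assert() compiles away,
// so a variable read only inside an assert becomes "unused" and the
// warnings-as-errors build fails.
double ForceMagnitudeSketch(double cellAgeA)
{
    double ageA = cellAgeA;
    ageA = ageA;            // self-assignment circumvents the warning...
    assert(ageA >= 0.0);    // ...while keeping the debug-build check

    // Alternative: don't store the value at all, and write
    //     assert(cellAgeA >= 0.0);
    return 2.0 * cellAgeA;
}
```
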

Windows

Windows builds are run using CMake on scratch.cs, but to investigate specific test failures it's best to run them yourself in MS Visual Studio.

  • Rdesktop onto scratch.cs. (Caution: this will kick off anyone currently using that session.)
  • Start Visual Studio (pinned to taskbar).
  • Load the Chaste project (see "Recent projects")
  • To run a single test:
    • (Sequential) In "Solution Explorer" right click the offending test and select "Debug"
    • (Parallel) cd ~/chaste_windows/build/chaste ; mpiexec.exe /np 2 global/test/Debug/TestDebugRunner.exe
      • (If anyone knows a nice way of doing this from VS with debugger, please tell us!)
  • To switch revision: cd chaste; /usr/bin/svn up

Helpful debugging hints:

  • Stick a breakpoint in line 45 of d:\chaste_windows\build\third_party_libs\downloads\boost\boost_1_53_0\libs\serialization\src\archive_exception.cpp to track down unregistered class type errors in archiving.

Copyrights

All Chaste source and test files must include the standard Chaste copyright notice. If you omit this, the special 'Copyrights' test will fail. Fix this by adding the copyright comment to the top of your file.

Orphaned tests and duplicate file names

All tests must be listed within a test pack file. These are text files in the test folders named like "SomethingTestPack.txt", and define groups of tests to run in standard builds (see also TestingStrategy). Any tests not listed will cause the special 'OrphanedTests' test to fail. Simply add your test to a suitable test pack (e.g. 'Continuous', 'Nightly', or 'Weekly') to fix this.

Similarly, all Chaste source and test files must have a unique name, or the build system will get confused. If you've created a file with a duplicate name, the special 'DuplicateFileNames' test will fail. Rename your file to fix this.

Tests killed off

Most of the automated builds apply a run time limit to tests; the exact limit depends on the kind of build. Tests that run for too long will be killed, and a message including "timed out" will appear at the end of the output.

If the build machines are busy when the build is kicked off, this can cause tests to take much longer than normal. It is often worth rebuilding specific builders after a "time out" failure.

Common culprits are continuous tests that take too long: this may not cause the continuous build itself to fail, but the problem can then show up in the MemoryTesting or other, lengthier nightly builds, which apply extra overhead to the same tests.

Fix this problem by making the test shorter! For example, run a simulation for less time (or optimise the code). However, make sure that you don't break coverage by doing so. Consider whether tests could be refactored to reduce duplication.