Fixing bad commits
Or, how to understand the automated emails telling you something's wrong, and what to do about them. Also covers how to fix nightly test failures.
See also ChasteGuides/BestPracticeGuide.
1. Is the "bad commit" my fault?
- If you see a message like "(No test results found: [...]&buildType=_broke-13527)" then it is likely that the build of your commit was killed off for taking too long, to make way for the next queued build. If this happens, keep an eye on the latest continuous results to check that a later build doesn't show errors caused by your commit.
Broken builds
If you see Build failed (check build log for ": ***"), then this is a broken build. A broken build (that is, broken compilation) is more serious than broken tests: it potentially means that there will be no nightly test results, and it will probably interfere with the work of every other Chaste developer until it is fixed. Check the build log for strings matching ": ***" to locate the problem.
- The continuous test pack may have failed to compile due to problems with the licence server. These happen occasionally when too many compilations run within Oxford at the same time, and are not serious. However, it is worth checking subsequent builds, as a licence failure may mask real errors.
Error: A license for CCompL is not available now (-15,570,111).
A connection to the license server could not be made. You should make sure that your license daemon process is running: both an lmgrd.intel process and an INTEL process should be running if your license limits you to a specified number of licenses in use at a time.
Also, check to see if the wrong port@host or the wrong license file is being used, or if the port or hostname in the license file has changed.
License file(s) used were (in this order):
    1. 28518@lic1.osc.ox.ac.uk
    2. /opt/intel/cce/10.0.025/licenses/*.lic
    3. /opt/intel/licenses/l_vt_evalSR7B5JVN.lic
    4. /home/wendy/intel/licenses
    5. /Users/Shared/Library/Application Support/Intel/Licenses
    6. /opt/intel/cce/10.0.025/bin/*.lic
Please visit http://support.intel.com/support/performancetools/support.htm if you require technical assistance.
icpc: error #10052: could not checkout FLEXlm license
- Remember that all tests are compiled (whether or not they are run in the continuous test pack). If you have changed the interface to some functionality, then you may have left a test in a profile/nightly/failing test pack out of date.
Broken tests
- If the failure appears to be in code that you have not been working on, check recent Continuous build results. If the same failure occurred in an earlier commit, then it almost certainly isn't your fault and just hasn't been fixed yet.
2. Nightly tests
Here we list some common causes of nightly test failures (with specific examples) and how to fix them.
Memory testing
A common cause of memory leaks is forgetting to delete an object allocated with new before the end of a test. We now provide an example of this type of memory leak and how to fix it. The memory testing nightly build for r13190 failed 1 out of 256 test suites. Memory leaks were found in the TestForces test suite. Part of the output for this test suite is shown below:
160 bytes in 1 blocks are definitely lost in loss record 3 of 6
   at 0x4C23809: operator new(unsigned long) (vg_replace_malloc.c:230)
   by 0x7FB662: TestForces::TestRepulsionForceMethods() (TestForces.hpp:678)
   by 0x7FCD57: TestDescription_TestForces_TestRepulsionForceMethods::runTest() (TestForcesRunner.cpp:73)
   by 0x718E39: CxxTest::RealTestDescription::run() (RealDescriptions.cpp:96)
   by 0x742FDB: CxxTest::TestRunner::runTest(CxxTest::TestDescription&) (TestRunner.h:74)
   by 0x7430B7: CxxTest::TestRunner::runSuite(CxxTest::SuiteDescription&) (TestRunner.h:61)
   by 0x7C6704: CxxTest::TestRunner::runWorld() (TestRunner.h:46)
   by 0x7C67C0: CxxTest::TestRunner::runAllTests(CxxTest::TestListener&) (TestRunner.h:23)
   by 0x7C6846: CxxTest::ErrorFormatter::run() (ErrorFormatter.h:47)
   by 0x719135: main (TestForcesRunner.cpp:19)
From this output we can see that a memory leak occurred because a new pointer was created within the test TestRepulsionForceMethods(), but was not deleted at the end of this test. This memory leak was fixed in r13191.
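The pattern is sketched below in a minimal, self-contained test suite; this is illustrative only and not the actual TestForces code. The leak arises if the final delete is omitted.

#include <cxxtest/TestSuite.h>
#include <vector>

class TestMemoryLeakExample : public CxxTest::TestSuite
{
public:
    void TestDeleteWhatYouNew()
    {
        // An object allocated with 'new' inside a test...
        std::vector<double>* p_data = new std::vector<double>(20, 1.0);
        TS_ASSERT_DELTA((*p_data)[0], 1.0, 1e-12);

        // ...must be deleted before the test returns; without this line
        // valgrind reports the allocation as "definitely lost".
        delete p_data;
    }
};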
See also FixingMemoryTesting.
Coverage
A common cause of coverage failures is forgetting to add tests for all possible cases in an if/else or switch statement, or to test that any EXCEPTIONs are thrown under the correct circumstances. We now provide an example of this type of coverage failure and how to fix it. The coverage nightly build for r12897 failed 1 out of 604 test suites. A coverage failure was found in the file AbstractFeCableObjectAssembler.hpp. Part of the output for this file is shown below:
        1:  389:    if (mAssembleMatrix)
        -:  390:    {
        1:  391:        assemble_event = HeartEventHandler::ASSEMBLE_SYSTEM;
        -:  392:    }
        -:  393:    else
        -:  394:    {
    #####:  395:        assemble_event = HeartEventHandler::ASSEMBLE_RHS;
        -:  396:    }
        -:  397:
        1:  398:    if (mAssembleMatrix && mMatrixToAssemble==NULL)
        -:  399:    {
    #####:  400:        EXCEPTION("Matrix to be assembled has not been set");
        -:  401:    }
        1:  402:    if (mAssembleVector && mVectorToAssemble==NULL)
        -:  403:    {
    #####:  404:        EXCEPTION("Vector to be assembled has not been set");
        -:  405:    }
From this output we can see that further tests are required to cover the lines marked #####. This coverage failure was fixed in r12914.
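A hedged sketch of the kind of test that exercises the uncovered branches is given below. The assembler class and its method names (MyCableAssembler, AssembleMatrix, AssembleVector) are placeholders, not the real API; the point is to call the code once per untested branch and check each EXCEPTION message explicitly, for which Chaste's TS_ASSERT_THROWS_THIS macro is convenient.

void TestAssemblerExceptions()
{
    // Hypothetical concrete assembler; a real test would construct
    // whichever subclass of AbstractFeCableObjectAssembler is under test.
    MyCableAssembler assembler;

    // Covers line 400: assembling the matrix without setting it first
    TS_ASSERT_THROWS_THIS(assembler.AssembleMatrix(),
                          "Matrix to be assembled has not been set");

    // Covers line 404: assembling the vector without setting it first
    TS_ASSERT_THROWS_THIS(assembler.AssembleVector(),
                          "Vector to be assembled has not been set");
}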
Doxygen coverage
A common cause of Doxygen coverage failures is forgetting to correctly document input arguments for a method. We now provide an example of this type of Doxygen coverage failure and how to fix it. The Doxygen coverage nightly build for r13261 failed 1 out of 722 test suites. A Doxygen coverage failure was found in the file SolidMechanicsProblemDefinition.hpp. Part of the output for this file is shown below:
/home/bob/eclipse/workspace/trunk-13261-2011-07-26-01_39_49/pde/src/problem/SolidMechanicsProblemDefinition.hpp:179: Warning: argument `X' of command @param is not found in the argument list of SolidMechanicsProblemDefinition< DIM >::EvaluateBodyForceFunction(c_vector< double, DIM > &rX, double t)
/home/bob/eclipse/workspace/trunk-13261-2011-07-26-01_39_49/pde/src/problem/SolidMechanicsProblemDefinition.hpp:179: Warning: The following parameters of SolidMechanicsProblemDefinition::EvaluateBodyForceFunction(c_vector< double, DIM > &rX, double t) are not documented: parameter rX
From this output we can see that the input argument rX for the method SolidMechanicsProblemDefinition::EvaluateBodyForceFunction() was incorrectly documented as X. This Doxygen coverage failure was fixed in r13262.
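The fix is simply to make the @param name match the argument name exactly. A sketch of the corrected comment is below; the parameter descriptions are illustrative rather than the wording actually committed.

/**
 * Evaluate the body force at a point in space and time.
 *
 * @param rX the spatial location (the name must match the argument exactly)
 * @param t the current time
 */
c_vector<double, DIM> EvaluateBodyForceFunction(c_vector<double, DIM>& rX, double t);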
Parallel
One cause of parallel build failures is forgetting to add the macro EXIT_IF_PARALLEL to tests that are only intended to be run sequentially. We now provide an example of this type of parallel build failure and how to fix it. The parallel nightly build for r13407 failed 4 out of 264 test suites. Failures were found in the test suites TestCellBasedSimulationWithBuskeForces, TestForcesNotForRelease, TestWritingPdeSolversTutorial and TestPeriodicForces. Part of the output for the test suite TestCellBasedSimulationWithBuskeForces is shown below:
***** TestCellBasedSimulationWithBuskeForces.hpp *****
Entering TestSimpleMonolayerWithBuskeAdhesiveForce
Entering TestSimpleMonolayerWithBuskeAdhesiveForce
TestCellBasedSimulationWithBuskeForcesRunner: cell_based/src/mesh/HoneycombMeshGenerator.cpp:41: HoneycombMeshGenerator::HoneycombMeshGenerator(unsigned int, unsigned int, unsigned int, double): Assertion `PetscTools::IsSequential()' failed.
TestCellBasedSimulationWithBuskeForcesRunner: cell_based/src/mesh/HoneycombMeshGenerator.cpp:41: HoneycombMeshGenerator::HoneycombMeshGenerator(unsigned int, unsigned int, unsigned int, double): Assertion `PetscTools::IsSequential()' failed.
TestCellBasedSimulationWithBuskeForcesRunner: cell_based/src/mesh/HoneycombMeshGenerator.cpp:41: HoneycombMeshGenerator::HoneycombMeshGenerator(unsigned int, unsigned int, unsigned int, double): Assertion `PetscTools::IsSequential()' failed.
/home/bob/mpi/bin/mpirun.ch_shmem: line 91: 12384 Aborted /home/bob/eclipse/workspace/trunk-13407-2011-08-10-04_06_44/notforrelease_cell_based/build/debug_fpe/simulation/TestCellBasedSimulationWithBuskeForcesRunner
From this output we can see that the class HoneycombMeshGenerator is not intended to be used in parallel, and the EXIT_IF_PARALLEL macro (defined in the header file PetscTools.hpp) should be added to the start of TestSimpleMonolayerWithBuskeAdhesiveForce. This parallel build failure was fixed in r13416.
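In outline the fix looks like the sketch below; the constructor arguments and the test body are illustrative only, not the code actually committed in r13416.

#include "PetscTools.hpp"              // provides the EXIT_IF_PARALLEL macro
#include "HoneycombMeshGenerator.hpp"

void TestSimpleMonolayerWithBuskeAdhesiveForce()
{
    // Skip this test on all but a single process: HoneycombMeshGenerator
    // asserts PetscTools::IsSequential(), so it must not run in parallel.
    EXIT_IF_PARALLEL;

    HoneycombMeshGenerator generator(5, 5); // illustrative arguments
    // ... rest of the sequential-only test ...
}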
Profiling
When running a profiling build (to identify areas for potential optimisation) the compiler performs extra code analysis, and hence can spot further potential errors in your code. These typically consist of variables that it thinks might sometimes be used before being assigned to, for example in this case:
heart/src/odes/AbstractRushLarsenCardiacCell.cpp: In member function 'virtual OdeSolution AbstractRushLarsenCardiacCell::Compute(double, double, double)':
heart/src/odes/AbstractRushLarsenCardiacCell.cpp:81: warning: 'curr_time' may be used uninitialized in this function
scons: *** [heart/build/profile_ndebug/src/odes/AbstractRushLarsenCardiacCell.o] Error 1
These can be fixed by altering the code logic, most commonly by initialising the variable to a dummy value when it is declared.
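A minimal sketch of that fix follows; the function and variable usage are invented for illustration and are not the AbstractRushLarsenCardiacCell code.

#include <vector>

// Illustrative only: 'curr_time' is assigned inside a conditional branch,
// so without the initialiser the profiling build warns that it
// "may be used uninitialized in this function".
double GetFirstTimeAbove(const std::vector<double>& rTimes, double threshold)
{
    double curr_time = 0.0; // initialise to a dummy value to avoid the warning
    for (unsigned i = 0; i < rTimes.size(); i++)
    {
        if (rTimes[i] > threshold)
        {
            curr_time = rTimes[i];
            break;
        }
    }
    return curr_time;
}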
Another particularly common case is when a variable is only used within an assert. Since the profile builds define NDEBUG, assertions are turned off, and hence such variables will trigger an 'unused variable' error. For example here:
notforrelease_cell_based/src/population/mechanics/GeneralisedPeriodicLinearSpringForce.cpp: In member function 'boost::numeric::ublas::c_vector<double, DIM> GeneralisedPeriodicLinearSpringForce<DIM>::CalculateForceBetweenNodes(unsigned int, unsigned int, AbstractCellPopulation<U>&) [with unsigned int DIM = 1u]':
notforrelease_cell_based/src/population/mechanics/GeneralisedPeriodicLinearSpringForce.cpp:192:   instantiated from here
notforrelease_cell_based/src/population/mechanics/GeneralisedPeriodicLinearSpringForce.cpp:110: warning: unused variable 'ageA'
notforrelease_cell_based/src/population/mechanics/GeneralisedPeriodicLinearSpringForce.cpp:111: warning: unused variable 'ageB'
Either don't store the value tested in a variable, or assign it to itself (e.g. ageA=ageA;) to circumvent the error.
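Both workarounds are sketched below; the function and variable names are invented for illustration.

#include <cassert>

// Illustrative only. Under the profile builds NDEBUG is defined, so each
// assert compiles to nothing and a variable used only in an assert would
// otherwise be flagged as unused.
void CheckAge(double cellAge)
{
    // Workaround 1: don't store the tested value in a variable at all
    assert(cellAge >= 0.0);

    // Workaround 2: if the variable aids readability, keep it but add a
    // self-assignment so it counts as "used" even when NDEBUG is set
    double age = cellAge;
    assert(age >= 0.0);
    age = age;
}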
Windows
Windows builds are run using CMake on scratch.cs, but it is best to run specific failing tests yourself in MS Visual Studio.
- Rdesktop onto scratch.cs. (Caution: this will kick off anyone else using that session.)
- Start Visual Studio (pinned to taskbar).
- Load the Chaste project (see "Recent projects")
- To run a single test:
- (Sequential) In "Solution Explorer" right click the offending test and select "Debug"
- (Parallel) cd ~/chaste_windows/build/chaste ; mpiexec.exe /np 2 global/test/Debug/TestDebugRunner.exe
- (If anyone knows a nice way of doing this from VS with debugger, please tell us!)
- To switch revision: cd chaste ; /usr/bin/svn up
Helpful debugging hints:
- Stick a breakpoint at line 45 of d:\chaste_windows\build\third_party_libs\downloads\boost\boost_1_53_0\libs\serialization\src\archive_exception.cpp to track down unregistered class type errors in archiving.
Copyrights
All Chaste source and test files must include the standard Chaste copyright notice. If you omit this, the special 'Copyrights' test will fail, as for example here. Fix this by adding the copyright comment to the top of your file.
Orphaned tests and duplicate file names
All tests must be listed within a test pack file. These are text files in the test folders named like "SomethingTestPack.txt", and define groups of tests to run in standard builds (see also TestingStrategy). Any tests not listed will cause the special 'OrphanedTests' test to fail. Simply add your test to a suitable test pack (e.g. 'Continuous', 'Nightly', or 'Weekly') to fix this.
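For example, a test pack file is just a plain-text list of test headers, one per line, roughly like the sketch below; the file name and path layout here are illustrative, so copy the convention used by an existing pack file in the same test folder.

TestMyNewFeature.hpp
simulation/TestMyNewSimulation.hpp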
Similarly, all Chaste source and test files must have a unique name, or the build system will get confused. If you've created a file with a duplicate name, the special 'DuplicateFileNames' test will fail. Rename your file to fix this.
Tests killed off
Most of the automated builds apply a run time limit to tests; the exact limit depends on the kind of build. Tests that run for too long will be killed (see e.g. this one) and a message like "Test killed due to exceeding time limit of 180 seconds" will appear in the output.
A common culprit is a continuous test that takes too long; this may not cause the continuous build itself to fail, but the problem is likely to show up in the MemoryTesting or lofty nightly builds in particular.
Fix this problem by making the test shorter! For example, run a simulation for less time (or optimise the code). However, make sure that you don't break coverage by doing so. Consider whether tests could be refactored to reduce duplication.
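As a hedged sketch, for a cell-based test this can be as simple as reducing the simulated end time and the output frequency. OffLatticeSimulation and the setter names below are assumed from the cell_based API and the surrounding set-up is omitted; adapt this to whatever your test actually drives.

// Assuming 'cell_population' has already been constructed earlier in the test
OffLatticeSimulation<2> simulator(cell_population);
simulator.SetOutputDirectory("TestShorterSimulation");
simulator.SetEndTime(1.0);                  // e.g. was 10.0: simulate less time
simulator.SetSamplingTimestepMultiple(120); // write results less often
simulator.Solve();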