Chaste Buildbot Setup
Buildbot is the new test system, built around CMake and git. We use Buildbot version 0.8.8; the official documentation is available here.
Local developers can see AutomatedBuilds for details of more advanced configuration options.
Test Results Webpage
The landing page for the Chaste test results is here. Its main feature is the waterfall, which shows the current status (and complete history) of all the test builders. The "waterfall help" link at the top right of the waterfall leads to a useful help page that explains how to customise what the waterfall shows. For example, the links below customise the waterfall in various ways:
- Show failing tests
- Show Continuous tests
- Show Nightly tests
- Show Portability tests
- Show Weekly tests
- Full waterfall, refresh every 60s
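The customised links above work by adding query parameters to the waterfall URL. Below is a minimal Python sketch of building such links. The base URL is hypothetical. The parameter names ("category", "failures_only", "reload") are assumed from the Buildbot 0.8.x waterfall help page rather than taken from this page, so check them against the help page on your own master. The category names are also assumptions about how the builders are grouped.

```python
from urllib.parse import urlencode

# Hypothetical base URL: substitute the address of your own Buildbot master.
WATERFALL_URL = "http://buildbot.example.org/waterfall"


def waterfall_link(categories=(), failures_only=False, reload_secs=None):
    """Build a customised waterfall URL from query parameters.

    The parameter names (category, failures_only, reload) are assumed from
    the Buildbot 0.8.x waterfall help page; check them against yours.
    """
    params = [("category", c) for c in categories]
    if failures_only:
        params.append(("failures_only", "true"))
    if reload_secs is not None:
        params.append(("reload", str(reload_secs)))
    query = urlencode(params)
    return f"{WATERFALL_URL}?{query}" if query else WATERFALL_URL


# "Nightly" is a hypothetical category name.
print(waterfall_link(categories=["Nightly"]))
print(waterfall_link(failures_only=True))
print(waterfall_link(reload_secs=60))
```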
Reproducing a test failure
If all the "last build" tabs are green, then relax! If not, then you might want to investigate why a particular test failed.
First, click on the link in the red "last build" tab that you wish to debug; this gives you the complete history for that test builder. Then click on the latest build number under the "Recent Builds" heading, which takes you to the most recent test run, the one that failed.
This page has a lot of information to help you debug what went wrong with the test. At the bottom you can see the list of changes since the last build (one of these changes might have caused the test to fail). On the right-hand side of the page is a list of build properties. Many of these are just internal properties used in the Buildbot setup, but you should make a note of these important ones:
- compiler = the compiler used
- config = the CMake build type (Release, Debug etc)
- revision = the git revision (commit hash) being tested
- slavename = the machine running the test
- workdir = the directory on the slave where the test is being run
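If you prefer to pull these properties out programmatically, Buildbot 0.8.x can also report build status as JSON. The sketch below assumes a JSON layout in which "properties" is a list of [name, value, source] triples. The URL of that JSON on the master is assumed to be /json/builders/<builder>/builds/<number>, and that is also an assumption. Here a made-up example document stands in for a real response.

```python
import json

# The build properties worth noting when reproducing a failure.
KEY_PROPERTIES = ("compiler", "config", "revision", "slavename", "workdir")


def key_properties(build_json):
    """Pick the key build properties out of a build's JSON status.

    Assumes the Buildbot 0.8.x JSON layout, where "properties" is a list of
    [name, value, source] triples; check this against your own master.
    """
    props = {name: value for name, value, _source in build_json.get("properties", [])}
    return {key: props.get(key) for key in KEY_PROPERTIES}


# A made-up example document standing in for the master's JSON response.
example = json.loads("""
{"properties": [
  ["compiler", "gcc", "Builder"],
  ["config", "Debug", "Builder"],
  ["revision", "3f2a9c1", "Build"],
  ["slavename", "example-slave", "BuildSlave"],
  ["workdir", "/home/buildbot/example", "slave"],
  ["buildnumber", 42, "Build"]
]}
""")

for name, value in key_properties(example).items():
    print(f"{name} = {value}")
```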
Each builder is made up of one or more build steps. A step might be "configure" (i.e. run CMake), "compile" (i.e. run make) or "test" (i.e. run ctest). See the wiki page on the CMake build system for more information on these steps. In short, each step of the test builder corresponds to a command that you, as a user, might type on the command line. Each step of the most recent build is coloured green (succeeded), orange (warnings detected) or red (failed). You will probably want to focus on the steps that have failed, since a failed step causes the build as a whole to fail.
In the most common case, the test step (i.e. the step that runs ctest) will be the source of the failure. In this case, click on the stdio link under this step to see the full command-line output of the test step. At the top of the page (in blue text) is the command that Buildbot ran, along with all the environment variables that were set. Below this is the command's output, which is the normal output that ctest gives you. Locate the test that failed and the failure message it gives.
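The way step colours roll up into the colour of the whole build can be sketched as follows. This is a simplification. It takes the worst result of any step, using the result codes from Buildbot 0.8.x's buildbot.status.results module. Real Buildbot steps also carry flags such as flunkOnFailure and haltOnFailure, which change how each step affects the build. The step list is hypothetical.

```python
# Result codes as defined in buildbot.status.results (Buildbot 0.8.x).
SUCCESS, WARNINGS, FAILURE = 0, 1, 2

COLOURS = {SUCCESS: "green", WARNINGS: "orange", FAILURE: "red"}


def summarise(steps):
    """Return the overall result and the names of the failed steps.

    Simplification: the build takes the worst result of any step. Real
    Buildbot steps also have flags such as flunkOnFailure and
    haltOnFailure that change how a step affects the build.
    """
    overall = max((result for _name, result in steps), default=SUCCESS)
    failed = [name for name, result in steps if result == FAILURE]
    return overall, failed


# Hypothetical build: configure passes, compile warns, test fails.
steps = [("configure", SUCCESS), ("compile", WARNINGS), ("test", FAILURE)]
overall, failed = summarise(steps)
print(COLOURS[overall], failed)
```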
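With long ctest output it can help to pull out the summary that ctest prints at the end. The sketch below scans for the "The following tests FAILED:" block and parses each entry. The sample output and test names are made up, and the exact summary format may vary between CTest versions.

```python
import re

# One entry of ctest's closing summary, e.g. "\t  7 - TestFoo (Failed)".
FAILED_LINE = re.compile(r"^\s*(\d+)\s+-\s+(\S+)\s+\((.+)\)\s*$")


def failed_tests(ctest_output):
    """Return (number, name, reason) for each test ctest lists as failed."""
    failures = []
    in_summary = False
    for line in ctest_output.splitlines():
        if line.startswith("The following tests FAILED:"):
            in_summary = True
            continue
        if in_summary:
            match = FAILED_LINE.match(line)
            if not match:
                break  # the list of failed tests has ended
            number, name, reason = match.groups()
            failures.append((int(number), name, reason))
    return failures


# Made-up output in the style of the test step's stdio log.
stdio = """\
90% tests passed, 2 tests failed out of 20

Total Test time (real) =  42.00 sec

The following tests FAILED:
\t  7 - TestFoo (Failed)
\t 12 - TestBar (SEGFAULT)
Errors while running CTest
"""

for number, name, reason in failed_tests(stdio):
    print(f"{name} (test #{number}): {reason}")
```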
The failure message, together with the build properties and changes mentioned above, should normally be enough to let you check out the correct version of the code on your local machine, compile it the same way as the slave that ran the test, and trip the same error, so that you can start debugging normally. However, if you don't get the same error as the slave, the problem might be machine-specific (i.e. it only occurs on that particular slave), in which case read on...
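The local reproduction can be sketched as a list of commands built from the build properties. This sketch rests on several assumptions. The directory layout and test name are hypothetical, the build directory is assumed to exist already, and the "config" property is assumed to map directly onto CMAKE_BUILD_TYPE. The builders may pass further CMake options, including the compiler choice recorded in the "compiler" property. Copy those from the "configure" step's stdio log.

```python
import re
import shlex


def reproduction_steps(revision, config, source_dir, build_dir, test_name):
    """Return (directory, argv) pairs mirroring a builder's steps locally.

    Assumes an out-of-source build directory that already exists, and that
    the "config" build property maps directly onto CMAKE_BUILD_TYPE.
    """
    return [
        (source_dir, ["git", "fetch"]),
        (source_dir, ["git", "checkout", revision]),
        (build_dir, ["cmake", f"-DCMAKE_BUILD_TYPE={config}", source_dir]),
        (build_dir, ["make"]),
        # Anchor the regex so that only the failing test itself is run.
        (build_dir, ["ctest", "-R", f"^{re.escape(test_name)}$", "--output-on-failure"]),
    ]


# Hypothetical values standing in for the build properties and failing test.
local_steps = reproduction_steps(
    revision="3f2a9c1",
    config="Debug",
    source_dir="/home/me/Chaste",
    build_dir="/home/me/Chaste-build",
    test_name="TestFoo",
)
for directory, argv in local_steps:
    print(f"(cd {shlex.quote(directory)} && {' '.join(shlex.quote(a) for a in argv)})")
```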
Machine-specific failures
Machine-specific failures usually fall into one of two categories: either the CMake configuration has got into an erroneous state, or something is wrong with how the particular slave is set up.
If you suspect the first case, the simplest solution is to delete the CMake cache file on the slave, and force a new build, which will re-configure the builder with a fresh cache.
- First, make a note of the slavename build property on the failing build, then navigate to the webpage for the builder that has failed (if you are on the build webpage, this is the link in the heading Builder <builder name here> Build #<number> at the top of the page).
- On the right-hand side of the builder webpage there is a "Force build" heading (if this does not appear, make sure you are logged in).
- Now change the "slavename" drop-down list to the slave that ran the failing build, and make sure the "reconfigure" checkbox is ticked. Then click the "Force Build" button.
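Forcing a build with "reconfigure" ticked is the supported route. If you already have ssh access to the slave, deleting the cache by hand means removing CMakeCache.txt from the build directory, so that the next cmake run starts from a fresh cache. A minimal Python sketch follows. The location of the build directory under the builder's workdir is an assumption that you should check on the slave.

```python
import tempfile
from pathlib import Path


def clear_cmake_cache(build_dir):
    """Delete CMakeCache.txt so the next cmake run starts from a fresh cache.

    Returns True if a cache file was removed, False if there was none.
    The rest of the build directory is left untouched.
    """
    cache = Path(build_dir) / "CMakeCache.txt"
    if cache.exists():
        cache.unlink()
        return True
    return False


# Demonstration on a throwaway directory standing in for the slave's build dir.
with tempfile.TemporaryDirectory() as demo_dir:
    (Path(demo_dir) / "CMakeCache.txt").write_text("CMAKE_BUILD_TYPE:STRING=Debug\n")
    print(clear_cmake_cache(demo_dir))  # True: the stale cache was removed
    print(clear_cmake_cache(demo_dir))  # False: nothing left to remove
```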
Forcing a build with "reconfigure" ticked kicks off a new build on that slave and re-configures the build directory with a fresh cache. Hopefully this fixes your problem! If not, the problem might be the setup of the slave machine. To reproduce it, you will probably need to ssh into the slave in question, so make sure you have permission to do this.
- The machine name of the slave is its slavename minus the final "-slave". First ssh into this machine and check out the version of the code that failed.
- Then go back to the build webpage and make a note of all the steps the build went through. For each step, you can click on the stdio link to get the command that the slave ran and the environment variables that were set.
- Go through and reproduce all of the steps manually on the slave.
Reproducing the steps should hopefully trip the same error. Then follow your normal debugging procedure to fix the problem.
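Two mechanical parts of this procedure can be sketched in Python: working out the host from the slavename, and replaying the recorded steps. The paths and slavename are hypothetical. Each step's command and environment are assumed to have been copied by hand from its stdio log. By default, replay does a dry run that only prints the commands. It also layers the recorded variables on top of your current environment, so it does not reproduce the slave's environment exactly.

```python
import os
import subprocess


def slave_host(slavename):
    """Return the machine name: the slavename minus a trailing "-slave"."""
    suffix = "-slave"
    if slavename.endswith(suffix):
        return slavename[: -len(suffix)]
    return slavename


def replay(steps, dry_run=True):
    """Re-run recorded build steps in order, stopping at the first failure.

    Each step is (directory, argv, env_vars), copied by hand from that
    step's stdio log. Returns the index of the first failing step, or None.
    """
    for index, (directory, argv, env_vars) in enumerate(steps):
        print(f"[{index}] in {directory}: {' '.join(argv)}")
        if dry_run:
            continue  # only show what would be run
        # Layer the recorded variables over the current environment.
        env = {**os.environ, **env_vars}
        if subprocess.run(argv, cwd=directory, env=env).returncode != 0:
            return index
    return None


# Hypothetical slavename and recorded steps.
print("ssh", slave_host("example-slave"))
recorded = [
    ("/path/to/workdir/build", ["cmake", "-DCMAKE_BUILD_TYPE=Debug", "/path/to/workdir/src"], {}),
    ("/path/to/workdir/build", ["make"], {}),
    ("/path/to/workdir/build", ["ctest"], {"CTEST_OUTPUT_ON_FAILURE": "1"}),
]
replay(recorded)
```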