Wednesday, May 13, 2015

A perverse design choice

Today I'm spending my time working around a problem that shouldn't exist in the first place. I work on the CMS experiment, using the CMSSW framework, which is usually rather well organised. However, it suffers from a problem that is not seen in most other industries: its developers continually and knowingly break backwards compatibility. This is a major problem because different datasets and simulation campaigns are tied to particular releases. If you develop your own software, as I have done, and you want to use it with different datasets or simulation campaigns, as I do, then you have to take these differences into account.

CMSSW uses C++, so there is no getting around the problem that the method signatures have to match exactly. As an example I have the following lines of code in CMSSW_5_3_11, which is used for the 8 TeV data:

  beamSpotLabel_ = iConfig.getParameter("beamSpot") ;

For later CMSSW releases the code looks like this:

  beamSpotLabel_ = consumes(iConfig.getParameter("beamSpot")) ;

As you can see, the implementation has changed in such a way that if I want to use the beamspot in both CMSSW_5_3_11 and later versions I need to carry around two sets of code. Unfortunately this isn't the end of the story, because I have five different CMSSW releases in parallel, with different changes in implementation between each one, so I need to keep track of five different sets of tweaks just to get things working. For a while I had five different branches and instructions on which ones people should use, but this meant making five pull requests and five merges each time, because merging to a master branch and then applying it to the separate branches would overwrite all these small tweaks. As a result I ended up with five branches diverging more and more each time someone made a pull request, I had to maintain them manually, and all the while GitHub would complain that things weren't up to date.

The way I got around this was to wrap the release-specific code in C++ comments and use a script to switch between CMSSW releases. I have to admit that I find this solution to be quite clever. The above code would be replaced with something like this:

// CHOOSE_RELEASE_START CMSSW_7_0_6_patch1 CMSSW_7_3_0 CMSSW_7_2_0 CMSSW_6_2_5 CMSSW_6_2_0_SLHC23_patch1
  beamSpotLabel_ = consumes(iConfig.getParameter("beamSpot")) ;
// CHOOSE_RELEASE_END CMSSW_7_0_6_patch1 CMSSW_7_3_0 CMSSW_7_2_0 CMSSW_6_2_5 CMSSW_6_2_0_SLHC23_patch1  
/* CHOOSE_RELEASE_START CMSSW_5_3_11
  beamSpotLabel_ = iConfig.getParameter("beamSpot") ;
CHOOSE_RELEASE_END CMSSW_5_3_11   */

Then all I need is a Python script to go through and comment/uncomment the relevant parts to match the release. This way I can keep one single branch with all the pull requests and keep everything up to date with minimal fuss. When I told a friend that this is what I was doing she was shocked. What sort of organisation would break backwards compatibility so regularly, especially if it means that code that works at 13 TeV won't work at 8 TeV? It turns out the answer is that the CMS experiment would do that, and this makes code development time-consuming, tedious, and potentially dangerous. I don't want to be around in a few years' time when most people can't remember how to use CMSSW_5_3_11 for the 8 TeV data.
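The post does not show the script itself. As a minimal sketch of how such a script could work, assuming the marker layout shown above (a `//` prefix on both marker lines for an active block, `/*` before the start marker and `*/` after the end marker for an inactive one), something like the following would do. The function name `select_release` is hypothetical, and the sketch assumes the guarded code contains no `/* ... */` comments of its own, since those would end an inactive block early.

```python
START = "CHOOSE_RELEASE_START"
END = "CHOOSE_RELEASE_END"


def select_release(source, release):
    """Rewrite the marker lines so that only blocks listing `release` are active.

    Hypothetical helper: only the marker lines are rewritten, the code between
    them is passed through untouched.
    """
    out = []
    for line in source.splitlines():
        # Strip any existing comment decoration to recover the bare marker.
        bare = line.strip()
        if bare.startswith("//"):
            bare = bare[2:].strip()
        if bare.startswith("/*"):
            bare = bare[2:].strip()
        if bare.endswith("*/"):
            bare = bare[:-2].strip()

        words = bare.split()
        if words and words[0] in (START, END):
            active = release in words[1:]
            if active:
                # Active block: both markers become one-line comments.
                out.append("// " + bare)
            elif words[0] == START:
                # Inactive block: the start marker opens a block comment...
                out.append("/* " + bare)
            else:
                # ...and the end marker closes it.
                out.append(bare + " */")
        else:
            out.append(line)
    return "\n".join(out) + "\n"
```

Because the function recomputes every marker from the list of releases it carries, running it twice with the same release leaves the file unchanged, and switching back and forth between releases is lossless for the marker lines.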

I thought things were under control, except now there is a new problem. Suppose a developer adds a new feature and submits a pull request: which CMSSW release are they using? Is it important to specify which CMSSW release they use? So now I have to write yet another script that goes through all five releases, creates an environment for each one in turn, copies across the source code and attempts to build it. Every time there's a new feature, no matter how minor, it has to be tested with all these different CMSSW releases. It also has to be tested in a "safe" space where other CMSSW releases can't interfere with it. Suppose I am developing in CMSSW_7_3_0, which is one of the most recent releases. I create a new release area, check out the code, do my development and test it. Now I need to test it on the other four releases, and do so somewhere outside the current release area. That means going up through the directory structure, making five new release areas, copying the source code, setting up the environment, doing build clean, then build, and fixing errors as they arise. This is not at all pretty, and it's also a bit of an imposition on the user, since it makes temporary directories in their user space. However, that's the best I can do under the circumstances, and we'll see soon enough if it's sufficient. This all assumes that the user is not using a sixth CMSSW release, which is something else I need to take into account...
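The steps in that paragraph (new release area, copy the source, set up the environment, build clean, build) could be sketched roughly as follows. This is only an illustration under several assumptions: that `scram` is available on the PATH in a non-interactive shell, that the usual `scram project`, `scram runtime -sh` and `scram b` commands behave the same way in every release, and that the package lives under a path like `MyAnalysis/MyPackage`, which is a made-up placeholder. The function names are hypothetical too. Each release may also need its own `SCRAM_ARCH`, which the sketch does not set.

```python
import shlex
import subprocess
import tempfile


def build_script(release, source_dir, package="MyAnalysis/MyPackage"):
    """Return shell commands that build `source_dir` in a fresh area for `release`.

    `package` is a placeholder for wherever the code sits under src/.
    """
    dest = f"{release}/src/{package}"
    return "\n".join([
        "set -e",                                  # stop at the first failure
        f"scram project CMSSW {release}",          # make a new release area
        f"mkdir -p {shlex.quote(dest)}",
        f"cp -r {shlex.quote(source_dir)}/. {shlex.quote(dest)}/",
        f"cd {release}/src",
        'eval "$(scram runtime -sh)"',             # set up the environment
        "scram b clean",
        "scram b",
    ])


def check_releases(releases, source_dir):
    """Build once per release in a throwaway directory; return {release: passed}."""
    results = {}
    for release in releases:
        # Each release gets its own temporary directory, removed afterwards,
        # so the areas cannot interfere with each other or with the
        # developer's own release area.
        with tempfile.TemporaryDirectory(prefix=f"{release}_") as workdir:
            proc = subprocess.run(
                ["bash", "-c", build_script(release, source_dir)],
                cwd=workdir,
            )
            results[release] = proc.returncode == 0
    return results
```

Calling `check_releases(["CMSSW_5_3_11", "CMSSW_7_3_0"], "/path/to/checkout")` would then report which releases build. The release-selection step described earlier would have to run on each copy before `scram b`. By default `tempfile` uses the system temporary directory rather than the user's own space, which may or may not have enough room for a release area.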

CMS has made a perverse design choice that is taking up a lot of my time that could be better spent on actual physics analysis. I think that's an excellent reason to move on to a more sane coding environment somewhere in the private sector.

Edit: After speaking with some colleagues (who sympathised) and friends (who gave some useful advice) I realised that the "correct" way to handle this situation is to use compiler directives. I have other people using the software, so for now I'll keep the current solution, but when a suitable time comes I'll consider using compiler directives to manage the build. However, even that brings its own challenges.