Wednesday, July 8, 2015

New data!

Recently the LHC has started Run 2 and this week there are high intensity beams delivering a lot of data. This is a very exciting time to be a physicist and it's what I have been waiting for for the past two years. In fact, my contract has been extended to make the best use of this new data. No amount of working with simulation or theory can compare to working with real data. This job has a lot of advantages and disadvantages, and right now the job is extremely motivating and enjoyable.

What the first 2015 data looked like at CMS.

Something I've wanted to do with new data is automatically analyse them as they arrive, and now I can do that for the first time. (I had tried when I worked on ATLAS, but due to time and technology constraints it was not possible.) Now that we have a globally distributed computing system where any computer can access any file at any time I can stream the latest data each night and run the analysis before I even wake up. So that's what I'm doing right now and it's one of the most fun projects I've ever developed! I've spent the past couple of years preparing and tweaking the software so that I'd have a push-button system in place by now, and it's worked very well so far. There are going to be many teething problems, of course, but they're minor compared to the labour that will be saved. My first large test job is currently running, analysing at its peak 300 events per second (100 events per second once I/O is taken into account). Following on from a previous post about how my personal computing resources are not sufficient to do all the work I need to do, I've contacted IT support and been given a generous allowance of disk space, as well as being able to run my CPU and I/O intensive jobs overnight. This is going to make a huge difference, and has allowed me to support my students with the simulations and datasets that I analyse. Seeing all this come together is a wonderful experience.

This is how excited physicists get when we first see beams in the machine! I was a very proud Shift Leader that day.

On the one hand this is, of course, very rewarding, but on the other hand it's a little disappointing that this isn't already being done. There seems to be a culture in particle physics that real work is about manually submitting jobs and analysing data, and there are politics associated with the notion that people should take turns in carrying the burden. We have the technology and expertise to run an analysis automatically, so why not do that? That's what we would see in the private sector. We have access to some of the largest datasets the world has ever seen and some of the most powerful computing resources ever created; it would be a crime not to exploit those as much as possible. I get the impression that I'm the only person doing this kind of work, and if that's the case then it's certainly time to move on to a different workplace. But first, I have data to analyse! I love data.

Wednesday, May 13, 2015

A perverse design choice

Today I'm spending my time working around a problem that shouldn't exist in the first place. I work on the CMS experiment, using the CMSSW framework, which is usually rather well organised. However it suffers from a problem that is not seen in most other industries, which is that its developers continually and knowingly break backwards compatibility. This is a major problem because different datasets and simulation campaigns get tied to particular releases, which means that if you develop your own software, as I have done, and you want to use it with different datasets or simulation campaigns, as I do, then you have to take these differences into account.

CMSSW uses C++, so there is no getting around the problem that the method signatures have to match exactly. As an example I have the following lines of code in CMSSW_5_3_11, which is used for the 8 TeV data:

  beamSpotLabel_ = iConfig.getParameter("beamSpot") ;

For later CMSSW releases the code looks like this:

  beamSpotLabel_ = consumes(iConfig.getParameter("beamSpot")) ;

As you can see the implementation has changed in such a way that if I want to use the beamspot in CMSSW_5_3_11 and later versions I need to carry around two sets of code. Unfortunately this isn't the end of the story because I have five different CMSSW releases in parallel, with different changes in implementation between each one, so I need to keep track of five different sets of tweaks just to get things working. For a while I had five different branches and instructions on which ones people should use, but this meant making five pull requests and five merges each time, because merging to a master branch would overwrite all these small tweaks when applying it to the separate branches. As a result I would end up with five branches diverging by more and more each time someone made a pull request, and would have to maintain them manually, and all the while GitHub would be complaining that things weren't up to date.

The way I got around this was to use C++ comments and use a script to change the CMSSW release. I have to admit that I find this solution to be quite clever. The above code would be replaced with something like this:

// CHOOSE_RELEASE_START CMSSW_7_0_6_patch1 CMSSW_7_3_0 CMSSW_7_2_0 CMSSW_6_2_5 CMSSW_6_2_0_SLHC23_patch1
  beamSpotLabel_ = consumes(iConfig.getParameter("beamSpot")) ;
// CHOOSE_RELEASE_END CMSSW_7_0_6_patch1 CMSSW_7_3_0 CMSSW_7_2_0 CMSSW_6_2_5 CMSSW_6_2_0_SLHC23_patch1  
/* CHOOSE_RELEASE_START CMSSW_5_3_11
  beamSpotLabel_ = iConfig.getParameter("beamSpot") ;
CHOOSE_RELEASE_END CMSSW_5_3_11   */

Then all I need is a Python script to go through and comment/uncomment the relevant parts to match the release. This way I can keep one single branch with all the pull requests and keep everything up to date with minimal fuss. When I told a friend that this is what I was doing she was shocked. What sort of organisation would break backwards compatibility so regularly, especially if it means that code that works at 13 TeV won't work at 8 TeV? It turns out the answer is that the CMS experiment would do that, and this makes code development time consuming, tedious, and potentially dangerous. I don't want to be around in a few years' time when most people can't remember how to use CMSSW_5_3_11 for the 8 TeV data.
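A minimal sketch of what such a script might look like is below. It is not the actual script; it only assumes the marker layout shown above, where an active block is wrapped in `//` marker lines and an inactive block is wrapped in a single `/* ... */` comment. The function name `select_release` is hypothetical.

```python
import re

# Marker layout assumed from the example above:
#   active block:    "// CHOOSE_RELEASE_START <releases>" ... "// CHOOSE_RELEASE_END <releases>"
#   inactive block:  "/* CHOOSE_RELEASE_START <releases>" ... "CHOOSE_RELEASE_END <releases> */"
START = re.compile(r"^\s*(?://|/\*)\s*CHOOSE_RELEASE_START\s+(.*?)\s*$")
END = re.compile(r"^\s*(?://\s*)?CHOOSE_RELEASE_END\s+(.*?)\s*(?:\*/)?\s*$")


def select_release(source: str, release: str) -> str:
    """Return the source with only the blocks that list `release` left active."""
    out = []
    active = None  # None outside a block, otherwise True/False for the current block
    for line in source.splitlines():
        match = START.match(line)
        if match:
            names = match.group(1).split()
            active = release in names
            prefix = "//" if active else "/*"
            out.append(f"{prefix} CHOOSE_RELEASE_START {' '.join(names)}")
            continue
        match = END.match(line)
        if match and active is not None:
            names = " ".join(match.group(1).split())
            if active:
                out.append(f"// CHOOSE_RELEASE_END {names}")
            else:
                out.append(f"CHOOSE_RELEASE_END {names} */")
            active = None
            continue
        out.append(line)  # code lines are never modified, only the markers
    return "\n".join(out) + "\n"
```

Calling `select_release(text, "CMSSW_5_3_11")` on a file's contents would turn the first block above into a block comment and the second into live code. One limitation of this layout is that code inside a switchable block cannot itself contain `*/`, since that would end the surrounding comment early.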

I thought things were under control, except now there is a new problem. Suppose a developer adds a new feature and submits a pull request: which CMSSW release are they using? Is it important to specify which CMSSW release they use? So now I have to write yet another script that goes through all five releases, creates environments for each one in turn, copies across the source code and attempts to build it. Every time there's a new feature, no matter how minor, it has to be tested with all these different CMSSW releases. It also has to be tested in a "safe" space where other CMSSW releases can't interfere with it. Suppose I am developing in CMSSW_7_3_0, which is one of the most recent releases. I create a new release area, check out the code, do my development and test it. Now I need to test it on the other four releases, and do so somewhere outside the current release area. That means going up through the directory structure, making new release areas, copying the source code, setting up the environment, doing build clean, then build, and fixing errors as they arise. This is not at all pretty, and it's also a bit of an imposition on the user, since it makes temporary directories in their user space. However that's the best I can do under the circumstances and we'll see soon enough if it's sufficient. This all assumes that the user is not using a sixth CMSSW release, which is something else I need to take into account...

CMS has made a perverse design choice that is taking up a lot of my time that could be better spent on actual physics analysis. I think that's an excellent reason to move on to a more sane coding environment somewhere in the private sector.

Edit: After speaking with some colleagues (who sympathised) and friends (who gave some useful advice) I realised that the "correct" way to handle this situation is to use compiler directives. I have other people using the software, so for now I'll keep the current solution, but when a suitable time comes I'll consider using compiler directives to manage the build. However, even that brings its own challenges.

Tuesday, April 28, 2015

A loss of life and talent

Two weeks ago one of my close friends, Moritz, died in a climbing accident. I've blogged about it a lot on my other blog, and it's had a profound effect on my attitude to work. The decision to study particle physics followed my brother's suicide, when I needed time to myself and needed to put off big decisions about my life. I've been prioritising my professional life over my personal life ever since, but now my needs have changed. I spent a long weekend in the UK with some good friends who know what it's like to grieve, and some who knew Moritz. In contrast to that, life in Brussels and at CERN seems cold and I've lost a lot of enthusiasm for physics. I've been keeping this blog for over a year at this point and for over a year I've been planning my next steps. This has been accelerated by losing Moritz. For the past year I've used my spare time to develop plans for the future. With Moritz's death my attention has suddenly been focused more on the future and I find myself building up the infrastructure I need to move back to the UK with some startup companies under my belt. For the past few days I've found myself contemplating moving back as early as August, before we get the most interesting LHC data. At the moment my heart isn't in physics, and it feels like my enthusiasm for it died with Moritz. It's made me feel very strongly that CERN is in the past and even though I didn't see much of a future for me in particle physics research, I'm now struggling to see how it's relevant to the present day. I don't know if this will be a temporary feeling or not, so I'm waiting before I act on it, and I might get my passion back for one last hurrah before I move on to more exciting and challenging things.


Moritz as I remember him, full of ambition and on top of the world.

Moritz was incredibly bright and ambitious. Particle physics has lost one of its most promising minds, just as his career was about to take off and go to new and exciting places. His loss to the field is one more piled on top of so many promising young people who have failed to find good jobs, in spite of their brilliance and genius. The field is harsh and cruel, and although it's just bad luck that Moritz died so young it's one loss too many for me. I don't want to spend any more time in a field which seems to repeatedly ignore talent and ingenuity (which it also did to Moritz) while there are other opportunities out there just waiting to be taken.

I don't want to make rash decisions based on fear, or regret, or grief, or frustration, which is why I'm waiting before I decide when to leave. There's also no reason to think that another choice won't have all the same problems. But given the choice between living in yet another foreign country for relatively poor pay and poor working conditions, and working close to the people who actually care for me when I need them the most it's an easy choice to make. I'm done giving particle physics the benefit of the doubt and at this point only a UK based fellowship where I got to set my own agenda would keep me in the field.

(This post was written en route to Moritz's funeral.)

Playing catch-up

I haven't posted much on this blog in a while and that's mostly because I've been too busy. A couple of months ago I was working very hard in the office, then spending a lot of evenings out having beers with graduating students (something I do more for their sake and the sake of the future of the field than my own sake), and generally not having much rest. This is how I like things: I love to be active and I hate being bored. It was around this time that I caught a virus while I had to go to CERN to present a talk. Without time to rest and recover I found myself on the plane unable to use my hands as they cramped up and went very cold. I was taken from the plane in an ambulance, and around an hour later I was able to get myself to CERN and spent most of the trip sleeping 16 hours a day (while having more work piled on me). I took most of the next week off and rather than sit at home for a week I invited my mum to visit Belgium. Since then I've been playing catch-up, and it's only now that I find myself in a state where most things are finished and I can look to the next tasks.


Not wanting to waste any time, I spent my sick days showing my mother the sights.

This isn't particular to physics and I'm quite sure I'd have faced similar problems in any job I chose. I'm not very happy with how the aftermath was handled though. My boss acted as though excessive work had caused me to catch a virus, rather than this being bad luck. (I hadn't been this ill in about twenty years.) On top of this I was given some additional work which I simply didn't have time for, while at the same time I was being told that the most important work wasn't urgent. Time and again my boss has told me not to work so much on the ntuple maker that I developed, until I tell her that every other part of my work depends upon it and that it greatly speeds up the rest of my work. She seems to have a very poor understanding of what my work entails or what I need to complete the tasks, which is incredibly frustrating at times. At the time the ntuple maker had five different releases for five different releases of the central software CMSSW. Each release corresponded to a different aspect of my other work, so maintaining the code was essential to everything else.


13 branches and 6 releases, upon which all the rest of my work depends.

These problems are made even worse by some serious gaps in the technical support. I've had roughly five tasks running in parallel that all require my time, CPU time, and disk space. While I've got plenty of CPU time at my disposal there isn't enough disk space to perform more than about two tasks at once. As a result if I work on one of my tasks I have to delete the files for another one to make space, perform my work, then regenerate or copy the files over for the first task again. All this eats into my time and ultimately makes me significantly less productive. There are some perverse design choices made by the local IT support, for example not making /tmp space available, or giving us access to large amounts of disk space without being able to use it interactively. Rather, I need to copy the files locally and work with them there, so I spend hours waiting for files to be transferred.


An extra terabyte could solve nearly all my problems.

All of this seems to come from an attitude that I shouldn't try working too hard, and that I shouldn't try to be excellent at my job, and I find that not only frustrating, but it makes it feel like I'm wasting my time here. I moved my entire life to Brussels for this job and I want to make it worth the sacrifices. Instead I find myself confronted with arbitrary limitations on what I can achieve technically, and being told to not work on the very things that make me more productive and efficient. Even when I have to take time off and play catch up the resources that I need are not available to me. With the right resources I could be at least twice as productive, and in fact one of the most frustrating aspects is that once I'd finished working my way through the backlog the actual physics just took a matter of hours to get some very interesting and useful results. This is work that my boss had been telling me to abandon, despite getting interesting results within a few hours. If I'd had an extra terabyte of disk space two months ago I'd have had those results two months ago. It's simply a waste of time working in these kinds of conditions, and if things don't improve I might just invest in my own hard drives to store the files I need.

Thursday, January 8, 2015

The public and private sectors

The last time I wrote a blog post it was on a plane from Miami to Washington DC on one of my other blogs. It was there that I wrote about the support of one of my closest friends, and about how my life has unfolded in the past few years. At the moment I'm on the Eurostar heading from Brussels to London, so it only seems fitting that I write another blog post, this time about the future and how the support of my friends will help me shape the coming years.

The last time I was in London (for scientific outreach and collaboration on the associated software) I took the opportunity to speak to some friends about the opportunities available in London. They are both former physicists whom I met during my research at SLAC in California and they have taken two very different paths. Tim completed his PhD and went into financial services, working in the City, whereas Graham completed his PhD, taught in a school and now works in the British Civil Service. The two experiences appear more similar than I would have thought. In both cases their jobs rely on their ability to analyse and manage data, create models, and perform statistical analyses. They both take the skills they developed in their PhDs and turn them to immediately useful exercises, rather than engaging in what is essentially structured intellectual curiosity.


Visiting the Tate Modern with Tim and Graham

While there is something rewarding about working in science, there is also something to be said for working in something less esoteric and more tangible. I often get the feeling that if I stopped doing what I do very few people would notice. Despite a rich physics program at the LHC there is a nagging doubt in the back of my mind that there simply aren't enough interesting projects to occupy all the physicists. If we stripped down the collaborations to streamlined sizes that could reasonably achieve the same goals then very few people would really notice that I wasn't there anymore. I don't think the same can be said when our actions have consequences for other people, and it would be exhilarating to once again feel that my work impacted on other people far outside my field of work. (Incidentally I've been active in many areas outside my work that have impacted on others, including my sabbatical year at the student union, creating the LGBT CERN group, and starting a one person outreach effort to try to get more people interested in the findings of the LHC, so I clearly have a need to be productive outside of academia.)

Taking a look at the contrasts of their experiences reveals some of the more frustrating aspects of their jobs. Applications for the Civil Service can take many months to a year to complete, and depending on the path taken one can end up in the wrong stream with respect to the graduate programs. Since I've already got a masters degree, a PhD, had three full time jobs, and lived abroad it would be a shame to end up on the slow stream, especially after so much time going through the application process. On the other hand from what I've heard about the financial services the process is very quick, and considering I'd be moving from one country to another, it may prove to be a little too quick. Then again, if pay and promotions are performance based then the stream I enter wouldn't matter too much. The two approaches seem to be polar opposites of each other, with the Civil Service being slow, methodical, shrouded in history and red tape, whereas the financial sector is rapid, innovative, and responsive to changing environments. It's almost worth applying just to get the experiences.

Of course it's also important to look at the nature of the work. The Civil Service is just that: a service for the people, and that appeals to me a great deal. I have always wanted to help those around me, and not just spontaneously, but in a structured and consistent manner. Charity is all well and good, but it's not the same as enacting policy, in fact it's often a poor substitute for it. Having had a history of student politics and formalised support structures I'm very much in favour of policy driven work, rather than sporadic handouts from those who have a social conscience. Putting political biases aside it's clear that execution of public policy can only go so far. The world also needs innovators and investors to improve our society. That's where the financial industries can really make a difference. Whether that's for better or for worse is not a simple or even meaningful distinction as far as I can see. There are those who rail against the financial industry, and mock those who take part as having sold their souls. To be honest I can't see much of a difference between academia and finance. The world would be much worse off without either, and there are those who exploit both for their own personal gain. It's true that when bankers play with people's money it can affect the economy in a more direct way, but the particle accelerators aren't cheap, and we burn through tens of millions of pounds each year, with only a few dozen papers to show for it. It's more politically correct to defend academia, but that doesn't mean that writing papers is morally superior to moving money around to where it's needed and skimming a little profit off the top at the same time. On the subject of ethics it's also important to consider the moral implications of working for the Civil Service. Would I want to enact the policies of a Conservative government? If not then how can I in good conscience claim to be serving the public? I can't pick and choose which policies I want to enact, so the morality of working for the Civil Service is a lot more nuanced than it at first appears. Given the choice I'd rather plan for more hospitals and fewer cars, but those decisions wouldn't be up to me.


I would love to get my hands on all that lovely data.

Day to day the jobs seem fairly similar in some respects. They work with computers to make predictions and analyse what they can. However the Civil Service has more firm working hours and more reasonable workloads, whereas in finance there are often late nights and hard deadlines, which is much closer to how I work now. To be honest if that was the only difference I'd go for a job in the financial sector. I find the deadlines exciting and they improve my productivity. I recently found myself working to deadlines for some classes I had to teach and it improved my output significantly. In the Civil Service it looks as though the software is quite outdated and limiting, whereas in the financial sector I'd be expected to be fluent in C and be able to write my own software. Given those choices my preference is again clear. I'd take the freedom of a C program over constraining spreadsheets any day. There is joy in constructing something new from a handful of raw materials, even if those raw materials are for loops and XML files.

So would I pursue either kind of job? Perhaps. They are both appealing in their own ways, but this is not an exclusive list of opportunities. I'm still considering a start up opportunity, which requires a lot of "free" time for development. I'm certainly going to keep my options open for now and compare the different opportunities to each other. Making these kinds of comparisons clarifies my thinking and helps to narrow down my interests. The Civil Service would be a "safe" option, and a rewarding one, but I'm not sure that's what I want at this stage in my life. On the other hand working in finance might just repeat some of the more stressful aspects of my current job at the same time as lacking the intellectual stimulation that I crave so much. No job is perfect, although as far as the actual work goes in my current job I'd have a hard time finding something more rewarding. The parts of my life I'd hope to improve by changing field are related to the salary, geography, social and love lives, and medium to long term stability (where staying in one country for more than five years is considered "stable"). Settling down in London or Manchester for a while would certainly address most of, if not all of, those concerns, and I've got two very good opportunities to pursue.

Wednesday, December 17, 2014

Jet setting


Enjoying the view from the hotel.

As I write this post I'm sitting on the patio of a hotel in Miami. It's CMS Week and that means it's time to travel to a foreign country and present my work to the whole collaboration. Since mid August I've spent about half my time away from Brussels, travelling for work, and spending a large amount of time in the air. It's certainly one of the perks of the job and conference organisers go out of their way to make the experience fun. So putting everything else to one side for a minute, the opportunity for travel is something I love and certainly one of the most fun parts of the job.

There are certainly advantages and disadvantages to this part of the lifestyle though. Spending so much time travelling obviously means spending much less time at home, so I've barely been to the gym, or even had regular shopping trips for the past four months. My social life has taken second place to shifts at CERN, workshops and CMS week, and that takes a toll after a while.


Elsewhere on my travels I got to stay on the Danube and watch the Serbian navy sail past.

However I think the biggest advantage of these trips is that they give me time to think outside of my normal routine for a while. Physics needs a huge amount of innovation and new ideas are not cheap. Taking the time to get out of the lab, walk in a foreign climate for a while with a laptop and a notepad, without meetings to attend or deadlines to meet, gives me the chance to step back and find new directions for my work. It's very rare that I go for a trip and don't come back with at least three new ideas that can substantially improve my current work, or lead to something new. I love innovating, I love problem solving, and I love physics. Although I'm attending talks and collaborating, this is essentially a working holiday so I'm going to do what I love doing the most: blue sky thinking about how to solve the biggest problems I currently face at work. I wish I could work like this all the time, but unfortunately the hard work needs to be done too, and when I get back to Brussels I'll have to return to meetings and documentation and submitting jobs.


Anticipating my presentation to the collaboration.

For now I can bask in the December sun, chat with the Director of the lab about the "bigger picture", decide what my priorities should be for next year, and take the time to plan it all out. It's good to be back in the USA again, and I hope to have opportunities like this in whatever job I have in the future.

Saturday, November 1, 2014

Coding opportunity: ContentMine

While on a trip to CERN I met a collaborator of a friend, and we discussed options after physics. She mentioned an interesting project called ContentMine, which aims to electronically mine data and information from academic articles. It's a concept which could change the way we respond to information and academic findings, and its website explicitly says:

"Although content mining can be done without breaking current laws, the borderline between legal and illegal is usually unclear. So we campaign for reform, and we work on the basis that anything that is legal for a human should also be legal for a machine."

As a scientist, a supporter of open access to academic discoveries, an activist, a coder, and one who loves to push the boundaries of technology, this is of great interest to me! At the moment I am at saturation when it comes to coding projects, but this is certainly a project I would like to be a part of in the future, and one to keep watching.