
People not liking open source (and it's not Oracle)

Posted by fabriziogiudici on November 7, 2010 at 2:59 AM PST

I've already dealt with this topic before... but it's really so crazy that I can't help blogging about it again, also taking advantage of this ACM article titled "Should Code be Released?". The subcaption says it all: "Software code can provide important insights into the results of research, but it's up to individual scientists whether their code is released - and many opt not to."

So, some scientists still refuse to publish the code that helped them arrive at a certain theory. I'm certainly not so naive as to assert that they should publish to SourceForge from their very first commit. But once someone has published his research in a couple of relevant places and has bound his name to it, arguing that releasing the code could help others "steal" it is really hilarious. On the contrary, we all know how big a difference in quality the open source approach can make. Science is based on peer review: how the hell can a theory be peer reviewed if you can't reproduce the steps that lead to the underlying model? In our community we are only poor technologists, not scientists, and yet everybody would scream in disgust if I dared to assert "I have demonstrated that Java is 5x faster than C" without releasing the benchmark code so that everybody could try it.

I can only conclude that many scientists are not confident at all in their theories, or that they are deliberately cheating.


Comments

People not liking open source

Hi Fabrizio,

I have worked enough with scientists to understand that all of this is... somewhat irrelevant.

In my experience, the quality of code written by scientists is notoriously low on average. While infrastructure code in large scientific collaborations may have some test coverage, individual analyses or smaller projects do not. Using version control systems is the exception rather than the norm. I have seen people forced to rewrite an upgraded version of an analysis from scratch because the previous copy was lost. I know of people who don't share their code because they know it's awful (which is much better than the opposite!). It's a mess, and you may wonder (as a software engineer... and I did): how the heck can they have any certainty that what comes out of the code makes any sense?!?

It turns out that software quality is just one of many problems. Whenever you have a detector, you have lots of other things associated with it (hardware glitches, background signals, calibration, ...). Whenever you collect data, you have to understand a lot about that data (its provenance, the conditions under which it was taken, the methodology, ...). Software adds one more element to this mess. So, what does a scientist do? A good one, at least? You look for patterns that you know must be there because of previous knowledge, and you fix the chain until they show up. Until all the expected results are there, you don't trust your detector, code, methodology, or data. Software bugs are just one source of those discrepancies. Take the LHC, for example: for the first few years, they will use the conclusions previously reached at other accelerators (like RHIC here at Brookhaven) to "test" that their whole chain works. If they don't do that, if they can't re-examine previous results and reach the same conclusions, other scientists will be skeptical of any new conclusions.

Scientists are not engineers: their magnet design could be better, their electronics readout could be better, their software analysis framework could be better (and mathematicians would tell you that their math could be better)... But each individual piece only needs to be as good as it takes to get science out of it. To be efficient, it's going to have the minimum amount of quality needed to get the job done (i.e. publish your paper).