Comment

Overnight Open Thread

342
SixDegrees 11/29/2009 10:22:53 am PST

re: #322 Charles

It’s good form to keep copies even of derivative data, should questions ever arise. Nice to hear that the data can apparently still be gotten from its original source, but that isn’t the same as being able to produce the precise collection of data CRU derived its results from.

When we do a software delivery, we tag everything that went into that delivery - including third party software packages that we have never modified at all. At any time, we are able to construct exactly what was shipped to the customer in a given delivery.
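
A minimal sketch of the idea in Python, not any particular shop's tooling: it records a SHA-256 digest for every file in a delivery, unmodified third-party packages included, so that a later copy can be checked against exactly what shipped. The names `build_manifest` and `changed_files` are hypothetical, and a real setup would more likely lean on version-control tags than on a hand-rolled script.

```python
import hashlib
from pathlib import Path


def build_manifest(root):
    """Map each file under root (relative POSIX path) to its SHA-256 hex digest."""
    root = Path(root)
    manifest = {}
    for path in sorted(root.rglob("*")):
        if path.is_file():
            digest = hashlib.sha256(path.read_bytes()).hexdigest()
            manifest[path.relative_to(root).as_posix()] = digest
    return manifest


def changed_files(old, new):
    """Return sorted paths that were added, removed, or altered between two manifests."""
    return sorted(p for p in old.keys() | new.keys() if old.get(p) != new.get(p))
```

Saving the manifest alongside the delivery (or the dataset) means `changed_files` can later show whether a reconstructed copy really matches the original, file by file.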

Scientific data needs to be treated exactly the same way - and most often, it is. As noted earlier, every major claim of scientific fraud I’m aware of over the last several decades has ultimately been resolved by reference to the original, raw data - either in the researcher’s favor or not. In some realms - drug trials, I believe, are one example - there are regulatory requirements that the raw data be turned over and archived for a set length of time, and at least a fair amount of government-sponsored research in the US requires retention of data.

Mostly, such requirements aren’t really necessary - self-policing ensures that records are kept for extended periods, on the off-chance they need to be revisited. The discovery, just a few years ago, that the widely accepted value for the speed of sound - a fundamental measurement long considered settled - was wrong led to an examination of the original experimental data, which revealed a small, systematic error in data collection. In other instances, we have Darwin’s original field notebooks, along with his personal copies of Malthus and other texts that influenced his thinking; we have Galileo’s and da Vinci’s original notebooks; large swaths of Newton’s original papers; and on and on. The importance of data preservation isn’t exactly new. Nor is it unusual to revisit old data; Mendel’s experimental results, when analyzed using modern statistical methods, are almost certainly “too good to be true” and were probably fudged, consciously or unconsciously, to better fit the hypothesis Mendel had constructed. In that particular case, he turned out to be right anyway, and was borne out in spades when the physical mechanisms of genetics were elucidated. But again, the importance of keeping good records cannot be overstated.

And as I’ve also mentioned, the appearance here is what matters in the realm of public opinion, far more so than the science. The admission that even a small portion of data was destroyed, or that the original dataset on which published results were based no longer exists, is like a wet dream come true for the opposition. The public is going to hear one side saying, “They cooked their data, lied and destroyed the evidence!” while the other side responds with, “The data speaks for itself!” which sounds pretty weak under the circumstances.

Again, appearances matter even more in this case, where the public must be swayed and the opposition is active. The researchers should have known better than to delete anything, regardless of their ability to explain the deletion. The effect on public perception is that their case is severely weakened.