Surveillance of a black-box data process

In my work with FracFocus, the oil & gas industry’s chemical disclosure instrument, I have had to come up with custom methods to learn about the data because the documentation is meager, the data organization is poor and the maintainers of the website are not very forthcoming when I ask questions.

Even though I’ve worked with these raw data for almost two years, one aspect has remained particularly mysterious: silent changes to already published disclosures. These occur when the information in a published disclosure changes without any record of the change. For example, you may look up the chemicals used in a fracking event near your home and, when you check again months later, find that some of the information is different from what you saw the first time. It might be changed so slightly that you don’t even notice, or one or more values might be very different, but there is no indication of when or why it changed.

It is not suspicious that a company might need to update information it has published: small mistakes can creep into reports and sometimes go unnoticed until after publication. But a well-designed disclosure instrument should have some audit trail for changes made to published information. Indeed, I was told by the FracFocus team that any changes to published disclosures must be made in a NEW disclosure. In some cases, that is what happens in FracFocus (though, incidentally, I was also told that these new disclosures are sometimes used just to add to previous data, not to replace it, further undermining their usefulness).

But when I was evaluating an older archive of FracFocus data from the organization SkyTruth, I came across an odd situation: the data in the old archive was mostly a very accurate representation of currently published disclosures, except that some values were very different. Clearly, the data had been changed since the old archive was created.  However, there was no audit trail, no new disclosure, no sign that the data had been changed.  I was only able to compare a few metadata values (the volume of water used and the geo-coordinates); the chemical data for those events are not available in the current publications.

One could simply assume that the changes represent companies fixing mistakes to make the newer data more correct. However, that might be naive: just recently I came across a report from 2014 claiming that several silent changes were made to early FracFocus records to obscure the use of diesel in fracking operations. Diesel is just about the ONLY chemical that is regulated in fracking. Without a record of what changed, we must completely trust the companies.

So, I have started comparing archives (a current one with an earlier one) to see if previously published data has been silently changed. It is not at all clear that I will find anything of interest. I can only look at changes since I started saving raw downloads (late 2018), and I suspect that most of the changes will be operators making minor edits to publications instead of creating a whole new disclosure. Still, it gives me a little more confidence that we can shine more light onto the processes of this industry-sponsored operation and overcome some of its built-in weaknesses.
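
For readers who want to try this kind of archive comparison themselves, here is a minimal sketch in Python. It is only an illustration of the idea, not the actual code behind this project: the column names (disclosure_id, water_volume, latitude, longitude) and the sample rows are invented, and a real FracFocus download will use its own field names and need more cleanup. The sketch indexes each snapshot by disclosure ID and reports any field whose value differs between the two snapshots for a disclosure that appears in both.

```python
import csv
import io

# Hypothetical column names; adjust to match the actual archive layout.
KEY_FIELD = "disclosure_id"
COMPARE_FIELDS = ["water_volume", "latitude", "longitude"]


def load_archive(csv_text):
    """Index one archive snapshot by disclosure ID."""
    reader = csv.DictReader(io.StringIO(csv_text))
    return {row[KEY_FIELD]: row for row in reader}


def values_differ(old, new, tolerance=1e-6):
    """Compare numerically when both values parse as numbers, else as text."""
    try:
        return abs(float(old) - float(new)) > tolerance
    except ValueError:
        return old.strip() != new.strip()


def find_silent_changes(old_archive, new_archive):
    """List (disclosure_id, field, old_value, new_value) for each changed
    field in disclosures that appear in both snapshots."""
    changes = []
    for disclosure_id in sorted(old_archive.keys() & new_archive.keys()):
        old_row = old_archive[disclosure_id]
        new_row = new_archive[disclosure_id]
        for field in COMPARE_FIELDS:
            if values_differ(old_row[field], new_row[field]):
                changes.append(
                    (disclosure_id, field, old_row[field], new_row[field])
                )
    return changes


# Made-up example rows, not real FracFocus data.
OLD_SNAPSHOT = """disclosure_id,water_volume,latitude,longitude
A1,100000,32.10,-101.50
A2,250000,40.00,-104.70
"""
NEW_SNAPSHOT = """disclosure_id,water_volume,latitude,longitude
A1,100000,32.10,-101.50
A2,950000,40.00,-104.70
A3,500000,35.20,-98.30
"""

changes = find_silent_changes(
    load_archive(OLD_SNAPSHOT), load_archive(NEW_SNAPSHOT)
)
for change in changes:
    print(change)
```

In this toy example only disclosure A2 is flagged, because its water volume differs between the two snapshots. A3 is ignored because it exists only in the newer snapshot, so it is a new publication rather than a change to an old one. The numeric tolerance is there so that trivial formatting differences (say, 32.1 versus 32.10) are not reported as changes.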
