Should I overhaul Open-FF?

I am contemplating an overhaul of the Open-FF system. This would include:

  • cleaning up and simplifying the code
  • improving the documentation
  • adding unit testing, so code changes don’t introduce problems to older code
  • improving coverage of chemical mass
  • creating hooks for more features such as record correction

This will take a lot of work and a significant amount of time. Before I head into this, I want to outline more explicitly what FracFocus (as it is) can and can’t do for us.

Obstacles

From all of my interactions with FracFocus, I found a host of obvious obstacles to transparency about chemicals in the industry:

  • FracFocus version 1 has no chemical records in the bulk download
  • Proprietary claims are allowed, and often buried in MSDS
  • The ‘systems approach’ breaks the connection between chemical identity and supplier and trade-name product identity, not just on proprietary chemicals but ALL chemicals in a disclosure
  • There is no easy way to determine which oilfield service companies provided services for a fracking job.
  • There is no audit trail: there is no record of when a disclosure was published or who published it.
  • There are few checks of data quality (either values or presence).
  • Silent changes are allowed
  • There appears to be no publishing deadline – many are published more than a year after the work is completed.
  • There is no way to determine the “final” disclosure when there are duplicates.  Some may even be partial.
  • FracFocus takes no responsibility for data entered – always refers specific questions to the companies entering the data.
  • Companies are almost universally unresponsive about problems brought to their attention.  For the most part, there is not a clear way for members of the public to alert companies.
  • The number of companies and different actors within companies is very large. It is clear there is no “one way to do it” approach to the FracFocus data and different approaches may be contradictory.

To me, this all means that FracFocus provides companies with huge ambiguity and therefore, FracFocus will be useless for any legal challenges.  It has many built-in avenues of deniability.

There is another big ambiguity: my lack of direct knowledge of the industry and all their procedures and my lack of ability to ask for clarification.

Uses?

Given all these problems, what good is FracFocus?

First of all, it is by far the biggest source of public data on fracking chemicals. The last count put the number of disclosures over 175,000 and the number of individual chemical records over 6 million. Even if there is a lot of ambiguity, with that much data, larger pictures start to emerge. Pictures about the vast number of chemicals used and the large ranges of quantities, sometimes at eye-popping values. Patterns of water use – which continues to climb. Examples of poor disclosure, of chemical hiding techniques, of company hypocrisy. Patterns of how the industry is changing. Suggestions on what some of the hidden chemicals are.

As long as we remember that the data will always be less than perfect, have a degree of ambiguity, and be subject to industry denials because they were entered so sloppily, there is a lot we can get from FracFocus.

Is that enough for the potential users?  For me, that is still the open question.  Academics want data to be immaculate and if it can’t be that, the ambiguity has to be well documented.  The little interaction I’ve had with state folks left the impression that they don’t use the data for much of anything – just documentation; as long as a disclosure is published, that box is checked.  Activists and public health advocates seem to be overwhelmed with the crazy complexity of these data: the number and complexity of chemicals, range of uses, the multi-layered hiding.  It is frustrating to them.

Open-FF?

Given all that, what can Open-FF offer to help the situation? I think the one major thing that Open-FF can offer (though I am not sure it is there yet) is complete transparency: opening up the FracFocus black box. That means making sure it is easy to understand what I am doing to FF data, making sure the ambiguities of FF are well delineated and the crap of FF clearly spelled out, but also that the usefulness of the data is established. But how can I be a source of transparency when I can’t even get my own questions answered?

How do I evaluate that for Open-FF? I guess one big thing would be to move it to a clean-enough form that anyone with a Python background could take it over. That a group like EDGI would be able to evaluate it.

Bottom line

FracFocus has always bothered me. That it is a “transparency instrument” that manages to almost completely undermine the public’s knowledge of chemicals. That the industry actually takes credit for being transparent with it. That its existence largely scuttles serious government oversight.

The public deserves to be able to find out.  If that is my contribution, just adding a bit of transparency to the situation, I’d be happy.

Simple addition to a Jupyter table

A colleague recently commented on a report I had generated, saying it would be way more user-friendly to make the tables interactive. Sigh. Of course, he is right. When they come across big tables, most people will just scroll right past – even if they are interested in the content. It is just too much trouble to digest.

Based on his suggestion, I went on a search for HTML or JavaScript tools that might help me do something like that.  What I stumbled upon has been super useful not just for reports, but just about every Jupyter script I write.

It is the itables module for Python and pandas. With just a few added lines of code, every dataframe you display gets column names that can be clicked to sort, tables that can be broken into pages, and an awesome search bar that makes filtering on the fly a snap.
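
To give a feel for those "few lines," here is a minimal sketch of how this might look at the top of a notebook. It assumes pandas and itables are installed and uses itables' init_notebook_mode function. The helper name enable_interactive_tables and the tiny sample table are my own inventions for illustration, not part of any report or of Open-FF.

```python
import pandas as pd


def enable_interactive_tables() -> bool:
    """Try to switch on itables for every DataFrame shown in this notebook.

    Hypothetical helper: returns True if itables was found and activated,
    False if it is not installed (pandas then shows its usual static tables).
    """
    try:
        from itables import init_notebook_mode
    except ImportError:
        return False
    # all_interactive=True makes every displayed DataFrame sortable,
    # paginated, and searchable without further changes to the notebook.
    init_notebook_mode(all_interactive=True)
    return True


# Made-up sample values, only to have something to display.
df = pd.DataFrame(
    {
        "item": ["alpha", "bravo", "charlie", "delta"],
        "count": [120, 95, 40, 33],
    }
)

interactive = enable_interactive_tables()

# As the last line of a notebook cell, this renders the table;
# with itables active it comes with sorting, paging, and a search box.
df
```

If you would rather opt in one table at a time, itables also provides a show function (from itables import show; show(df)) that renders a single dataframe interactively and leaves the rest of the notebook untouched.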