I use many statistical tools, but I am not a statistician. Although I took several stats courses in college and graduate school, I’m not familiar enough with statistical theory to understand what a complex analysis will do with a given data set beyond simple situations.
It’s a problem. In my field, ecologists are primarily interested in biological systems and the complex interactions of species and their environment. Experiments that address interesting problems can get complicated very quickly. In graduate school, I saw many ecology students apply very complicated analyses to their complex experimental designs, often because they knew it would be expected by their committee and journal reviewers. You might say that “a sufficient understanding of the analysis was often lacking.”
I’m speaking from personal experience, of course. (There are many mathematically agile ecologists, including a solid bunch in my grad program.) In my main experiments, I had to manipulate the diversity of species in a natural community and then follow how the community responded to different kinds of stresses. At the time it was a novel experiment for fieldwork and I was absorbed by how to accomplish it. I didn’t carefully think about the statistical design until the data started rolling in. (Big Mistake, but I’m sure I’m not alone.)
I got some advice on how to design the analysis, but I struggled to understand how to interpret its results. “Struggled” isn’t quite the word; I was sure I was a failure. If I couldn’t understand my analysis, how could I possibly defend my work?
But then I took the “Community Analysis” course taught by Bruce McCune. Bruce is one of those math-savvy ecologists who also has a knack for explaining how things work to the math-not-so-savvy. The content of the course was multivariate analysis with an ecological focus. But the big take-home lesson for me was: explore your analysis tools however you can until you understand what they will do for you. For me, that meant simulating all kinds of data, running that through the technique we were learning, and watching what it spit out. That was something I could do! (Thank you, Dr. McCune!)
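That kind of exploration is easy to sketch. The toy below (my own generic illustration, not course material – the species responses, sample sizes, and noise level are all made up) plants a single known environmental gradient in fabricated community data, runs a bare-bones PCA on it, and checks whether the first axis hands the gradient back:

```python
import numpy as np

rng = np.random.default_rng(0)
n_sites, n_species = 30, 10

gradient = np.linspace(0.0, 1.0, n_sites)          # the "truth" we planted
loadings = rng.uniform(-2.0, 2.0, size=n_species)  # each species' linear response
abund = np.outer(gradient, loadings) + rng.normal(0, 0.2, (n_sites, n_species))

# PCA by SVD of the centered site-by-species matrix
centered = abund - abund.mean(axis=0)
U, S, Vt = np.linalg.svd(centered, full_matrices=False)
axis1 = U[:, 0] * S[0]                             # site scores on PC1

# Did the analysis recover what we put in? (sign of PC1 is arbitrary)
r = np.corrcoef(axis1, gradient)[0, 1]
print(f"|correlation of PC1 site scores with planted gradient| = {abs(r):.3f}")
```

When the first axis tracks the planted gradient closely, you have learned something concrete about what the tool does; when it doesn’t, you have learned something even more useful.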
This led me to a series of tests of not only my analysis, but also the kinds of experimental designs ecologists were beginning to use to address questions about the function of diversity. It was fascinating to me because it showed how easy it was to fool ourselves into thinking we were seeing one thing in a particular experiment, when it could easily be something else, hidden by our choice of experimental design and analysis.
A few years after graduating from Oregon State, I inherited a large survey for which the analysis was intrinsically built into the sample design – the way it should be. It was a complex analysis, the data were messy, and it was a massive effort for the field crew. I was excited to be part of the project because of its scale and its promise to reveal broad patterns. But the analysis was difficult for me to wrap my head around – a design to estimate the dominant spatial scale of variability using a nested ANOVA. So I did what I knew how to do: I created simulated data sets with different scales of variability and ran them through the analysis scheme. And sure enough, variability at one level was unexpectedly showing up at other levels in the results, often to a surprising degree. When I posted my results to a forum of stats experts (R-help), I received validation: sure, without lots of replication, you would expect that to happen. These complex analyses won’t work miracles with low-power data. (Thank you, Dr. Lumley!) The data we were collecting were valuable for lots of reasons, but probably not for identifying the dominant scale of variability.
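The flavor of that leakage is easy to reproduce. Here is a toy sketch (not the original survey analysis – the level names, sample sizes, and variances are all hypothetical): a balanced two-level nested design in which the only true variability is residual noise, with variance components recovered from the nested ANOVA mean squares by the standard method of moments. With this little replication, the top-level estimates wander well away from their true value of zero – noise at one level masquerading as structure at another:

```python
import numpy as np

a, b, n = 4, 3, 2  # sites, plots per site, samples per plot (low replication)

def variance_components(data):
    """Method-of-moments variance components for a balanced nested design."""
    grand = data.mean()
    site_means = data.mean(axis=(1, 2))
    plot_means = data.mean(axis=2)
    ms_site = b * n * np.sum((site_means - grand) ** 2) / (a - 1)
    ms_plot = n * np.sum((plot_means - site_means[:, None]) ** 2) / (a * (b - 1))
    ms_err = np.sum((data - plot_means[:, :, None]) ** 2) / (a * b * (n - 1))
    var_err = ms_err
    var_plot = (ms_plot - ms_err) / n
    var_site = (ms_site - ms_plot) / (b * n)
    return var_site, var_plot, var_err

rng = np.random.default_rng(42)
est_site = []
for _ in range(2000):
    # True variances: sites = 0, plots = 0, residual error = 1
    data = rng.normal(0.0, 1.0, size=(a, b, n))
    est_site.append(variance_components(data)[0])
est_site = np.array(est_site)

# True site-level variance is exactly zero, yet individual estimates are not:
print(f"site-level estimates: mean={est_site.mean():.3f}, sd={est_site.std():.3f}")
print(f"runs with apparent site variance > 0.2: {(est_site > 0.2).mean():.2%}")
```

Averaged over many simulated surveys the estimator is unbiased, but any single low-replication survey can easily report a sizable chunk of its variance at the wrong level – which is exactly what the R-help answer implied.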
I have found that simulating expected patterns and testing an analysis scheme against those data has helped me gain an almost intuitive understanding of what an analysis is really telling me. As someone who depends on complicated analyses as tools, such exploration has become invaluable.