Posted on 26 September 2017 in Pietro's Data Bulletin

TL;DR: Low statistical power, stemming from small sample sizes, is a serious concern for the reliability of results in neuroscience. But not all hope is lost.

All of you are probably familiar with the concept of statistical power (i.e. the probability that a study will detect an effect that truly exists), so I’m sure no one will be surprised to hear that low-powered studies are undesirable because they are more likely to miss the experimental effect they are looking for. But before you mark this post as ‘boring’ and move on with your day, you should know that Katherine Button and colleagues, in their Nature Reviews paper [1], outline two lesser-known and perhaps more dangerous consequences of running studies with low statistical power.

Firstly, lower power leads to lower positive predictive value (the probability that a significant finding reflects a true effect rather than a false positive): in other words, the lower the power, the higher the chance that your significant results are just false positives with no true underlying effect. Secondly, even when an underpowered study does detect a true effect, the magnitude of that effect will likely be overestimated, a phenomenon known as the ‘winner’s curse’ (this is particularly unfortunate because researchers trying to replicate such a finding are much more likely to fail). In addition, the authors argue that low-powered studies are more vulnerable to a range of other biases, such as vibration effects (i.e. even a slight change in your analysis method can have a sizeable impact on your results), publication bias, and selective reporting of outcomes.
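Both points can be made concrete with a small simulation. The sketch below is my own illustration, not the paper's actual model: the pre-study odds R = 0.25 is an arbitrary value I picked, and the winner's curse demo uses a two-group z-test with known variance purely for simplicity.

```python
import math
import random
import statistics

# Positive predictive value (PPV): the probability that a significant finding
# reflects a true effect. R is the pre-study odds that a tested hypothesis is
# true (0.25 is an arbitrary illustrative value, not from the paper).
def ppv(power, alpha=0.05, R=0.25):
    return (power * R) / (power * R + alpha)

print(ppv(0.80))  # well-powered study: PPV = 0.8
print(ppv(0.20))  # underpowered study: PPV drops to 0.5

# Winner's curse: among the runs that reach significance, the estimated effect
# is inflated relative to the true one. Here: true effect of 0.5 SD, only
# n = 10 per group, two-sided z-test with known sigma = 1.
rng = random.Random(0)
true_effect, n, crit = 0.5, 10, 1.96
se = math.sqrt(2 / n)  # standard error of the mean difference

sig_effects = []
for _ in range(20000):
    a = [rng.gauss(0.0, 1.0) for _ in range(n)]
    b = [rng.gauss(true_effect, 1.0) for _ in range(n)]
    d = statistics.mean(b) - statistics.mean(a)
    if abs(d / se) > crit:
        sig_effects.append(d)

print(len(sig_effects) / 20000)      # empirical power: only about 0.2
print(statistics.mean(sig_effects))  # mean significant estimate: well above the true 0.5
```

Note how halving the power halves the fraction of significant results you can trust, and how the average ‘significant’ effect in the underpowered design comes out roughly double the true one.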

The most striking conclusion the authors draw is that the likelihood that nominally significant findings in the literature correspond to true effects is worryingly small, and this may be especially true in neuroscience, a field in which statistical power tends to be quite low, as the authors show with a relatively comprehensive review. In the context of animal model studies, this also has ethical implications: animal sacrifices are harder to justify when lack of statistical power weakens the resulting study to such an extent.

The authors conclude by presenting a few general recommendations, broadly aimed at creating a scientific environment that values and enables replication, transparency, meta-analyses, and collaboration. Their statistical guidelines include an a priori calculation of your statistical power (although estimating effect sizes is admittedly no piece of cake) and pre-registration of the confirmatory analysis plan. The latter touches on the important difference between exploratory and confirmatory data analysis, and the fact that p-values generated during exploratory analysis do not have the same interpretation as confirmatory p-values (on this issue, I recommend this short article, and always keeping in mind that if you run many exploratory tests on your data and find one that is significant at p<0.05, that p-value might in fact be pretty close to meaningless).
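If the closed-form power formulas scare you, an a priori power calculation can also be done by brute-force simulation. Here is a minimal sketch (again my own toy setup, not the authors' procedure: a two-sample z-test with known sigma = 1 at a two-sided alpha of 0.05) that answers "how many subjects per group do I need for 80% power at an effect size of d = 0.5?":

```python
import math
import random
import statistics

def simulated_power(effect_size, n_per_group, n_sims=5000, seed=1):
    """Estimate the power of a two-sample z-test (known sigma = 1) by
    simulation, at a two-sided alpha of 0.05 (critical value 1.96)."""
    rng = random.Random(seed)
    se = math.sqrt(2 / n_per_group)  # standard error of the mean difference
    hits = 0
    for _ in range(n_sims):
        a = [rng.gauss(0.0, 1.0) for _ in range(n_per_group)]
        b = [rng.gauss(effect_size, 1.0) for _ in range(n_per_group)]
        d = statistics.mean(b) - statistics.mean(a)
        if abs(d / se) > 1.96:
            hits += 1
    return hits / n_sims

# Power climbs towards the conventional 0.8 at around n = 64 per group.
for n in (20, 40, 64):
    print(n, simulated_power(0.5, n))
```

As for the exploratory-testing point: with 20 independent tests at alpha = 0.05, the chance of at least one false positive is 1 - 0.95^20 ≈ 0.64, which is exactly why an isolated exploratory p<0.05 can be close to meaningless.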

In light of all this, I think there’s only one solution: power to the people!


[1] Button, Katherine S., et al. "Power failure: why small sample size undermines the reliability of neuroscience." Nature Reviews Neuroscience 14.5 (2013): 365-376.