The BJP has decided to require that manuscripts submitted for publication adhere to certain experimental design standards. The formulation can be found in Curtis et al., 2015.

Curtis MJ, Bond RA, Spina D, Ahluwalia A, Alexander SP, Giembycz MA, Gilchrist A, Hoyer D, Insel PA, Izzo AA, Lawrence AJ, MacEwan DJ, Moon LD, Wonnacott S, Weston AH, McGrath JC. Experimental design and analysis and their reporting: new guidance for publication in BJP. Br J Pharmacol. 2015 Jul;172(14):3461-71. doi: 10.1111/bph.12856 [PubMed]

Some of this continues the “huh?” response of this behavioral pharmacologist who publishes in a fair number of similar journals. In other words, YHN is astonished this stuff is not just a default part of the editorial decision making at BJP in the first place. The items that jump out at me include the following (paraphrased):

2. You should shoot for a group size of N=5 or above and if you have fewer you need to do some explaining.
3. Groups less than 20 should be of equal size and if there is variation from equal sample sizes this needs to be explained. Particularly for exclusions or unintended loss of subjects.
4. Subjects should be randomized to groups and treatment order should be randomized.
6.-8. Normalization and transformation should be well justified and follow acceptable practices (e.g., you can’t compare a treatment group to the normalization control that now has no variance because of this process).
9. Don’t confuse analytical replicates with experimental replicates in conducting analysis.

Again, these are the “no duh!” issues in my world. Sticky peer review issues quite often revolve around people trying to get away with violating one or other of these things. At the very least reviewers want justification in the paper, which is a constant theme in these BJP principles.

The first item is a pain in the butt but not much more than make-work.

1. Experimental design should be subjected to ‘a priori power analysis’….latter requires an a priori sample size calculation that should be included in Methods and should include alpha, power and effect size.

Of course, the trouble with power analysis is that it depends intimately on the source of your estimates for effect size- generally pilot or prior experiments. But you can select basically whatever you want as your assumption of effect size to demonstrate a range of sample sizes as acceptable. Also, you can select whatever level of power you like, within reasonable bounds along the continuum from “Good” to “Overwhelming”. I don’t think there are very clear and consistent guidelines here.

The fifth one is also going to be tricky, in my view.

Assignment of subjects/preparations to groups, data recording and data analysis should be blinded to the operator and analyst unless a valid scientific justification is provided for not doing so. If it is impossible to blind the operator, for technical reasons, the data analysis can and should be blinded.

I just don’t see how this is practical with a limited number of people running experiments in a laboratory. There are places this is acutely important- such as when human judgement/scoring measures are the essential data. Sure. And we could all stand to do with a reminder to blind a little more and a little more completely. But this has disaster written all over it. Some peers doing essentially the same assay are going to disagree over what is necessary and “impossible” and what is valid scientific justification.

The next one is a big win for YHN. I endorse this. I find the practice of reporting any p value other than your lowest threshold to be intellectually dishonest*.

10. When comparing groups, a level of probability (P) deemed to constitute the threshold for statistical significance should be defined in Methods, and not varied later in Results (by presentation of multiple levels of significance). Thus, ordinarily P < 0.05 should be used throughout a paper to denote statistically significant differences between groups.

I’m going to be very interested to see how the community of BJP accepts* this.

Finally, a curiosity.

11. After analysis of variance post hoc tests may be run only if F achieves the necessary level of statistical significance (i.e. P < 0.05) and there is no significant variance in homogeneity.

People run post-hocs after a failure to find a significant main effect on the ANOVA? Seriously? Or are we talking about whether one should run all possible comparison post-hocs in the absence of an interaction? (seriously, when is the last time you saw a marginal-mean post-hoc used?) And isn’t this just going to herald the return of the pre-planned comparison strategy**?

Anyway I guess I’m saying Kudos to BJP for putting down their marker on these design and reporting issues. Sure I thought many of these were already the necessary standards. But clearly there are a lot of people skirting around many of these in publications, specifically in BJP***. This new requirement will stiffen the spine of reviewers and editors alike.

*N.b. I gave up my personal jihad on this many years ago after getting exactly zero traction in my scientific community. I.e., I had constant fights with reviewers over why my p values were all “suspiciously” p<0.5 and no backup from editors when I tried to slip this concept into reviews.

**I think this is possibly a good thing.

***A little birdy who should know claimed that at least one AE resigned or was booted because they were not down with all of these new requirements.