You know the old story.

In this new story, we have the NIH’s Sex as a Biological Variable (SABV) policy. When it was first discussed, just about everyone who took this seriously pointed out the problem of a zero-sum, limited-funding system adopting a mandate which would double the animal costs. To really consider SABV properly, we said, this is going to double our sample sizes…at the very least. Probably more than double.

That is coming from the perspective of a scientist who works with units of the whole experimental animal. There are many of us.
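For concreteness, here is the back-of-the-envelope power arithmetic behind “probably more than double,” sketched in Python with statsmodels. The inputs are illustrative assumptions, not anything NIH has specified: a treatment effect of Cohen’s d = 0.5 at 80% power and alpha = 0.05, and a sex-by-treatment interaction assumed to be half the size of the main effect (a common heuristic, not a rule). The interaction is deliberately treated as a crude two-group comparison; the precise multiplier depends on the design, but the direction does not.

```python
# Back-of-the-envelope power arithmetic. All inputs are illustrative assumptions:
# a treatment effect of Cohen's d = 0.5, alpha = 0.05, 80% power, and an
# interaction assumed to be half the size of the main effect.
from statsmodels.stats.power import TTestIndPower

power = TTestIndPower()

# Single-sex design, powered for the treatment main effect (d = 0.5): ~64/group.
n_per_group = power.solve_power(effect_size=0.5, alpha=0.05, power=0.8)
print(f"single-sex study: {2 * round(n_per_group)} animals total")

# The NIH "just split your usual n by sex" reading keeps the same total; each sex
# is now half-powered for its own within-sex treatment effect.
print(f"split by sex, same total: {2 * round(n_per_group)} animals total")

# Actually powering to detect the sex difference (crudely treating the interaction
# contrast as a two-group comparison at d = 0.25) inflates every cell.
n_per_cell = power.solve_power(effect_size=0.25, alpha=0.05, power=0.8)
print(f"powered for sex x treatment interaction: {4 * round(n_per_cell)} animals total")
```

The point is the direction of the arithmetic: actually powering a study to detect the sex difference, rather than merely enrolling both sexes, multiplies the animal count several-fold rather than adding a little.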

The official NIH response was a bunch of gaslighting.

“Oh no”, went the policy mavens of the NIH, “this is not what this means at all. Simply include equal numbers of male and female animals at your regular sample size. That’s it. Oh, yeah, you have to say you will stratify your data by sex and look at it. You know, just in case there’s anything there. But nothing insists you have to double your sample size.”

Sure, said we NIH watchers/applicants. Sure it will go like that. Have you met our reviewers? They are going to first of all demand that every study is fully powered to detect any sex difference. Then, they are going to immediately start banging on about swabbing and cycling the female rats and something something about powering up for cycle as well.

NIH: “No, of course not that would never happen why we will tell them not to do that and everything will be copacetic”

Things were not copacetic. As predicted, reviewers of grants have, since even before the mandate went into effect, demonstrated that they are constitutionally unable to do what NIH claimed they should be doing and in fact do what they were predicted in advance to do. Make everything HAVE to be a sex differences study and HAVE to be a study of the estrous cycle. Randomly. Variably. Yes. As with everything in NIH review. And who knows, maybe this is a selective cudgel (I call it Becca’s Bludgeon) used only when they just generally dislike the proposal.

The NIH mandate let the SABV camel’s nose under the tent flap and now that camel is puuuuuuuuussssshhhhing all the way in.

A new article in eLife by Garcia-Sifuentes and Maney is part of this campaign. It is chock-full of insinuations and claims trying to justify the full camel inside the tent. Oh, they know perfectly well what the NIH policy was. But they are using all of the best #allegedprofession techniques to try to avoid admitting they are fully doing an end run.

From the Abstract: This new policy has been interpreted by some as a call to compare males and females with each other.

From the Intro: Although the NIH policy does not explicitly require that males and females be compared directly with each other, the fact that more NIH-funded researchers must now study both sexes should lead to an increase in the frequency of such comparisons (insert self-citation). For example, there should be more testing for sex-specific responses

“should”.

although the proportion of articles that included both sexes significantly increased (see also Will et al., 2017), the proportion that treated sex as a variable did not. [Note interesting goalpost move. or at least totally undefined insinuation] This finding contrasts sharply with expectations [whose “expectations” would those be?], given not only the NIH mandate but also numerous calls over the past decade to disaggregate all preclinical data by sex [yes, the mandate was to disaggregate by sex. correct.] and to test for sex differences [bzzzt, nope. here’s another slippery and dishonest little conflation]

One potential barrier to SABV implementation is a lack of relevant resources; for example, not all researchers have received training in experimental design and data analysis that would allow them to test for sex differences using appropriate statistical approaches. [oh what horseshit. sure, maybe there is a terrible lack of experimental design training. I agree those not trained in experimental psychology seem to be a bit lacking. But this is not specific to sex differences. A group is a group is a group. so is a factor. the “lack of relevant resources” is….money. grant money.]

any less-than-rigorous test for sex differences creates risk for misinterpretation of results and dissemination of misinformation to other scientists and to the public [There you have it. The entire NIH scheme to introduce SABV is not only flawed, it is, seemingly, even worse than doing nothing!]

Although a sex difference was claimed in a majority of articles (57%), not all of these differences were supported with statistical evidence. In more than a quarter of the articles reporting a sex difference, or 24/83 articles, the sexes were never actually compared statistically. [Yep, totally consistent with the assertions from NIH about what they were after. Anything else is a significant move of the goalposts. In the direction that was anticipated and EXPLICITLY denied as being the goal/end game by the NIH. In oh so many ways.]

In these cases, the authors claimed that the sexes responded differentially to a treatment when the effect of treatment was not statistically compared across sex. … Of the studies with a factorial design, 58% reported that the sexes responded differently to one or more other factors. The language used to state these conclusions often included the phrase ‘sex difference’ but could also include ‘sex-specific effect’ or that a treatment had an effect ‘in males but not females’ or vice versa. … Neither approach tests whether the treatment had different effects in females and males. Thus, a substantial majority of articles containing claims of sex-specific effects (70%) did not present statistical evidence to support those claims

This is also utter a-scientific horseshit.

I get this a lot from reviewers, so I’m going to expand, but only briefly. There is no such thing as a canonical statistical interpretation technique that is either “right” or “wrong”. Nor do statistical inference techniques alter the outcome of a study. The data are what they are. All else is shades of interpretation. At the very best you could say that different inferential statistical outcomes may mean there is stronger or weaker evidence for your interpretations of the data. At best.
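For anyone who wants the statistical point itself spelled out: the test the eLife authors are asking for is, in the standard factorial framing, the sex-by-treatment interaction term, as distinct from running a separate test within each sex and noting that one crossed p < 0.05 and the other did not. A minimal sketch of that distinction, using statsmodels on an entirely invented dataset (the column names, cell sizes, and effect sizes are illustrative assumptions, not from any real study):

```python
# Illustration only: the data are fabricated; names and effect sizes are invented.
import numpy as np
import pandas as pd
import statsmodels.formula.api as smf

rng = np.random.default_rng(0)
n = 40  # hypothetical animals per sex x treatment cell
df = pd.DataFrame({
    "sex": np.repeat(["F", "M"], 2 * n),
    "treatment": np.tile(np.repeat(["drug", "vehicle"], n), 2),
})
# Simulate an outcome where the drug effect is larger in females (illustrative only).
effect = {"F": 1.0, "M": 0.4}
df["infusions"] = [
    rng.normal(loc=effect[s] if t == "drug" else 0.0, scale=1.0)
    for s, t in zip(df.sex, df.treatment)
]

# The "significant in females but not males" approach: two separate within-sex models.
for s in ("F", "M"):
    m = smf.ols("infusions ~ treatment", data=df[df.sex == s]).fit()
    print(s, "within-sex treatment p =", round(m.pvalues["treatment[T.vehicle]"], 3))

# The test the quoted passage is demanding: the sex x treatment interaction term.
full = smf.ols("infusions ~ sex * treatment", data=df).fit()
print("sex x treatment interaction p =",
      round(full.pvalues["sex[T.M]:treatment[T.vehicle]"], 3))
```

The interaction term is the piece that directly compares the two treatment effects with each other, which is what the quoted passage means by comparing the sexes statistically.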

But there is a broader hypocrisy here. Do you only build your knowledge within the context of one paper? Do you assemble your head space on whether something is likely or unlikely to be a valid assertion (say, “female rats self-administer more cocaine”) ONLY on papers that provide like-to-like perfectly parallel and statistically compared groups?

If you are an idiot, I suppose. Not being an idiot, I assert that most scientists build their opinions about the world of science that they inhabit on a pile of indirectly converging evidence. Taking variability in approach into account. Stratifying the strength of the evidence to their best ability. Weighting the results. Adding each new bit of evidence as they come across it.

And, in a scenario where 10 labs were conducting cocaine self-administration studies, five of which tended to work on males and five on females, independently, we would conclude some things. If we were not preening Experimental Design Spherical Cow 101 idiots. If, for example, no matter the differences in approach, it appeared that in aggregate the females self-administered twice as many infusions of the cocaine.

We would consider this useful, valid information that gives us the tentative idea that perhaps there is a sex difference. We would not hold our hands over our eyes mumbling “blah blah blah I can’t hear you either” and insist that there is zero useful indication from this true fact. We would, however, as we do with literally every dataset, keep in mind the limitations of our inferences. We might even use these prior results to justify a better test of the newly developed hypothesis, to overcome some of the limitations.
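If you want that reasoning written down as arithmetic, a crude inverse-variance pooling of hypothetical single-sex studies does the job. Every number below is invented for illustration; the point is the aggregation step, not the estimate.

```python
# A toy version of "stratify, weight, and aggregate" across labs. All numbers are
# invented: five hypothetical male-only and five female-only labs each report a mean
# infusions-per-session and a standard error, pooled with inverse-variance weights.
import numpy as np

def pool(means, ses):
    """Fixed-effect inverse-variance pooled mean and its standard error."""
    w = 1.0 / np.square(ses)
    pooled = np.sum(w * means) / np.sum(w)
    return pooled, np.sqrt(1.0 / np.sum(w))

male_means, male_ses = np.array([20, 22, 19, 25, 21]), np.array([3, 4, 2, 5, 3])
female_means, female_ses = np.array([41, 38, 45, 39, 44]), np.array([4, 3, 6, 4, 5])

m, m_se = pool(male_means, male_ses)
f, f_se = pool(female_means, female_ses)
diff_se = np.hypot(m_se, f_se)
print(f"males ~{m:.0f}, females ~{f:.0f}, difference {f - m:.0f} +/- {1.96 * diff_se:.0f}")
```

None of the ten invented labs compared the sexes within a single experiment, and the pooled estimates are still informative, with their limitations carried along in the standard errors.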

That is how we build knowledge.

Not by insisting that, if a comparison of datasets/findings does not accord with strict ideas of experimental design rigor, it is totally invalid and meaningless.

Among the articles in which the sexes were pooled, the authors did so without testing for a sex difference almost half of the time (48%; Figure 3B). When authors did test for a sex difference before pooling, they sometimes found a significant difference yet pooled the sexes anyway; this occurred in 17% of the articles that pooled.[Yes, consistent with the NIH policy. Again with the moving the goalposts….]

Thus, the authors that complied with NIH guidelines to disaggregate data usually went beyond NIH guidelines to explicitly compare the sexes with each other. [hookay…..so where’s the problem? isn’t this a good thing?]

How many times have you heard another academic scientist say “I rejected that manuscript…”. Or, “I accepted that manuscript…”? This is usually followed by some sort of criticism of an outcome for that manuscript that is inconsistent with their views on what the disposition should be. Most often “I rejected that manuscript…but it was accepted for publication anyway, how dare they??!!??”

We somewhat less often hear someone say they “rejected” or “funded” a grant proposal…but we do hear disappointed applicants claim that one reviewer “killed my grant”.

This is, in general, inaccurate.

All, and I mean ALL, of the review input on NIH grants that takes place from receipt and referral through to the Advisory Council input (and whatever bean-counting Tetris-puzzle fitting happens post-Council) is merely advisory to the Director. The IC Director is the deciderer.

Similarly, all peer review input to manuscripts is merely advisory to the Editor. In this case, there may be some variability in whether it is all being done at the Editor-in-Chief level, to what extent she farms that out to the handling sub-Editors (Associate, Senior, Reviewing, etc.), or whether there is a more democratic discussion amongst a group of deciding editors.

What is clear, however, is that the review conducted by peers is merely advisory.

It can be the case that the deciding editor (or editorial process) sides with a 2-1 apparent vote. It could be siding with a 1-2 vote. Or overruling a 0-3 vote. Either for or against acceptance.

This is the process that we’ve lived with for decades. Scientific generations.

Yet we still have people expressing this bizarre delusion that they are the ones “accepting” or “rejecting” manuscripts in peer review. Is this a problem? Would it be better, you ask, if we all said “I recommended against accepting it”?

Yes. It would be better. So do that.

This post is brought to you by a recent expression of outrage that a paper was rejected despite (an allegation of) positive-sounding comments from the peer reviewers. This author was so outraged that they contacted some poor fool reviewer who had signed their name to the review. Outside of the process of review, the author demanded this reviewer respond. Said reviewer apparently sent a screen shot of their recommendation for, well, not rejection.

This situation then usually goes into some sort of outrage about how the editorial decision making process is clearly broken, unethical, dishonest, political….you know the drill. Bad.

For some reason we never hear those sorts of complaints from the authors when an editor has overruled the unfavorable reviewers and issued an acceptance for publication.

No, in those cases we hear from the outraged peer reviewer. Who also, on occasion, has been known to rant about how the editorial decision making process is clearly broken, unethical, dishonest, political….you know the drill. Bad.

All because we have misconstrued the role of peer review.

It is advisory. That is all.