A tweet from Potnia Theron alerts us to a change in the way the CSR study sections at NIH will review grants for this round and into the future. A tweet from our good blog friend @boehninglab confirms a similar story from the June rounds. I am very pleased.

When I first started reviewing NIH grants, applications were ordered for discussion in what appeared to be clusters assigned to the same Program Officer, or at least to a given Program Branch or Division. This was back when a substantial number of POs would attend the meeting in person to follow the discussion of the grants they might be interested in funding. Quite obviously, it was most efficient and reasonable for an SRO to be able to tell a PO that their grants would all be discussed in, e.g., a contiguous two-hour interval instead of scattered randomly across a two-day meeting.

Importantly, this meant that grants were not reviewed in any particular order with respect to pre-meeting scores and the grants slated for triage were ordered along with everything that was slated to be discussed.

When we shifted to reviewing grants in ascending order of preliminary score (i.e., best to worst), I noticed some things that were entirely predictable*. These things had a quelling effect on score movement through the discussion process, for various reasons. Now, I do say “noticed”. I have not seen any data from the CSR on this and would be very interested to see some before/after comparisons for the prior change and for the current reversion. So I cannot assert any strong position that my perceptions are indeed valid.

This had the tendency to harden the very best scores. Which, btw, were the ones almost guaranteed to fund, since this came along during a time of fixed budgets and plummeting paylines. Still, the initial few projects were not as subject to…calibration…as they may have been before. When you are facing the first two proposals in the round, it’s easy for everyone to nod along with the reviewers who are throwing 2s and saying the grant is essentially perfect. When you get such a beast on day 2, after you’ve already battled through a range of issues…it’s more likely someone is going to say “yeah but whattabout….?”

It’s axiomatic that there is no such thing as an unassailable “perfect” grant proposal. Great scores arise not because the reviewers can find no flaws but because they have chosen to overlook or downplay flaws that might have been a critical point of discussion for another proposal. The way the NIH review works, there is no re-visitation of prior discussions just because someone realizes that the issue being used to beat the heck out of the current application also applied to the one discussed five grants ago that was entirely downplayed or ignored. This is why, fair or not, discussion tends to get more critical as the meeting goes on. So in the old semi-random order, apps that had good and bad preliminary scores were equally subject to this factor. In the score-ordered era, the apps with the best preliminary scores were spared this effect.

Another factor which contributed to this hardening of the preliminary score order is the “why bother?” factor. Reviewers are, after all, applicants, and they are sensitive to the perceived funding line as it pertains to the scores. They have some notion of whether the range of scores under current discussion means “this thing is going to fund unless the world explodes”, “this thing is going to be a strong maybe and is in the hunt for gray zone pickup” or “no way, no how is this going to fund unless there is some special back scratching going on”. And believe you me, they score accordingly, despite constant admonishment to use the entire range and that reviewers do not make funding decisions™.

When I was first on study section, the SRO sent out score distribution data for the prior several rounds and it was awesome to see. The scores would cluster right around the operative perceived funding line at the time. The discussions would be particularly fierce around that line. But since an app in any given score range could appear at any point in the meeting, there was motivation to stay on target, right through to the last app discussed at times. With the score-ordered review, pretty much nothing was going to matter after lunch on the first day. Reviewers were not making distinctions that would be categorically relevant after that point. Why bother fighting over precisely which variety of unfundable score this app receives? So I argue that exhaustion was more likely to amplify score hardening.

I don’t have any data for that but I bet the CSR does if they would care to look.

These two factors hit the triage list in a double whammy.

To recap, anyone on the panel (and not in conflict) can request that a grant slated not to be discussed be raised for discussion. For any reason.

In the older way of doing things, the review order would include grants scheduled for triage; the Chair would come to one, say that it was triaged, and ask if anyone wanted to discuss it. Mostly, everyone would just enter ND on the form and go on to the next one. Sometimes, however, a person wanted to bring one up out of triage and discuss it.

You can see that if this came up as, say, the third proposal on the first day, the psychology of pulling it up for discussion would differ from that of an application scheduled last in the meeting on day 2, when everyone is eager to rush to the airport.

In the score-ordered way of doing things, this all came at the end, when reviewers’ minds were already on early flights and they had sat through many hours of “why are we discussing this one when it can’t possibly fund?”. The pressure not to pull up any more grants for discussion was severe. My perception is that the odds of an application being pulled up for discussion went way, way, way down. I bet CSR has data on that. I’d like to see it.

I don’t have full details on whether the new review order will include triaged apps or be a sort of hybrid. But I hope it returns to scheduling the triaged apps right along with everything else so that they have a fairer chance of being pulled up for discussion.


*and perhaps even intentional. There were signs from Scarpa, the CSR Director at the time, that he was trying to reduce the number of in-person meetings (which are very expensive). If scores did not change much (and if the grants selected for funding did not change) between the pre-meeting average of three people and the eventual voted score, then meetings were not a good thing to have. Right? So a suspicious person like myself immediately suspected that the entire goal of reviewing grants in order of initial priority score was to “prove” that meetings added little value by taking structural steps to reduce score movement.


September 9, 2019

I’ve already lost the thread to it, but some friend of Joi Ito, the MIT Media Lab guy who took Epstein’s money, was recently trying to defend his actions. If I caught the gist of the piece, it was that Ito allegedly really believed that Epstein had been reformed, or at least had been sufficiently frightened by his legal consequences not to re-offend with his raping of children.

I want to get past the question of whether Ito was disingenuous or so blinded by what he wanted (Epstein’s money) that he was willing to fool himself. I want to address the issue of forgiveness. Because even if Ito genuinely believed Epstein was reformed, scared and would never in a million years offend again…he had to forgive him for his past actions.

I was pondering this on my commute this morning.

I do not forgive.

I only rarely forget.

I hold grudges for decades.

I have been known to ruminate and dwell and to steep.

I am trying my best to come up with cases where I’ve suffered a significant harm or insult from someone and managed to forgive them at a later date. I’m not recalling any such thing.

On the other hand, nobody has ever offered me millions of dollars to overlook their past behavior, either.

BJP issues new policy on SABV

September 4, 2019

The British Journal of Pharmacology has been issuing a barrage of initiatives over the past few years that are intended to address numerous issues of scientific meta-concern including reproducibility, reliability and transparency of methods. The latest is an Editorial on how they will address current concerns about including sex as a biological variable.

Docherty et al. 2019 Sex: A change in our guidelines to authors to ensure that this is no longer an ignored experimental variable. https://doi.org/10.1111/bph.14761 [link]

I’ll skip over the blah-blah about why. This audience is up to speed on SABV issues. The critical parts are what they plan to do about it, with respect to future manuscripts submitted to their journal. tldr: They are going to shake the finger but fall woefully short of heavy threats or of prioritizing manuscripts that do a good job of inclusion.

From Section 4 BJP Policy: The British Journal of Pharmacology has decided to rectify this neglect of sex as a research variable, and we recommend that all future studies published in this journal should acknowledge consideration of the issue of sex. In the ideal scenario for in vivo studies, both sexes will be included in the experimental design. However, if the researcher’s view is that sex or gender is not relevant to the experimental question, then a statement providing a rationale for this view will be required.

Right? Already we see immense weaseling. What rationales will be acceptable? Will those rationales be applied consistently for all submissions? Or will this be yet another frustrating feature for authors, in which our manuscripts appear to be rejected on grounds from which other published papers seem to suffer?

We acknowledge that the economics of investigating the influence of sex on experimental outcomes will be difficult until research grant‐funding agencies insist that researchers adapt their experimental designs, in order to accommodate sex as an experimental variable and provide the necessary resources. In the meantime, manuscripts based on studies that have used only one sex or gender will continue to be published in BJP. However, we will require authors to include a statement to justify a decision to study only one sex or gender.

Oh, a statement. You know, the NIH has (sort of, weaselly) “insisted”. But as we know, the research workforce is fighting back, insisting that we don’t have the “necessary resources”, and, several years into this policy, researchers are blithely presenting data at conferences with no mention of addressing SABV.

Overall sex differences and, more importantly, interactions between experimental interventions and sex (i.e., the effect of the intervention differs in the two sexes) cannot be inferred if males and females are studied in separate time frames.

Absolutely, totally false. False, false, false. This has come up in more than one of my recent reviews and it is completely and utterly, hypocritically wrong. Why? Several reasons. First of all, in my fields of study it is exceptionally rare that large, multi-group, multi-sub-study designs (in a single sex) are conducted this way. It is resource intensive and generally unworkable. Many, many, many studies include comparisons across groups that were not run at the same time in some sort of cohort-balancing design. And whaddaya know, those studies often replicate, with all sorts of variation across labs, not just across time within a lab. In fact this is a strength. Second, in my fields of study, we refer to prior literature all the time in our Discussion sections to draw parallels and contrasts. In essentially zero cases do the authors simply throw up their hands and say “well, since nobody has run studies at the same time and place as ours, there is nothing worth saying about that prior literature”. You would be rightfully laughed out of town.

Third concern: It’s my old saw about “too many notes”. Critique without an actual reason is bullshit. In this case you have to say why you think the factor you don’t happen to like for Experimental Design 101 reasons (running studies in series instead of in parallel) has contributed to the difference. If one of my peer labs says they used more or less the same methods this month as last year and as five years ago…wherein lies the source of non-sex-related variance which explains why the female group self-administered more cocaine compared with the before, after, and in-between male groups, which all did the same thing? And why are we so insistent about this for SABV and not for the series of studies in males that reference each other?

In conscious animal experiments, a potential confounder is that the response of interest might be affected by the close proximity of an animal of the opposite sex. We have no specific recommendation on how to deal with this, and it should be borne in mind that this situation will replicate their “real world.” We ask authors merely to consider whether or not males and females should be physically separated, to ensure that sight and smell are not an issue that could confound the results, and to report on how this was addressed when carrying out the study. Obviously, it would not be advisable to house males and females in different rooms because that would undermine the need for the animals to be exposed to the same environmental factors in a properly controlled experiment.


Look, there are tradeoffs in this SABV business when it comes to behavior studies, and no doubt others. We have many sources of potential variance that could be misinterpreted as a relatively pure sex difference. We cannot address them all in each and every design. We can’t. You would have to run groups that were housed together, and not, in rooms together and not, at times similar and apart AND MULTIPLY THAT AGAINST EACH AND EVERY TREATMENT CONDITION YOU HAVE PLANNED FOR THE “REAL” STUDY.

Unless the objective of the study is specifically to investigate drug‐induced responses at specific stages of the oestrous cycle, we shall not require authors to record or report this information in this journal. This is not least because procedures to determine oestrous status are moderately stressful and an interaction between the stress response and stage of the oestrous cycle could affect the experimental outcome. However, authors should be aware that the stage of the oestrous cycle may affect response to drugs particularly in behavioural studies, as reported for actions of cocaine in rats and mice (Calipari et al., 2017; Nicolas et al., 2019).

Well done. Except why cite papers where there are oestrous differences without similarly citing cases where there are no oestrous differences? It sets up a bias that has the potential to undercut the more correct way they start Section 5.5.

My concern with all of this is not the general support for SABV. I like that. I am concerned, first, that it will be toothless in the sense that studies which include SABV will not be prioritized and that some, though not all, authors will be allowed to get away with thin rationales. This is not unique to BJP; I suspect the NIH is failing hard at this as well. And without incentives (easier acceptance of manuscripts, better grant odds) or punishments (auto rejects, grant triages), behavior won’t change, because the other incentives (faster movement on “real” effects and designs) will dominate.