NIH data on Discussion Rates and Grant Submitting Vigor
July 20, 2022
The latest blog post over at Open Mike, from NIH honcho of extramural grant awards Mike Lauer, addresses the “Discussion Rate”. This is, in his formulation, the percent of applicants (in a given Fiscal Year, FY21 in this case) who are PI on at least one application that reaches discussion, i.e., is not triaged. The post presents three tables breaking this Discussion rate (and Funding rate) down by the sex of the PI, by race (Asian, Black, White only) or by ethnicity (Hispanic or Latino vs non-Hispanic only). The tables further break these down by Early Stage Investigator, New Investigator, At Risk and Established status. At Risk is a category of “researchers that received a prior substantial NIH award but, as best we can tell, will have no funding the following fiscal year if they are not successful in securing a competing award this year.” At this point you may wish to revisit an old blog post by DataHound called “Mind the Gap” which addresses the chances of regaining funding once a PI has lost all NIH grants.
I took the liberty of graphing the By-Race/Ethnicity Discussion rates, because I am a visual thinker.

There seem to be two main things that pop out. First, in the ESI category, the Discussion rate for Black PI apps is a lot lower. Which is interesting. The 60% rate for ESI might seem a little odd until you remember that the burden of triage may not fall on ESI applications. At least 50% of applications have to be discussed in each study section, small numbers in a given study section probably mean that on average it is more than half, and this is NIH-wide data for FY21 (5,410 ESI PIs total). Second, the NI category (New, Not Early on the chart) seems to suffer relative to the other categories.
Then I thought a bit about this per-PI Discussion rate being north of 50% for most categories. And that seemed odd to me. Then I looked at another critical column on the tables in the blog post.
The Median number of applications per applicant was…. 1. That means the mode is also 1: every applicant submits at least one application, so a median of 1 means at least half of them submitted exactly one.
Wow. Just….wow.
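To see why those two numbers fit together so tightly, here is a minimal sketch, assuming (purely for illustration) that each application independently has about a 55% chance of being discussed. Nothing in the NIH tables gives that per-application figure; it is just a placeholder to show how fast the per-PI rate would climb if people were submitting multiple applications.

```python
# Minimal sketch: per-PI discussion rate as a function of applications submitted.
# The 55% per-application discussion probability is an assumption for
# illustration, not a number taken from the NIH tables.
p_discussed = 0.55

for k in (1, 2, 3, 4):
    per_pi_rate = 1 - (1 - p_discussed) ** k
    print(f"{k} application(s): P(at least one discussed) = {per_pi_rate:.2f}")
```

With one application the per-PI rate equals the per-application rate; with two it jumps to about 0.80, with three to about 0.91. A per-PI Discussion rate sitting just north of 50% is therefore exactly what you would expect when most applicants submit a single application.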
I can maybe understand this for ESI applicants, since for many of them this will be their first grant ever submitted.
But for “At Risk”? An investigator who has experience as a PI with NIH funding, who is about to have no NIH funding if a grant does not hit, and who is submitting ONE grant application per fiscal year?
I am intensely curious how this stat breaks down by deciles. How many at risk PIs are submitting only one grant proposal? Is it only about half? Two-thirds? More?
As you know, my perspective on the NIH grant getting system is that if you have only put in one grant you are not really trying. The associated implication is that any solution to the NIH grant award system’s various problems that rests on someone failing to get their grant after only one try is not likely to be that useful.
I just cannot make this make sense to me. Particularly if the NIH is going to start making policy around this category.
It is slightly concerning that the NIH is now reporting on this category of investigator. Don’t get me wrong. I believe the NIH system should support a greater expectation of approximately continual funding for investigators who are funded PIs. But it absolutely cannot be 100%. What should it be? I don’t know. It’s debatable. Perhaps more importantly, who should be saved? Because after all, what is the purpose of NIH reporting on this category if they do not plan to DO SOMETHING about it? By, presumably, using some sort of exception pay or policy to prevent these at risk PIs from going unfunded.
There was just such a plan bruited about for PIs funded with the ESI designation that were unable to renew or get another grant. They called them Early Established Investigators and described their plans to prioritize these apps in NOT-OD-17-101. This was shelved (NOT-OD-18-214) because “NIH’s strategy for achieving these goals has evolved based on on-going work by an Advisory Committee to the Director (ACD) Next Generation Researchers Initiative Working Group and other stakeholder feedback” and yet asserted “NIH..will use an interim strategy to consider “at risk investigators”..in its funding strategies“. In other words, people screamed bloody murder about how it was not fair to only consider “at risk” those who happened demographically to benefit from the ESI policy.
It is unclear how these “consider” decisions have been made in the subsequent interval. In a way, Program has always “considered” at risk investigators, so it is particularly unclear how this language changes anything. In the early days I had been told directly by POs that my pleas for an exception pay were not as important because “we have to take care of our long funded investigators who will otherwise be out of funding”. This sort of thing came up in study section more than once in my hearing, voiced variously as “this is the last chance for this PI’s one grant” or even “the PI will be out of funding if…”. As you can imagine, at the time I was new and full of beans and found that objectionable. Now….well, I’d be happy to have those sentiments applied to me.
There is a new version of this “at risk” consideration that is tied to the new PAR-22-181 on promoting diversity. In case you are wondering why this differs from the famously rescinded NINDS NOSI, well, NIH has managed to find themselves a lawyered excuse.
Section 404M of the Public Health Service Act (added by Section 2021 in Title II, Subtitle C, of the 21st Century Cures Act, P.L. 114-255, enacted December 13, 2016), entitled, “Investing in the Next Generation of Researchers,” established the Next Generation Researchers Initiative within the Office of the NIH Director. This initiative is intended to promote and provide opportunities for new researchers and earlier research independence, and to maintain the careers of at-risk investigators. In particular, subsection (b) requires the Director to “Develop, modify, or prioritize policies, as needed, within the National Institutes of Health to promote opportunities for new researchers and earlier research independence, such as policies to increase opportunities for new researchers to receive funding, enhance training and mentorship programs for researchers, and enhance workforce diversity;
“enacted December 13, 2016“. So yeah, the NOSI was issued after this and they could very well have used this for cover. The NIH chose not to. Now, the NIH chooses to use this aspect of the appropriations language. And keep in mind that when Congress includes something like this NGRI in the appropriations language, NIH has requested it or accepted it or contributed to exactly how it is construed and written. So this is yet more evidence that their prior stance that the “law” or “Congress” was preventing them from acting to close the Ginther Gap was utter horseshit.
Let’s get back to “at risk” as a more explicitly expressed concern of the NIH. What will these policies mean? Well, we do know that none of this comes with any concrete detail like set aside funds (the PAR is not a PAS) or ESI-style relaxation of paylines. We do know that they do this all the damn time, under the radar. So what gives? Who is being empowered by making this “consideration” of at-risk PI applications more explicit? Who will receive exception pay grants purely because they are at risk? How many? Will it be in accordance with distance from payline? How will these “to enhance diversity” considerations be applied? How will these be balanced against regular old “our long term funded majoritarian investigator is at risk omg” sentiments in the Branches and Divisions?
This is one of the reasons I like the aforementioned Datahound analysis, because at least it gave a baseline of actual data for discussion purposes. A framework a given I or C could follow in starting to make intelligent decisions.
What is the best policy for where, who, what to pick up?
Reconsidering “Run to Daylight” in the Context of Hoppe et al.
January 7, 2022
In a prior post, A pants leg can only accommodate so many Jack Russells, I had elucidated my affection for applying Vince Lombardi’s advice to science careers.
Run to Daylight.
Seek out ways to decrease the competition, not to increase it, if you want to have an easier career path in academic science. Take your considerable skills to a place where they are not just the expected value, but represent a near-miraculous advance. This can be in topic, in geography, in institution type or in any other dimension. Work in an area where there are fewer of you.
This came up today in a discussion of “scooping” and whether it is more or less your own fault if you are continually scooped, scientifically speaking.
The trouble is that, despite the conceits of study section review, the NIH system does NOT tend to reward investigators who are highly novel solo artists. It is seemingly obligatory for Nobel Laureates to complain about how some study section panel or other passed on the grant describing their plans to pursue what became the Nobel-worthy work. Year after year a lot of me-too grants get funded while genuinely new stuff flounders. The NIH has a whole system (RFAs, PAs, now NOSI) set up to beg investigators to submit proposals on topics that are seemingly important but on which nobody can get a fundable score.
In 2019 the Hoppe et al. study put a finer and more quantitatively backed point on this. One of the main messages was the degree to which grant proposals on some topics had higher success rates while proposals on other topics had lower ones. You can focus on the trees if you want, but the forest is all-critical. This has pointed a spotlight on what I have taken to calling the inherent structural conservatism of NIH grant review. The peers are making entirely subjective decisions, particularly right at the might-fund/might-not-fund threshold of scoring, based on gut feelings. Those peers are selected from the ranks of the already-successful when it comes to getting grants. Their subjective judgments, therefore, tend to reinforce the prior subjective judgments. And of course, tend to reinforce the orthodoxy of the moment.
NIH grant review has many pseudo-objective components to it which do play into the peer review outcome. There is a sense of fair-play, sauce for the goose logic which can come into effect. Seemingly objective evaluative comments are often used selectively to shore up subjective, Gestalt reviewer opinions, but this is in part because doing so has credibility when an assigned reviewer is trying to convince the other panelists of their judgment. One of these areas of seemingly objective evaluation is the PI’s scientific productivity, impact and influence, which often touches on publication metrics. Directly or indirectly. Descriptions of productivity of the investigator. Evidence of the “impact” of the journals they publish in. The resulting impact on the field. Citations of key papers….yeah it happens.
It is worth considering the Hoppe results and the Lauer et al. (2021) description of the NIH “funding ecology” in light of the original Ginther et al. (2011, 2018) investigations into the relationship of PI publication metrics to funding outcomes.
Publication metrics are a function of funding. The number of publications a lab generates depends on having grant support. More papers is generally considered better, fewer papers worse. More funding means an investigator has the freedom to make papers meatier. Bigger in scope or deeper in converging evidence. More papers means, at the very least, a trickle of self-cites to those papers. More funding means more collaborations with other labs…which leads to them citing both of you at once. More funding means more trainees who write papers, write reviews (great for h-index and total cites) and eventually go off to start their own publication records…and cite their trainee papers with the PI.
So when the NIH-generated publications say that publication metrics “explain” a gap in application success rates, they are wrong. They use this language, generally, in a way that says Black PIs (the topic of most of the reports, but this generalizes) have inferior publication metrics so this causes a lower success rate. With the further implication that this is a justified outcome. This totally ignores the inherent circularity of grant funding and publication measures of awesomeness. Donna Ginther has written a recent reflection on her work on NIH grant funding disparity, which doubles down on her lack of understanding of this issue.
Publication metrics are also a function of funding to the related sub-field. If a lot of people are working on the same topic, they tend to generate a lot of publications with a lot of available citations. Citations which buoy up the metrics of investigators who happen to work in those fields. Did you know, my biomedical friends, that a JIF of 1.0 is awesome in some fields of science? This is where the Hoppe and Lauer papers are critical. They show that not all fields get the same amount of NIH funding, and do not get that funding as easily. This affects the available pool of citations. It affects the JIF of journals in those fields. It affects the competition for limited space in the “best” journals. It affects the perceived authority of some individuals in the field to prosecute their personal opinions about the “most impactful” science.
That funding to a sub-field, or to certain approaches (technical, theoretical, model, etc, etc) has a very broad and lasting impact on what is funded, what is viewed as the best science, etc.
So is it good advice to “Run to daylight”? If you are getting “scooped” on the regular is it your fault for wanting to work in a crowded subfield?
It really isn’t good advice. I wish it were so, but it is bad advice.
Better advice is to work in areas that are well populated and well-funded, using methods and approaches and theoretical structures that everyone else prefers and bray as hard as you can that your tiny incremental twist is “novel”.
The recent NOT-OD-21-073 Upcoming Changes to the Biographical Sketch and Other Support Format Page for Due Dates on or after May 25, 2021 indicates one planned change to the Biosketch which is both amusing and of considerable interest to us “process of NIH” fans.
For the non-Fellowship Biosketch, Section D. has been removed. … As applicable, all applicants may include details on ongoing and completed research projects from the past three years that they want to draw attention to within the personal statement, Section A.
Section D is “Additional Information: Research Support and/or Scholastic Performance“. The prior set of instructions read:
List ongoing and completed research projects from the past three years that you want to draw attention to. Briefly indicate the overall goals of the projects and your responsibilities. Do not include the number of person months or direct costs.
And if the part about “want to draw attention to” was not clear enough they also added:
Do not confuse “Research Support” with “Other Support.” Other Support information is not collected at the time of application submission.”
Don’t answer yet, there’s more!
Research Support: As part of the Biosketch section of the application, “Research Support” highlights your accomplishments, and those of your colleagues, as scientists. This information will be used by the reviewers in the assessment of each your qualifications for a specific role in the proposed project, as well as to evaluate the overall qualifications of the research team.
This is one of those areas where the NIH intent has been fought bitterly by the culture of peer review, in my experience (meaning in my ~two decades of being an applicant and slightly less time as a reviewer). These policy positions, instructions, etc., and the segregation of the dollar amounts and total research funding into the Other Support documentation, make it very clear to the naive reader that the NIH does not want reviewers contaminating their assessment of the merit of a proposal with their own ideas about whether the PI (or other investigators) have too much other funding. They do not want this at all. It is VERY clear, and this new update to the Biosketch enhances this by deleting any obligatory spot where funding information seemingly has to go.
But they are paddling upstream in a rushing, spring flood, rapids Cat V river. Good luck, say I.
Whenever this has come up, I think I’ve usually reiterated the reasons why a person might be motivated to omit certain funding from their Biosketch. Perhaps you had an unfortunate period of funding that was simply not very productive for any of a thousand reasons. Perhaps you do have what looks to some eyes like “too much funding” for your age, tenure, institution type, sex or race. Or for your overall productivity level. Perhaps you have some funding that looks like it might overlap with the current proposal. Or maybe even funding from some source that some folks might find controversial. The NIH has always (i.e. during my time in the system) endorsed your ability to do so and the notion that these considerations should not influence the assessment of merit.
I have also, I hope consistently, warned folks not to ever, ever try to omit funding (within the past three years) from their Biosketch, particularly if it can be found in any way on the internet. This includes those foundation sites bragging about their awards, your own lab website and your institutional PR game which put out a brag on you. The reason is that reviewers just can’t help themselves. You know this. How many discussions have we had on science blogs and now science twitter that revolve around “solutions” to NIH funding stresses that boil down to “those guys over there have too much money and if we just limit them, all will be better”? Scores.
Believe me, all the attitudes and biases that come out in our little chats also are present in the heads of study section members. We have all sorts of ideas about who “deserves” funding. Sometimes these notions emerge during study section discussion or in the comments. Yeah, reviewers know they aren’t supposed to be judging this so it often comes up obliquely. Amount of time committed to this project. Productivity, either in general or associated with specific other awards. Even ones that have nothing to do with the current proposal.
My most hilariously vicious personal attack summary statement critique ever was clearly motivated by the notion that I had “too much money”. One of the more disgusting aspects of what this person did was to assume incorrectly that I had a tap on resources associated with a Center in my department. Despite no indication anywhere that I had access to substantial funds from that source. A long time later I also grasped an even more hilarious part of this. The Center in question was basically an NIH-funded Center with minimal other dollars involved. However, this Center has what appear to be peer Centers elsewhere that are different beasts entirely. These are Centers that have a huge non-federal warchest involving more local income and an endowment built over decades. With incomes that put R21 and even R01 money into individual laboratories that are involved in the Center. There was no evidence anywhere that I had these sorts of covert resources, and I did not. Yet this reviewer felt fully comfortable teeing off on me for “productivity” in a way that was tied to the assumption I had more resources than were represented by my NIH grants.
Note that I am not saying many other reviews of my grant applications have not been contaminated by notions that I have “too much”. At times I am certain they were. Based on my age at first. Based on my institution and job type, certainly. And on perceptions of my productivity, of course. And now in the post-Hoppe analysis….on my race? Who the fuck knows. Probably.
But the evidence is not usually clear.
What IS clear is that reviewers, who are your peers with the same attitudes they express around the water cooler, on average have strong notions about whether PIs “deserve” more funding based on the funding they currently have and have had in the past.
NIH is asking, yet again, for reviewers to please stop doing this. To please stop assessing merit in a way that is contaminated by other funding.
I look forward with fascination to seeing whether NIH can manage to get this ship turned around with this latest gambit.
The very first evidence will come from monitoring Biosketches in review, to see if our peers are sticking with the old dictum of “for God’s sake don’t look like you are hiding anything” or if they will take the leap of faith that the new rules will be followed in spirit and that nobody will go snooping around on RePORTER and Google to see if the PI has “too much funding”.
Thoughts on the NIH policy on SABV
March 3, 2021
There is a new review by Shansky and Murphy out this month which addresses the NIH policy on considering sex as a biological variable (SABV).
Shansky, RM and Murphy, AZ. Considering sex as a biological variable will require a global shift in science culture. Nat Neurosci, 2021 Mar 1. doi: 10.1038/s41593-021-00806-8. Online ahead of print.
To get this out of the way, score me as one who is generally on board with the sentiments behind SABV and one who started trying to change my own approach to my research when this first started being discussed. I even started trying to address this in my grant proposals several cycles before it became obligatory. I have now, as it happens, published papers involving both male and female subjects and continue to do so. We currently have experiments being conducted that involve both male and female subjects and my plan is to continue to do so. Also, I have had many exchanges with Dr. Shansky over the years about these issues and have learned much from her views and suggestions. This post is going to address where I object to things in this new review, for the most part, so I thought I should make these declarations, for what they are worth.
In Box 1, the review addresses a scientist who claims that s/he will first do the work in males and then follow up in females as follows:
“We started this work in males, so it makes sense to keep going in males. We will follow up with females when this project is finished.” Be honest, when is a project ever truly finished? There is always another level of ‘mechanistic insight’ one can claim to need. Playing catch-up can be daunting, but it is better to do as much work in both sexes at the same time, rather than a streamlined follow-up study in females years after the original male work was published. This latter approach risks framing the female work as a lower-impact ‘replication study’ instead of equally valuable to scientific knowledge.
This then dovetails with a comment in Box 2 about the proper way to conduct our research going forward:
At the bare minimum, adhering to SABV means using experimental cohorts that include both males and females in every experiment, without necessarily analyzing data by sex.
I still can’t get past this. I understand that this is the place the NIH policy on SABV has landed. I do. The suggestion, as Shansky and Murphy make it here, is that we should run 50/50 cohorts for every study. I cannot for the life of me see the logic in this. I can’t. In my work, behavioral work with rats for the most part, there is so much variability that I am loath to even run half-size pilot studies. In a lot of the work that I do, N=8 is a pretty good starting size for the minimal ability to conclude much of anything. N=4? Tough, especially as a starting group size.
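To put a rough number on that worry, here is a minimal power sketch, assuming a simple two-group comparison (independent-samples t-test) and a large assumed effect size (Cohen’s d = 1.0). The effect size, alpha, and design are illustrative assumptions on my part, not values from the review or from my own data.

```python
# Power for an independent-samples t-test at n=8 vs n=4 per group.
# Effect size (d=1.0) and alpha are illustrative assumptions, not values
# drawn from Shansky and Murphy or from my own experiments.
from statsmodels.stats.power import TTestIndPower

analysis = TTestIndPower()
for n_per_group in (8, 4):
    power = analysis.solve_power(effect_size=1.0, nobs1=n_per_group,
                                 alpha=0.05, ratio=1.0,
                                 alternative='two-sided')
    print(f"n = {n_per_group} per group: power = {power:.2f}")
```

Under these assumptions neither group size reaches the conventional 0.8, and cutting the per-group n in half to accommodate a 50/50 sex split drops the already modest power substantially further. That is my concern in miniature.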
The piece eventually gets around to the notion of how we enforce the NIH SABV policy. As I have pointed out before and as is a central component of this review, we are moving rapidly into a time when the laboratories who claim NIH support for their studies are referencing grant proposals that were submitted under SABV rules.
NOT-OD-15-102 appeared in June of 2015 and warned that the SABV policy “will take effect for applications submitted for the January 25, 2016, due date, and thereafter“. Which means grants reviewed in summer 2016, considered at Council in the Fall rounds and potentially funded Dec 1, 2016. This means, with the usual problems with Dec 1 funding dates, that we are finishing up year 4 of some of these initial awards.
One of the main things that Shansky and Murphy address is in the section “Moving forward-who is responsible?“.
whether they have [addressed SABV] in their actual research remains to be seen. NIH grants are nonbinding, meaning that awardees are not required to conduct the exact experiments they propose. Moreover, there is no explicit language from NIH stating that SABV adherence will be enforced once the funds are awarded. Without accountability measures in place, no one is prevented from exclusively using male subjects in research funded under SABV policies.
Right? It is a central issue if we wish to budge the needle on considering sex as a biological variable. And the primary mechanism of enforcement is, well, us. The peers who are reviewing the subsequent grant applications from investigators who have been funded in the SABV era. The authors sort of mention this: “Researchers should be held accountable by making documentation of SABV compliance mandatory in yearly progress reports and by using compliance as a contingency for grant renewals (both noncompetitive and competitive).” Actually, the way this is structured, combined with the following sentence about manuscript review, almost sidesteps the critical issue. I will not sidestep in this post.
We, peer scientists who are reviewing the grant proposals, are the ones who must take primary responsibility to assess whether a PI and associated Investigators have made a good faith attempt to follow/adopt SABV policy or not. Leaving this in the hands of Program to sort out, based on tepid review comments, is a dereliction of duty and will result in the frustrating variability of review that we all hate. So….we are the ones who will either let PIs off the hook, thereby undercutting everything NIH has tried to accomplish, or we will assist NIH by awarding poor scores to applications with a team that has not demonstrably taken SABV seriously. We are at a critical and tenuous point. Will PIs believe that their grants will still be funded with a carefully crafted SABV statement, regardless of whether they have followed through? Or will PIs believe that their grant getting is in serious jeopardy if they do not take the spirit of the SABV policy to heart? The only way this is decided is if the peer review scores reward those who take it seriously and punish those who do not.
So now we are back to the main point of this post which is how we are to assess good-faith efforts. I absolutely agree with Shansky and Murphy that an application (competing or not) that basically says “we’re going to follow up in the females later“, where later means “Oh we didn’t do it yet, but we pinky swear we will do it in this next interval of funding” should not be let off the hook.
However. What about a strategy that falls short of the “bare minimum”, as the authors insist on in Box 2, of including males and females in 50/50 proportion in every experiment, not powered to really confirm any potential sex difference?
I believe we need a little more flexibility in our consideration of whether the research of the PI is making a good faith effort or not. What I would like to see is simply that male and female studies are conducted within the same general research program. Sure, it can be the 50/50 group design. But it can also be that sometimes experiments are in males, sometimes in females. Particularly if there is no sense that one sex is always run first and the other is trivially “checked”, or that one sex dominates the experimental rationale. Pubs might include both sexes within one paper, that’s the easiest call, but they might also appear as two separate publications. I think this can often be the right approach, personally.
This will require additional advocacy, thinking, pushback, etc, on one of the fundamental principles that many investigators have struggled with in the SABV era. As is detailed in Boxes 1 and 2 of the review, SABV does not mean that each study is a direct study of sex differences nor that every study in female mammals becomes a study of estrous cycle / ovarian hormones. My experience, as both an applicant and a reviewer, is that NIH study section members often have trouble with this notion. There has not been, in my experience on panels, a loud and general chorus rebutting any such notions during discussion either; we have much ground still to cover.
So we will definitely have to achieve greater agreement on what represents a good faith effort on SABV, I would argue, if we are to advocate strongly for NIH study sections to police SABV with the firm hand that it will require.
I object to what might be an obvious take-away from Shansky and Murphy, i.e., that the 50/50 sample approach is the essential minimum. I believe that other strategies and approaches to SABV can be pursued which both involve full single-sex sample sizes and do not require every study to be a direct contrast of the sexes in an experimentally clean manner.
On targeting NIH funding opportunities to URMs: The Lauer edition
January 28, 2021
I have long-standing doubts about certain aspects of funding mechanisms that are targeted to underrepresented individuals. This has almost always come up in the past in the context of graduate or postdoctoral fellowships, when there is a FOA open to all and a related or parallel FOA that is directed explicitly at underrepresented individuals (for example, the NINDS F31 and K99/R00 and the NIGMS K99/R00 initiatives, and there is actually an NIH parent F32 – diversity as well).
At first blush, this looks awesome! Targeted opportunity, presumably grant panel review that gives some minimal attention to the merits of the FOA and, again presumably, some Program traction to fund at least a few.
My Grinchy old heart is, however, suspicious about the real opportunity here. Perhaps more importantly, I am concerned about the real opportunity versus the opportunity that might be provided by eliminating any disparity that exists in the review of applications that come in via the un-targeted FOA. No matter the FOA, the review of NIH grants is competitive and zero sum. Sure, pools of money can be shifted from one program to another (say from the regular F31 to the F31-diversity) but it is rarely the case that any new money is coming in. Arguing about the degree to which funding is targeted by decision of Congress, of the NIH Director, of IC Directors or of any associated Advisory Councils is a distraction. Sure, NIGMS gets a PR hit from announcing and funding some MOSAIC K99/R00 awards…but they could just use those moneys to fund the apps coming in through their existing call that happen to have PIs who are underrepresented in science.
The extreme example here is the highly competitive K99 application from a URM postdoc. If it goes into the regular competition, it is so good that it wins an award and displaces, statistically, a less-meritorious one that happens to have a white PI. If it goes into the MOSAIC competition, it also gets selected, but in this case by displacing a less-meritorious one that happens to have a URM PI. Guaranteed.
These special FOAs have the tendency to put all the URM applicants in competition with each other. This is true whether they would be competitive under the biased review of the regular FOA or, more subtly, whether they would be competitive for funding in a regular FOA review that had been made bias-free(r).
I was listening to a presentation from Professor Nick Gilpin today on his thoughts on the whole Ginther/Hoppe situation (see his Feature at eLife with Mike Taffe) and was struck by comments on the Lauer pre-print. Mike Lauer, head of NIH’s office of extramural awards, blogged and pre-printed an analysis of how the success rates at various NIH ICs may influence the funding rate for AA/B PIs. It will not surprise you that this was yet another attempt to suggest it was AA/B PIs’ fault that they suffer a funding disparity. For the sample of grants reviewed by Lauer (from the Hoppe sample), 2% were submitted with AA/B PIs, NIH-wide. The percentage submitted to the 19 individual funding ICs he covered ranged from 0.73% to 14.7%, the latter figure belonging to the National Institute on Minority Health and Health Disparities (NIMHD). Other notable ICs of disproportionate relevance to the grants submitted with AA/B PIs include NINR (4.6% AA/B applications) and NICHD (3%).
So what struck me, as I listened to Nick’s take on these data, is that this is the IC assignment version of the targeted FOA. It puts applications with AA/B investigators in higher competition with each other. “Yeahbutt”, you say. It is not comparable. Because there is no open competition version of the IC assignment.
Oh no? Of course there is, particularly when it comes to NIMHD. Because these grants will very often look like a grant right down the center of those of interest to the larger, topic-focused ICs….save that it is relevant to a population considered to be minority or suffering a health disparity. Seriously, go to RePORTER and look at new NIMHD R01s. Or heck, NIMHD is small enough that you can look at the out-year NIMHD R01s without breaking your brain, since NIMHD only gets about 0.8% of the NIH budget allocation. With a judicious eye to topics, some related searches across ICs, and some clicking on the PI names to see what else they may have as funded grants, you can quickly convince yourself that plenty of NIMHD awards could easily be funded by a related I or C with their much larger budgets*. Perhaps the contrary is also true, i.e., there are grants funded by the parent / topic IC which you might argue would fit at NIMHD, but I bet the relative percentage goes the first way.
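For the curious, here is a minimal sketch of the kind of RePORTER query I mean, using the RePORTER v2 web API. The endpoint, criteria keys and response field names reflect my understanding of that API and should be checked against the current documentation before you lean on them.

```python
# Pull recent NIMHD R01s from NIH RePORTER (v2 API) for eyeballing topics.
# Criteria keys and response fields below are my best understanding of the
# API schema; verify against the official RePORTER API documentation.
import requests

payload = {
    "criteria": {
        "agencies": ["NIMHD"],        # administering IC; swap in NCI, NIDA, etc.
        "activity_codes": ["R01"],
        "fiscal_years": [2021],
    },
    "offset": 0,
    "limit": 25,
}

resp = requests.post("https://api.reporter.nih.gov/v2/projects/search",
                     json=payload, timeout=30)
resp.raise_for_status()

for proj in resp.json().get("results", []):
    print(proj.get("project_num"), "-", proj.get("project_title"))
```

Swap the agency for NCI, NIDA or whichever parent IC you prefer and compare the kinds of topics that come back.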
If I am right in my suspicions, the existence of NIMHD does not necessarily put more aggregate money into health disparities research. That is, more than that which could just as easily come out of the “regular” budget. The existence of NIMHD means that the parent IC can shrug off their responsibility for minority health issues or disparity issues within their primary domains of drug abuse, cancer, mental health, alcoholism or what have you. Which means they are likewise shrugging off the AA/B investigators who are disproportionately submitting applications with those NIMHD-relevant topics and being put in sharp competition with each other. Competition not just within a health domain, but across all health domains covered by the NIH.
It just seems to me that putting the applications with Black PIs preferentially in competition with each other, as opposed to making it a fair competition for the entire pool of money allocated to the purpose, is suboptimal.
__
*Check out the descriptions for MD010362 and CA224537 for some idea of what I mean. The entire NIMHD budget is 5% as large as the NCI budget. Why, you might ask, is NCI not picking up this one as well?
NIH grant application topics by IC
August 13, 2020
As you will recall, the Hoppe et al. 2019 report [blogpost] replicated Ginther et al. 2011 with a subsequent slice of grant applications, demonstrating that after the news of Ginther, and despite a change in scoring procedures and changes in permissible revisions, applications with Black PIs still suffered a huge funding disparity. Applications with white PIs were 1.7 times more likely to be funded. Hoppe et al. also identified a new culprit for the funding disparity for applications with African-American / Black PIs. TOPIC! “Aha”, they crowed, “it isn’t that applications with Black PIs are discriminated against on that basis, no. It’s that the applications with Black PIs just so happen to be disproportionately focused on topics that just so happen to have lower funding / success rates”. Of course it also was admitted very quietly by Hoppe et al. that:
WH applicants also experienced lower award rates in these clusters, but the disparate outcomes between AA/B and WH applicants remained, regardless of whether the topic was among the higher- or lower-success clusters (fig. S6).
Hoppe et al., Science Advances, 2019 Oct 9;5(10):eaaw7238. doi: 10.1126/sciadv.aaw7238
If you go to the Supplement Figure S6 you can see that for each of the five quintiles of topic clusters (ranked by award rates) applications with Black PIs fare worse than applications with white PIs. In fact, in the least-awarded quintile, which has the highest proportion of the applications with Black PIs, the white PI apps enjoy a 1.87 fold advantage, higher than the overall mean of the 1.65 fold advantage.
Record scratch: as usual I find something new every time I go back to one of these reports on the NIH funding disparity. The overall award rate was 10.7% for applications with Black PIs versus 17.7% for those with white PIs. The take-away from Hoppe et al. 2019 is reflected in the left side of Figure S6, which shows that the percentage of applications with Black PIs is lowest (<10%) in the topic domains with the highest award rates and highest (~28%) in the domains with the lowest award rates. The percentages are more similar across quintiles for apps with white PIs, approximately 20% per quintile. But the right side lists the award rates by quintile. And here we see that in the second-highest award-rate topic quintile the disparity is similar to the mean (12.6% vs 18.9%), but in the top quintile it is greater (13.4% vs 24.2%, a 10.8 percentage-point gap versus the 7 percentage-point gap overall). So if Black PIs followed Director Collins’ suggestion that they work on the right topics with the right methodologies, they would fare even worse, thanks to the 1.81-fold advantage for applications with white PIs in the most-awarded topic quintile!
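For anyone who wants to check that arithmetic, here is a minimal sketch using only the award rates quoted above (Black PI rate first, white PI rate second):

```python
# Fold advantage and percentage-point gap from the award rates quoted above
# (overall rates from the main text; quintile rates from Hoppe et al. Fig. S6).
rates = {
    "overall":                     (10.7, 17.7),
    "2nd-highest award quintile":  (12.6, 18.9),
    "top (most-awarded) quintile": (13.4, 24.2),
}

for label, (black_rate, white_rate) in rates.items():
    fold = white_rate / black_rate
    gap = white_rate - black_rate
    print(f"{label}: {fold:.2f}-fold white PI advantage, "
          f"{gap:.1f} percentage point gap")
```

That reproduces the 1.65-fold overall advantage as well as the 1.81-fold advantage and 10.8 point gap in the most-awarded quintile.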
Okay but what I really started out to discuss today was a new tiny tidbit provided by a blog post on the Open Mike blog. It reports the topic clusters by IC. This is cool to see since the word clusters presented in Hoppe (Figure 4) don’t map cleanly onto any sort of IC assumptions.

All we are really concerned with here is the ranking along the X axis. From the blog post:
…17 topics (out of 148), representing 40,307 R01 applications, accounted for 50% of the submissions from African American and Black (AAB) PIs. We refer to these topics as “AAB disproportionate” as these are topics to which AAB PIs disproportionately apply.
Note the extreme outliers. One (MD) is the National Institute on Minority Health and Health Disparities. I mean… seriously. The other (NR) is the National Institute of Nursing Research, which is also really interesting. Did I mention that these two Is get 0.8% and 0.4% of the NIH budget, respectively? The NIH mission statement reads: “NIH’s mission is to seek fundamental knowledge about the nature and behavior of living systems and the application of that knowledge to enhance health, lengthen life, and reduce illness and disability.” Emphasis added. The next one (TW) is the Fogarty International Center which focuses on global health issues (hello global pandemics!) and gets 0.2% of the NIH budget.
Then we get into the real meat. At numbers 4-6 on the AAB Disproportionate list of ICs we reach the National Institute of Child Health and Human Development (HD, 3.7% of the budget), NIDA (DA, 3.5%) and NIAAA (AA, 1.3%). And clocking in at 7 and 9 we have the National Institute on Aging (AG, 8.5%) and the NIMH (MH, 4.9%).
These are a lot of NIH dollars being expended in ICs of central interest to me and a lot of my audience. We could have made some guesses based on the word clusters in Hoppe et al 2019 but this gets us closer.
Yes, we now need to get deeper and more specific. What is the award disparity for applications with Black vs white PIs within each of these ICs? How much of that disparity, if it exists, is accounted for by the topic choices within each IC?
And let’s consider the upside. If, by some miracle, a given IC is doing particularly well with respect to funding applications with Black PIs fairly….how are they accomplishing this variance from the NIH average? What can the NIH adopt from such an IC to improve things?
Oh, and NINR and NIMHD really need a boost to their budgets. Maybe NIH Director Collins could take a 10% cut, prior to award, from the other ICs to improve investment in the applying-knowledge-to-enhance-health goals of the mission statement?
Sally Amero, Ph.D., NIH’s Review Policy Officer and Extramural Research Integrity Liaison Officer, has posted a new entry on the Open Mike blog addressing reviewer guidance in the Time of Corona. They have listed a number of things that are now supposed not to affect scoring. The list includes:
- Some key personnel on grant applications may be called up to serve in patient testing or patient care roles, diverting effort from the proposed research
- Feasibility of the proposed approach may be affected, for example if direct patient contact is required
- The environment may not be functional or accessible
- Additional human subjects protections may be in order, for example if the application was submitted prior to the viral outbreak
- Animal welfare may be affected, if institutions are closed temporarily
- Biohazards may include insufficient protections for research personnel
- Recruitment plans and inclusion plans may be delayed, if certain patient populations are affected by the viral outbreak
- Travel for key personnel or trainees to attend scientific conferences, meetings of consortium leadership, etc., may be postponed temporarily
- Curricula proposed in training grant applications may have to be converted to online formats temporarily
- Conferences proposed in R13/U13 applications may be cancelled or postponed.
Honestly, I’m not seeing how we are in a situation where this comes into consideration. Nothing moves quickly enough with respect to grant proposals for future work. I mean, applicants should be optimistic and act like everything will be back to normal for grants submitted this round for first possible funding, ah, NEXT APRIL. Grants to be reviewed in the upcoming June/July study sections were for the most part received before this shutdown happened, so likewise there is no reason they would have had call to mention the Corona Crisis. That part is totally perplexing.
The next bit, however is a real punch in the gut.
We have also had many questions from applicants asking what they should do if they don’t have enough preliminary data for the application they had planned to submit. While it may not be the most popular answer, we always recommend that applicants submit the best application possible. If preliminary data is lacking, consider waiting to submit a stronger application for a later due date.
Aka “Screw you”.
I will admit this was entirely predictable.
There is no guarantee that grant review in the coming rounds will take Corona-related excuses seriously. And even if reviewers do, this is still a competition. A competition in which, if you’ve happened to be more productive than the next person, your chances are better. Are the preliminary data supportive? Is your productivity coming along? Well, the next PI looks fine and you look bad so…. so sorry, ND. Nobody can have confidence that where they are when they shut down for corona will be enough to get them their next bit of funding.
I don’t see any way for the NIH to navigate this. Sure, they could give out supplements to existing grants. But that only benefits the currently funded. Bridge awards for those that had near-miss scores? Sure, but how many can they afford? What impact would this have on new grants? After all, the NIH shows no signs yet of shutting down receipt and review, or of doing anything other than funding per Council round as normal. But if we are relying on this, then we are under huge pressure to keep submitting grants as normal. Which would be helped by new Preliminary Data. And more publications.
So we PIs are hugely, hugely still motivated to work as normal. To seek any excuse as to why our ongoing studies are absolutely essential. To keep valuable stuff going, by hook or by crook…. Among other reasons, WE DON’T KNOW THE END DATE!
I hate being right when it comes to my cynical views on how the NIH behaves. But it is very clear. They are positively encouraging the scofflaws to keep on working, to keep pressing their people to come to work and to tell their administration whatever is necessary to keep it rolling. The NIH is positively underlining the word Essential for our employees. If you don’t keep generating data, the lab’s chances of getting funded go down, relative to the labs that keep on working. Same thing for fellowships, trainees. That other person gunning for the rare K99 in your cohort is working so…..
Here’s the weird thing. These people at the NIH have to know that their exhortations to reviewers to do this, that or the other generally do not work. Look how the early stage / young investigator thing has played out across four or five decades. Look at the whole SABV initiative. Look at the remarks we’ve seen where grant reviewers refuse to accept that pre-prints are meaningful.
All they would have had to do is put in some meaningless pablum about how they were going to “remind reviewers that they should assume issues resulting from the coronavirus pandemic should not affect scores” and include “Preliminary Data may not be as strong as in other times” in the above bullet point list.
NIH/CSR reverts to random-order grant review
September 13, 2019
A tweet from Potnia Theron alerts us to a change in the way the CSR study sections at NIH will review grants for this round and into the future. A tweet from our good blog friend @boehninglab confirms a similar story from the June rounds. I am very pleased.
When I first started reviewing NIH grants, the grants were ordered for discussion in what appeared to be clusters of grants assigned to the same Program Officer or at least a given Program Branch or Division. This was back when a substantial number of the POs would attend the meeting in person to follow the discussion of the grants which might be of interest to them to fund. Quite obviously, it would be most efficient and reasonable for an SRO to be able to tell a PO that their grants would all be discussed in, e.g., a contiguous two-hour block instead of scattered randomly across a two-day meeting.
Importantly, this meant that grants were not reviewed in any particular order with respect to pre-meeting scores and the grants slated for triage were ordered along with everything that was slated to be discussed.
When we shifted to reviewing grants in ascending order of preliminary score (i.e., best to worst) I noticed some things that were entirely predictable*. These things had a quelling effect on score movement through the discussion process for various reasons. Now I do say “noticed“. I have not seen any data from the CSR on this and would be very interested to see some before / after for the prior change and for the current reversion. So I cannot assert any strong position that indeed my perceptions are valid.
This had the tendency to harden the very best scores. Which, btw, were the ones almost guaranteed to fund since this came along during a time of fixed budget and plummeting paylines. Still, the initial few projects were not as subject to…calibration…as they may have been before. When you are facing the first two proposals in the round, it’s easy for everyone to nod along with the reviewers who are throwing 2s and saying the grant is essentially perfect. When you get such a beast in day 2 when you’ve already battled through a range of issues…..it’s more likely someone is going to say “yeah but whattabout….?”
It’s axiomatic that there is no such thing as an unassailable “perfect” grant proposal. Great scores arise not because the reviewers can find no flaws but because they have chosen to overlook or downplay flaws that might have been a critical point of discussion for another proposal. The way the NIH review works, there is no re-visitation of prior discussions just because someone realizes that the issue being used to beat the heck out of the current application also applied to the one discussed five grants ago that was entirely downplayed or ignored. This is why, fair or not, discussion tends to get more critical as the meeting goes on. So in the old semi-random order, apps that had good and bad preliminary scores were equally subject to this factor. In the score-ordered era, the apps with the best preliminary scores were spared this effect.
Another factor which contributed to this hardening of the preliminary score order is the “why bother?” factor. Reviewers are, after all, applicants and they are sensitive to the perceived funding line as it pertains to the scores. They have some notion of whether the range of scores under current discussion means “this thing is going to fund unless the world explodes“, “this thing is going to be a strong maybe and is in the hunt for gray zone pickup” or “no way, no how is this going to fund unless there is some special back scratching going on“. And believe you me they score accordingly despite constant admonishment to use the entire range and that reviewers do not make funding decisions™.
When I was first on study section the SRO sent out scoring distribution data for the prior several rounds and it was awesome to see. The score distribution would flatten out (aka cluster) right around the operative perceived score line at the time. The discussions would be particularly fierce around that line. But since an app at any given score range could appear throughout the meeting there was motivation to stay on target, right through to the last app discussed at times. With the ordered review, pretty much nothing was going to matter after lunch on the first day. Reviewers were not making distinctions that would be categorically relevant after that point. Why bother fighting over precisely which variety of unfundable score this app receives? So I argue that exhaustion was more likely to amplify score hardening.
I don’t have any data for that but I bet the CSR does if they would care to look.
These two factors hit the triage list in a double whammy.
To recap, anyone on the panel (and not in conflict) can request that a grant slated not to be discussed be raised for discussion. For any reason.
In the older way of doing things, the review order would include grants scheduled for triage; the Chair would come to each one, note that it was triaged, and ask if anyone wanted to discuss it. Mostly everyone just entered ND on the form and went on to the next one. However, sometimes a person wanted to bring one up out of triage and discuss it.
You can see that the psychology of pulling an application up would differ if it came up as, say, the third proposal on the first day versus being scheduled last in the meeting on day 2, when everyone is eager to rush to the airport.
In the score-ordered way of doing things, this all came at the end. When the minds of the reviewers were already on early flights and they had sat through many hours of “why are we discussing this one when it can’t possibly fund”. The pressure not to pull up any more grants for discussion was severe. My perception is that the odds of being pulled up for discussion went way, way, way down. I bet CSR has data on that. I’d like to see it.
I don’t have full details on whether the new review-order policy will include triaged apps or be a sort of hybrid. But I hope it returns to scheduling the triaged apps right along with everything else so that they have a fairer chance to be pulled up for discussion.
__
*and perhaps even intentional. There were signs from Scarpa, the CSR Director at the time, that he was trying to reduce the number of in-person meetings (which are very expensive). If scores did not change much (and if the grants selected for funding did not change) between the pre-meeting average of three people and the eventual voted score, then meetings were not a good thing to have. Right? So a suspicious person like myself immediately suspected that the entire goal of reviewing grants in order of initial priority score was to “prove” that meetings added little value by taking structural steps to reduce score movement.
The current version of the NIH Biosketch includes a space for a Personal Statement. As the Instructions say, this is to
Briefly describe why you are well-suited for your role(s) in this project. Relevant factors may include: aspects of your training; your previous experimental work on this specific topic or related topics; your technical expertise; your collaborators or scientific environment; and/or your past performance in this or related fields.
This part is pretty obvious. As you are aware, the Investigator criterion is one of five allegedly co-equal criteria on which the merit of your NIH application is supposed to be assessed. But this could also be approximately deduced from the old version of the Biosketch; all the Personal Statement does is enhance your ability to spin a tale for easy apprehension. The new Personal Statement of the Biosketch, however, allows something that wasn’t allowed before.
Note the following additional instructions for ALL applicants/candidates:
If you wish to explain factors that affected your past productivity, such as family care responsibilities, illness, disability, or military service, you may address them in this “A. Personal Statement” section.
This was a significant advance, in my view. For better or for worse, one of the key facts about you as an investigator that is of interest to reviewers of your application is your scientific productivity. The thinking goes that if you have been a productive investigator in the past then you will be a productive investigator in the future and are therefore, as they say, a strength of the proposal. Conversely, if you have not produced very well or have suspicious gaps in your productivity, this is a weakness; perhaps it predicts that you are not assured to be productive in the future.
Now, my view is that gaps in productivity or periods of unexpectedly low productivity are not a death knell. At least when I have been in the room for discussion of grants, I find that reviewers have a nonzero probability of giving good scores despite some evidence of poor productivity of the PI. The key is that they need to have a reason for why the productivity was low. In ye olden dayes, the applicant had to just suffer the bad score on the first version of the application and then supply his or her explanation in the Intro to the revised (amended; A1) application. So it is an advantage to be able to pre-empt this whole cycle and provide a reason for the appearance of a slow period in the PI’s history.
It is not, of course, some sort of trump or get-out-of-jail-free card. Reviewers are still free to view your productivity however they like, fairly or not. They are free to view the explanation that you offer however they like as well. But the advantage is that they can evaluate the explanation. And the favorably disposed reviewer can use that information to argue against the criticisms of the unfavorable reviewer. It gives the applicant a chance, where before there was none.
You will notice that I use the term explanation and not the term excuse. It is not an excuse. This is not a good way to view it. Not good on the part of the applicant or on the part of the reviewer(s). Grant evaluation is not a reward or a punishment for past behavior. Grant evaluation is a prediction about the future, given that the grant is funded. When it comes to PI productivity, past performance is only properly used to try to predict (imperfectly) future performance. If the PI got in a bad car wreck and was in intensive care for two months and basically invalided for another nine months, well, this says something about the predictive value of that corresponding gap in publications. Right? And you’d have to be a real jerk to think that this PI deserved to be somehow punished (with a bad grant score) for getting in a car wreck.
This post was triggered by a tweet that seemed to be saying that life is hard for everyone, so why should we buy anyone’s excuse. I thought the tone was a bit punitive. And that it might scare people away from using the Personal Statement as it was intended to be used by applicants and how, in my view, it should be used by reviewers. As I said above, there is no formal obligation for reviewers to “buy” an explanation that is proffered. And my personal view on what represents a jerky reviewer stance on a given explanation for a gap in productivity cannot possibly extend to all situations. But I do think that all reviewers should probably understand that there is a very explicit reason why the NIH allows this content in the Personal Statement. And should not view someone taking advantage of that as some sort of demerit in and of itself.
Preprints and NIH Study Section Behavior
March 11, 2019
An interesting pre-print discussion emerged on Twitter today in the wake of an observation that members of study sections apparently are not up to speed on the NIH policy encouraging the use of pre-prints and permitting them to be cited in NIH grant applications. The relevant Notice [NOT-OD-17-050] was issued in March of 2017 and it is long past time for most reviewers to be aware of what pre-prints are, what they are not and to understand that NIH has issued the above referenced Notice.
Now, the ensuing Twitscussion diverted off into several related topics but the part I find worth addressing is a tone that suggests that not only should NIH grant reviewers understand what a pre-print is, but that they should view them in some particular way. Typically this is expressed as outrage that reviewers do not view pre-prints favorably and essentially just like a published paper. On this I do not agree and will push a different agenda. NIH reviewers were not told how to view pre-prints in the context of grant review by the NIH as far as I know. Or, to the extent the NIH issued instructions, it was to essentially put pre-prints down below peer reviewed work.
The NIH agrees that interim research products offer lower quality information than peer-reviewed products. This policy is not intended to replace peer-review, nor peer-reviewed journals …
Further, the NIH is instructing awardees to explicitly state in preprints text that the work is not peer-reviewed. These two practices should help reviewers easily identify interim products. The NIH will offer explicit guidance to reviewers reminding them that interim research products are not peer-reviewed. Further, since interim products are new to so many biomedical disciplines, the NIH hopes that these conventions will become the norm for all interim products, and will help the media and the public understand that interim products have undergone less review than peer-reviewed articles.
https://grants.nih.gov/grants/guide/notice-files/NOT-OD-17-050.html
Given this, I would suggest that NIH reviewers are quite free to discount pre-prints entirely, to view them as preliminary data (and be grumpy about this as an effort to evade the page limits of the NIH application)…..or to treat them as fully equivalent to a peer reviewed paper because they disagree with the NIH’s tone / take on this. Reviewers get to decide. And as is typical, if reviewers on the same panel disagree they are free to hammer this disagreement out during the Discussion of applications.
I believe that pre-print fans should understand that they have to advocate and discuss their views on pre-prints and also understand that merely whinging about how reviewers must be violating the integrity of review, or some such, if they do not agree with the most fervent pre-print fans is not helpful. We advocate first by using pre-prints with regularity ourselves. We advocate next by taking advantage of the NIH policy and citing our pre-prints in our grant applications, identified as such. Then, if we happen to be invited to serve on study sections we can access a more direct lever, the Discussion of proposals. (Actually, just writing something in the critique about how it is admirable would be helpful as well. NIH seems to suggest in their Notice that perhaps this would go under Rigor.)
A bit in Science authored by Jocelyn Kaiser recently covered the preprint posted by Forscher and colleagues which describes a study of bias in NIH grant review. I was struck by a response Kaiser obtained from one of the authors on the question of range restriction.
Some have also questioned Devine’s decision to use only funded proposals, saying it fails to explore whether reviewers might show bias when judging lower quality proposals. But she and Forscher point out that half of the 48 proposals were initial submissions that were relatively weak in quality and only received funding after revisions, including four that were of too low quality to be scored.
They really don’t seem to understand NIH grant review, where about half of all proposals are “too low quality to be scored”. Their inclusion of only 8% ND (Not Discussed) applications simply doesn’t cut it. Thinking about this, however, motivated me to go back to the preprint, follow some links to associated data and download the excel file with the original grant scores listed.
I do still think they are missing a key point about restriction of range. It isn’t, much as they would like to think, only about the score. The score on a given round is a value with considerable error, as the group itself described in a prior publication in which the same grant reviewed in different ersatz study sections ended up with a different score. If there is a central tendency for true grant score, which we might approach with dozens of reviews of the same application, then sometimes any given score is going to be too good, and sometimes too bad, as an estimate of the central tendency. Which means that on a second review, the scores for the former are going to tend to get worse and the scores for the latter are going to tend to get better. The authors only selected the ones that tended to get better for inclusion (i.e., the ones that reached funding on revision).
Another way of getting at this is to imagine two grants which get the same score in a given review round. One is kinda meh, with mostly reasonable approaches and methods from a pretty good PI with a decent reputation. The other grant is really exciting, but with some ill-considered methodological flaws and a missing bit of preliminary data. Each one comes back in revision with the former merely shined up a bit and the latter with awesome new preliminary data and the methods fixed. The meh one goes backward (enraging the PI who “did everything the panel requested”) and the exciting one is now in the fundable range.
The authors have made the mistake of thinking that grants that are discussed, but get the same score well outside the range of funding, are the same in terms of true quality. I would argue that the fact that the “low quality” ones they used were revisable into the fundable range makes them different from the similar scoring applications that did not eventually win funding.
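To make the regression-to-the-mean point concrete, here is a minimal simulation sketch in Python. Every distribution, noise level and payline in it is a hypothetical placeholder, not anything taken from the Forscher data. Each application gets a “true” merit, each review is that merit plus noise, and the “funded on revision” group is simply the set that missed the payline on the first draw and made it on the second. No actual improvement is modeled at all.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical setup: lower score = better, loosely modeled on NIH impact scores.
n_apps = 100_000
true_merit = rng.normal(40, 8, n_apps)                # each application's central tendency
score_first = true_merit + rng.normal(0, 6, n_apps)   # first review = merit + noise
score_second = true_merit + rng.normal(0, 6, n_apps)  # independent re-review; no real improvement modeled

payline = 30  # hypothetical funding threshold on this score scale

funded_first_try = score_first <= payline
funded_on_revision = (score_first > payline) & (score_second <= payline)

print(f"true merit, funded on first try: {true_merit[funded_first_try].mean():.1f}")
print(f"true merit, funded on revision:  {true_merit[funded_on_revision].mean():.1f}")
print(f"FIRST score, funded on revision: {score_first[funded_on_revision].mean():.1f}")
```

In a toy run like this, the revision-funded applications’ first scores sit well above the payline (they look “weak”) even though their underlying merit is much closer to that of the first-try awards. That is the sense in which the range of true quality in the selected proposals is narrower than their original scores imply.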
In thinking about this, I came to realize another key bit of positive control data that the authors could provide to enhance our confidence in their study. I scanned through the preprint again and was unable to find any mention of them comparing the original scores of the proposals with the values that came out of their study. Was there a tight correlation? Was it equivalently tight across all of their PI name manipulations? To what extent did the new scores confirm the original funded, low quality and ND outcomes?
This would be key to at least partially counter my points about the range of applications that were included in this study. If the test reviewer subjects found the best original scored grants to be top quality, and the worst to be the worst, independent of PI name, then this might help to reassure us that the true quality range within the discussed half was reasonably represented. If, however, the test subjects often scored the originally top grants lower and the lower-scoring grants higher, this would reinforce my contention that the range of the central tendencies for the quality of the grant applications was narrow.
So how about it, Forscher et al? How about showing us the scores from your experiment for each application by PI designation along with the original scores?
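For what it’s worth, the comparison I have in mind is only a few lines of analysis once the scores are in hand. A sketch in Python/pandas, where the file name and column names are placeholders I am assuming rather than anything published with the preprint:

```python
import pandas as pd

# Placeholder file and column names; assumes one row per reviewed application
# with its original study section score, the score from the experiment,
# and the PI-name condition it was reviewed under.
df = pd.read_excel("forscher_grant_scores.xlsx")

# Overall: do the experimental reviewers reproduce the original ordering?
print(df["original_score"].corr(df["experiment_score"]))

# And does that agreement hold within each PI-name manipulation?
for condition, grp in df.groupby("pi_name_condition"):
    print(condition, round(grp["original_score"].corr(grp["experiment_score"]), 2))
```

A tight correlation overall, and equally tight within each PI-name condition, would speak directly to whether the discussed-but-unfunded end of their range behaved like genuinely lower quality applications.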
__
Patrick Forscher, William Cox, Markus Brauer and Patricia Devine. No race or gender bias in a randomized experiment of NIH R01 grant reviews. Posted on PsyArXiv, May 25, 2018.
Should NIH provide a transcript of the discussion of grants?
February 16, 2018
Respected neuroscientist Bita Moghaddam seems to think this would be a good idea.
She then goes on to mention the fact that POs listen in on grant discussion, can take notes and can give the PI a better summary of the discussion than the one that emerges in the Resume of Discussion written by the SRO.
This variability in PO behavior then leads to some variability in the information communicated to the PI. I’ve had one experience where a PO gave me such chapter and verse on the discussion that it might have been slightly over the line (pre- and post-discussion scores). In maybe two other cases the PO gave me a very substantial rundown. But for the most part POs have not been all that helpful- either they didn’t attend or they didn’t pay attention that closely or they just didn’t care to tell me anything past the “we suggest you revise and resubmit” mantra. She has a good point that it is not ideal that there is so much variability. When I’ve touched on this issue in the past, I’ve suggested this is a reason to cultivate as many POs as possible in your grant writing so that you have a chance of getting the “good” ones now and again. Would providing the transcript of discussion help? Maybe?
Or maybe we could just start lobbying the ICs of our fondest acquaintance to take the effort to make the POs behave more consistently.
But I have two problems with Professor Moghaddam’s proposals. First, of course, is the quashing effect that de-anonymizing may have on honest and open comment (and while a transcript could still be anonymized, it works in the same direction of making reviewers hesitate to speak up). The second problem is that it reinforces the idea that properly revising a grant application is merely “doing what they said to do”. Which then should, the thinking goes, make the grant fundable next time.
This is, as you know, not the way the system is set up to work, and it is a gut-feeling behavior of reviewers that the CSR works hard to counter. I don’t know if having the transcript would help or hurt in this regard. I guess it would depend on the mindset of the PI when reading the transcript. If they were looking merely to suss out* the relative seriousness of the various critiques, perhaps this would be fine?
__
*My fear is that this would just feed the people who are looking to litigate their review to “prove” that they got screwed and deserve funding.
Rigor, reproducibility and the good kid
February 9, 2018
I was the good kid.
In my nuclear family, in school and in pre-adult employment.
At one point my spouse was in a very large lab and observed how annoying it is when the PI reads everyone the riot act about the sins of a few lab-jerks.
Good citizens find it weird and off-putting when they feel criticized for the sins of others.
They find it super annoying that their own existing good behavior is not recognized.
And they are enraged when the jerko is celebrated for finally, at last managing to act right for once.
Many of us research scientists feel this way when the NIH explains what they mean by their new initiative to enhance “rigor and reproducibility”.
____
“What? I already do that, so does my entire subfield. Wait…..who doesn’t do that?” – average good-kid scientist response to hearing the specifics of the R&R initiative.
SABV in NIH Grant Review
February 8, 2018
We’re several rounds of grant submission/review past the NIH’s demand that applications consider Sex As a Biological Variable (SABV). I have reviewed grants from the first round of this obligation until just recently and have observed a few things coming into focus. There’s still a lot of wiggle and uncertainty, but a few patterns are emerging in my domains of grants that include vertebrate animals (mostly rodent models).
1) It is unwise to ignore SABV.
2) Inclusion of both sexes has to be done judiciously. If you put a sex comparison in the Aim or too prominently as a point of hypothesis testing you are going to get the full blast of sex-comparisons review. Which you want to avoid because you will get killed on the usual- power, estrus effects that “must” be there, various caveats about why male and female rats aren’t the same – behaviorally, pharmacokinetically, etc etc – regardless of what your preliminary data show.
3) The key is to include both sexes and say you will look at the data to see if there appears to be any difference. Then say the full examination will be a future direction or slightly modify the subsequent experiments.
4) Nobody seems to be fully embracing the version of SABV found in the formal pronouncements, in which you use sample sizes that are half males and half females in perpetuity if you don’t see a difference. I am not surprised. This is the hardest thing for me to accept personally and I know for certain sure manuscript reviewers won’t go for it either.
Then there comes the biggest categorical split in approach that I have noticed so far.
5a) Some people appear to use a few targeted female-including (yes, the vast majority still propose males as default and females as the SABV-satisfying extra) experiments to check main findings.
5b) The other take is just to basically double everything up and say “we’ll run full groups of males and females”. This is where it gets entertaining.
I have been talking about the fact that the R01 doesn’t pay for itself for some time now.
A full modular, $250K per year NIH grant doesn’t actually pay for itself.
the $250K full modular grant does not pay for itself. In the sense that there is a certain expectation of productivity, progress, etc on the part of study sections and Program that requires more contribution than can be afforded (especially when you put it in terms of 40 hr work weeks) within the budget.
The R01 still doesn’t pay for itself and reviewers are getting worse
I have reviewed multiple proposals recently that cannot be done. Literally. They cannot be accomplished for the price of the budget proposed. Nobody blinks an eye about this. They might talk about “feasibility” in the sense of scientific outcomes or preliminary data or, occasionally, some perceived deficit of the investigators/environment. But I have not heard a reviewer say “nice but there is no way this can be accomplished for $250K direct”.
Well, “we’re going to duplicate everything in females” as a response to the SABV initiative just administered the equivalent of HGH to this trend. There is approximately zero real-world reckoning with this in the majority of grants that slap in the females, and from what I have seen no comment whatever from reviewers on feasibility. We are just entirely ignoring this.
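To put some numbers on the feasibility point, here is a back-of-the-envelope sketch. Every figure in it is a hypothetical placeholder (salaries, fringe rates and animal costs vary enormously by institution and model); the only point is the shape of the arithmetic when the animal and supply line gets doubled “for SABV” while the budget stays pinned at full modular.

```python
# Back-of-the-envelope sketch; every number below is a HYPOTHETICAL placeholder.
direct_budget = 250_000  # full modular direct costs, per year

personnel = {
    "postdoc (salary + fringe)": 75_000,
    "technician (salary + fringe)": 60_000,
    "PI effort (salary + fringe)": 20_000,
}
animals_and_supplies = 60_000  # single-sex design

baseline = sum(personnel.values()) + animals_and_supplies
doubled_for_sabv = sum(personnel.values()) + 2 * animals_and_supplies

print(f"single-sex design:     ${baseline:,} of ${direct_budget:,}")
print(f"full groups of both:   ${doubled_for_sabv:,} of ${direct_budget:,}")
```

And note that the sketch charitably assumes the same two sets of hands can run twice the groups; in the real world the personnel line has to grow too, which only widens the gap.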
What I am really looking forward to is the review of grants in about 3 years time. At that point we are going to start seeing competing continuation applications where the original promised to address SABV. In a more general sense, any app from a PI who has been funded in the post-SABV-requirement interval will also face a simple question.
Has the PI addressed SABV in his or her work? Have they taken it seriously, conducted the studies (prelim data?) and hopefully published some things (yes, even negative sex-comparisons)?
If not, we should, as reviewers, drop the hammer. No more vague hand wavy stuff like I am seeing in proposals now. The PI had better show some evidence of having tried.
What I predict, however, is more excuse making and more bad faith claims to look at females in the next funding interval.
Please prove me wrong, scientists in my fields of study.
__
Additional Reading:
NIH’s OER blog Open Mike on the SABV policies.
NIH Reviewer Guidance [PDF]
Undue influence of frequent NIH grant reviewers
February 7, 2018
A quotation
Currently 20% of researchers perform 75-90% of reviews, which is an unreasonable and unsustainable burden.
referencing this paper on peer review appeared in a blog post by Gary McDowell. It caught my eye when referenced on the twitts.
The stat is referencing manuscript / journal peer review and not the NIH grant review system but I started thinking about NIH grant review anyway. Part of this is because I recently had to re-explain one of my key beliefs about a major limitation of the NIH grant review system to someone who should know better.
NIH Grant review is an inherently conservative process.
The reason is that the vast majority of reviews of the merit of grant applications are provided by individuals who already have been chosen to serve as Principal Investigators of one or more NIH grant awards. They have had grant proposals selected as meritorious by the prior bunch of reviewers and are now contributing strongly to the decision about the next set of proposals that will be funded.
The system is biased to select for grant applications written in a way that looks promising to people who have either been selected for writing grants in the same old way or who have been beaten into writing grants that look the same old way.
Like tends to beget like in this system. What is seen as meritorious today is likely to be very similar to what has been viewed as meritorious in the past.
This is further amplified by the social dynamics of a person who is newly asked to review grants. Most of us are very sensitive to being inexperienced, very sensitive to wanting to do a good job and feel almost entirely at sea about the process when first asked to review NIH grants. Even if we have managed to stack up 5 or 10 reviews of our proposals from that exact same study section prior to being asked to serve. This means that new reviewers are shaped even more by the culture, expectations and processes of the existing panel, which is staffed with many experienced reviewers.
So what about those experienced reviewers? And what about the number of grant applications that they review during their assigned term of 4 (3 cycles per year, please) or 6 (2 of 3 cycles per year) years of service? Either way that is roughly a dozen rounds, and with about 6-10 applications to review per round this could easily be highly influential (read: one of the three primary assigned reviewers) review of 100 applications. The person has additional general influence in the panel as well, both through direct input on grants under discussion and on the general tenor and tone of the panel.
When I was placed on a study section panel for a term of service I thought the SRO told us that empaneled reviewers were not supposed to be asked for extra review duties on SEPs or as ad hoc on other panels by the rest of the SRO pool. My colleagues over the years have disabused me of the idea that this was anything more than aspirational talk from this SRO. So many empaneled reviewers are also contributing to review beyond their home review panel.
My question of the day is whether this is a good idea and whether there are ethical implications for those of us who are asked* to review NIH grants.
We all think we are great evaluators of science proposals, of course. We know best. So of course it is all right, fair and good when we choose to accept a request to review. We are virtuously helping out the system!
At what point are we contributing unduly to the inherent conservativeness of the system? We all have biases. Some are about irrelevant characteristics like the ethnicity** of the PI. Some are considered more acceptable and are about our preferences for certain areas of research, models, approaches, styles, etc. Regardless, these biases are influencing our review. Our review. And one of the best ways to counter bias is to set differing biases against each other. I.e., let someone else’s bias into the mix for a change, eh buddy?
I don’t have a real position on this yet. After my term of empaneled service, I accepted or rejected requests to review based on my willingness to do the work and my interest in a topic or mechanism (read: SEPs FTW). I’ve mostly kept it pretty minimal. However, I recently messed up because I had a cascade of requests last fall that sucked me in- a “normal” panel (ok, ok, I haven’t done my duty in a while), followed by a topic SEP (ok, ok I am one of a limited pool of experts I’ll do it) and then a RequestThatYouDon’tRefuse. So I’ve been doing more grant review lately than I have usually done in recent years. And I’m thinking about scope of influence on the grants that get funded.
At some point is it even ethical to keep reviewing so damn much***? Should anyone agree to serve successive 4 or 6 year terms as an empaneled reviewer? Should one say yes to every SRO request that comes along? They are going to keep asking so it is up to us to say no. And maybe to recommend the SRO ask some other person who is not on their radar?
___
*There are factors which push the SRO pool toward picking the same old reviewers, btw. There’s a sort of expectation that if you have review experience you might be okay at it. I don’t know how much SROs talk to each other about prospective reviewers and their experience with the same but there must be some chit chat. “Hey, try Dr. Schmoo, she’s a great reviewer” versus “Oh, no, do not ever ask Dr. Schnortwax, he’s toxic”. There are the diversity rules that they have to follow as well- There must be diversity with respect to the geographic distribution, gender, race and ethnicity of the membership. So people that help the SROs’ diversity stats might be picked more often than some other people who are straight white males from the most densely packed research areas in the country working on the most common research topics using the most usual models and approaches.
**[cough]Ginther[cough, cough]
***No idea what this threshold should be, btw. But I think there is one.