A bit in Science authored by Jocelyn Kaiser recently covered the preprint posted by Forscher and colleagues which describes a study of bias in NIH grant review. I was struck by a response Kaiser obtained from one of the authors on the question of range restriction.

Some have also questioned Devine’s decision to use only funded proposals, saying it fails to explore whether reviewers might show bias when judging lower quality proposals. But she and Forscher point out that half of the 48 proposals were initial submissions that were relatively weak in quality and only received funding after revisions, including four that were of too low quality to be scored.

They really don’t seem to understand NIH grant review, where about half of all proposals are “too low quality to be scored”. Their inclusion of only 8% ND applications simply doesn’t cut it. Thinking about this, however, motivated me to go back to the preprint, follow some links to associated data and download the Excel file with the original grant scores listed.

I do still think they are missing a key point about restriction of range. It isn’t, much as they would like to think, only about the score. The score on a given round is a value with considerable error, as the group itself described in a prior publication in which the same grant reviewed in different ersatz study sections ended up with a different score. If there is a central tendency for true grant score, which we might approach with dozens of reviews of the same application, then sometimes any given score is going to be too good, and sometimes too bad, as an estimate of the central tendency. Which means that on a second review, the scores for the former are going to tend to get worse and the scores for the latter are going to tend to get better. The authors only selected the ones that tended to get better for inclusion (i.e., the ones that reached funding on revision).
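To make the selection point concrete, here is a minimal simulation sketch. Everything in it is my assumption for illustration: a normally distributed "true merit" (lower is better, as with NIH scores), equal-sized review noise on each round, arbitrary cutoffs for "weak first score" and "fundable second score", and no real improvement between rounds. It only shows that, among applications with equally weak first-round scores, the ones that later land in the fundable range tend to have better underlying merit than the ones that do not.

```python
import random
import statistics


def simulate_selection(n_apps=20000, noise_sd=1.0, seed=0):
    """Compare true merit of weak-first-score apps by second-round outcome.

    All parameters are illustrative assumptions, not values from the preprint.
    Lower numbers mean better merit and better scores.
    """
    rng = random.Random(seed)
    funded_on_revision, not_funded = [], []
    for _ in range(n_apps):
        true_merit = rng.gauss(0.0, 1.0)
        first_score = true_merit + rng.gauss(0.0, noise_sd)
        second_score = true_merit + rng.gauss(0.0, noise_sd)
        if first_score > 0.5:  # weak score on the first round
            if second_score < -0.5:  # lands in the "fundable" range on revision
                funded_on_revision.append(true_merit)
            else:
                not_funded.append(true_merit)
    return statistics.mean(funded_on_revision), statistics.mean(not_funded)


mean_funded, mean_not_funded = simulate_selection()
print("mean true merit, weak first score, funded on revision:", round(mean_funded, 2))
print("mean true merit, weak first score, never fundable:    ", round(mean_not_funded, 2))
```

Real revisions also change the application itself, which this sketch ignores; adding that would only widen the gap between the two groups.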

Another way of getting at this is to imagine two grants which get the same score in a given review round. One is kinda meh, with mostly reasonable approaches and methods from a pretty good PI with a decent reputation. The other grant is really exciting, but with some ill-considered methodological flaws and a missing bit of preliminary data. Each one comes back in revision with the former merely shined up a bit and the latter with awesome new preliminary data and the methods fixed. The meh one goes backward (enraging the PI who “did everything the panel requested”) and the exciting one is now in the fundable range.

The authors have made the mistake of thinking that grants that are discussed, but get the same score well outside the range of funding, are the same in terms of true quality. I would argue that the fact that the “low quality” ones they used were revisable into the fundable range makes them different from the similar scoring applications that did not eventually win funding.

In thinking about this, I came to realize another key bit of positive control data that the authors could provide to enhance our confidence in their study. I scanned through the preprint again and was unable to find any mention of them comparing the original scores of the proposals with the values that came out of their study. Was there a tight correlation? Was it equivalently tight across all of their PI name manipulations? To what extent did the new scores confirm the original funded, low quality and ND outcomes?

This would be key to at least partially counter my points about the range of applications that were included in this study. If the test reviewer subjects found the best original scored grants to be top quality, and the worst to be the worst, independent of PI name then this might help to reassure us that the true quality range within the discussed half was reasonably represented. If, however, the test subjects often reviewed the original top grants lower and the lower grants higher, this would reinforce my contention that the range of the central tendencies for the quality of the grant applications was narrow.
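The check I am asking for is not complicated. A minimal sketch, assuming the data can be arranged as rows of (PI-name condition, original score, score from the experiment): the condition labels and the numbers below are placeholders I invented purely to exercise the function, not values from the preprint or its data file.

```python
from collections import defaultdict


def pearson(xs, ys):
    """Plain Pearson correlation between two equal-length lists."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


def correlation_by_condition(rows):
    """rows: iterable of (condition, original_score, study_score) tuples."""
    grouped = defaultdict(lambda: ([], []))
    for condition, original, study in rows:
        grouped[condition][0].append(original)
        grouped[condition][1].append(study)
    return {cond: pearson(orig, new) for cond, (orig, new) in grouped.items()}


# Invented placeholder rows: two hypothetical name conditions, three proposals each.
toy_rows = [
    ("name_A", 20, 2.0), ("name_A", 30, 3.0), ("name_A", 45, 3.5),
    ("name_B", 20, 3.0), ("name_B", 30, 2.5), ("name_B", 45, 3.0),
]

for condition, r in correlation_by_condition(toy_rows).items():
    print(condition, round(r, 2))
```

A tight correlation in every condition would be reassuring about the range; a loose one would suggest the test reviewers could not tell the originally best proposals from the originally worst.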

So how about it, Forscher et al? How about showing us the scores from your experiment for each application by PI designation along with the original scores?
__
Patrick Forscher, William Cox, Markus Brauer, Patricia Devine. No race or gender bias in a randomized experiment of NIH R01 grant reviews. Created on: May 25, 2018 | Last edited: May 25, 2018; posted on PsyArXiv

I recently discussed some of the problems with a new pre-print by Forscher and colleagues describing a study which purports to evaluate bias in the peer review of NIH grants.

One thing that I figured out today is that the team that is funded under the grant which supported the Forscher et al study also produced a prior paper that I already discussed. That prior discussion focused on the use of only funded grants to evaluate peer review behavior, and the corresponding problems of a restricted range. The conclusion of this prior paper was that reviewers didn’t agree with each other in the evaluation of the same grant. This, in retrospect, also seems to be a design that was intended to fail. In that instance designed to fail to find correspondence between reviewers, just as the Forscher study seems constructed to fail to find evidence of bias.

I am working up a real distaste for the “Transformative” research project (R01 GM111002; 9/2013-6/2018) funded to PIs M. Carnes and P. Devine that is titled EXPLORING THE SCIENCE OF SCIENTIFIC REVIEW. This project is funded to the tune of $465,804 in direct costs in the final year and reached as high as $614,398 direct in year 3. We can, I think, fairly demand a high standard for the resulting science. I do not think this team is meeting a high standard.

One of the papers (Pier et al 2017) produced by this project discusses the role of study section discussion in revising/calibrating initial scoring.

Results suggest that although reviewers within a single panel agree more following collaborative discussion, different panels agree less after discussion, and Score Calibration Talk plays a pivotal role in scoring variability during peer review.

So they know. They know that scores change through discussion and they know that a given set of applications can go in somewhat different directions based on who is reviewing. They know that scores can change depending on what other ersatz panel members are included and perhaps depending on how the total number of grants are distributed to reviewers in those panels. The study described in the Forscher pre-print did not convene panels:

Reviewers were told we would schedule a conference call to discuss the proposals with other reviewers. No conference call would actually occur; we informed the prospective reviewers of this call to better match the actual NIH review process.

Brauer is an overlapping co-author. The senior author on the Forscher study is Co-PI, along with the senior author of the Pier et al. papers, on the grant that funds this work. The Pier et al 2017 Res Eval paper shows that they know full well that study section discussion is necessary to “better match the actual NIH review process”. Their paper shows that study section discussion does so in part by getting better agreement on the merits of a particular proposal across the individuals doing the reviewing (within a given panel). By extension, not including any study section type discussion is guaranteed to result in a more variable assessment. To throw noise into the data. Which has a tendency to make it more likely that a study will arrive at a null result, as the Forscher et al study did.
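The noise point can be sketched with a toy simulation. The assumptions are all mine: two groups of reviews that differ by a fixed true amount, normally distributed review noise, a simple two-sided z-test at the usual 1.96 cutoff, and made-up values for the effect size and group size. It is not a model of the Forscher design, just the general statistical fact that a fixed true difference is detected less often as the noise grows.

```python
import random
import statistics


def detection_rate(effect, noise_sd, n_per_group, n_sims=400, seed=1):
    """Fraction of simulated studies in which a true group difference is detected.

    Illustrative only: the effect size, noise level and group size are assumptions.
    """
    rng = random.Random(seed)
    hits = 0
    for _ in range(n_sims):
        group_a = [rng.gauss(0.0, noise_sd) for _ in range(n_per_group)]
        group_b = [rng.gauss(effect, noise_sd) for _ in range(n_per_group)]
        std_err = ((statistics.variance(group_a) + statistics.variance(group_b))
                   / n_per_group) ** 0.5
        z = (statistics.mean(group_b) - statistics.mean(group_a)) / std_err
        if abs(z) > 1.96:
            hits += 1
    return hits / n_sims


low_noise = detection_rate(effect=0.3, noise_sd=1.0, n_per_group=100)
high_noise = detection_rate(effect=0.3, noise_sd=2.0, n_per_group=100)
print("detection rate with less noisy scores:", low_noise)
print("detection rate with noisier scores:   ", high_noise)
```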

These investigators also know that the grant load for NIH reviewers is not typically three applications, as was used in the study described in the Forscher pre-print. From Pier et al 2017 again:

We further learned that although a reviewer may be assigned 9–10 applications for a standing study section, ad hoc panels or SEPs can receive assignments as low as 5–6 applications; thus, the SRO assigned each reviewer to evaluate six applications based on their scientific expertise, as we believed a reviewer load on the low end of what is typical would increase the likelihood of study participation.

I believe that the reviewer load is critically important if you are trying to mimic the way scores are decided by the NIH review process. The reason is that while several NIH documents and reviewer guides pay lipservice to the idea that the review of each grant proposal is objective, the simple truth is that review is comparative.

Grant applications are scored on a 1-9 scale with descriptors ranging from Exceptional (1) to Very Good (4) to Poor (9). On an objective basis, I and many other experienced NIH grant reviewers argue, the distribution of NIH grant applications (all of them) is not flat. There is a very large peak around the Excellent to Very Good (i.e., 3-4) range, in my humble estimation. And if you are familiar with review you will know that there is a pronounced tendency of reviewers, unchecked, to stack their reviews around this range. They do it within reviewer and they do it as a panel. This is why the SRO (and Chair, occasionally) spends so much time before the meeting exhorting the panel members to spread their scores. To flatten the objective distribution of merit into a more linear set of scores. To, in essence, let a competitive ranking procedure sneak into this supposedly objective and non-comparative process.

Many experienced reviewers understand why this is being asked of them, endorse it as necessary (at the least) and can do a fair job of score spreading*.

The fewer grants a reviewer has on the immediate assignment pile, the less distance there need be across this pile. If you have only three grants and score them 2, 3 and 4, well hey, scores spread. If, however, you have a pile of 6 grants and score them 2, 3, 3, 3, 4, 4 (which is very likely the objective distribution) then you are quite obviously not spreading your scores enough. So what to do? Well, for some reason actual NIH grant reviewers are really loath to throw down a 1. So 2 is the top mark. Gotta spread the rest. Ok, how about 2, 3, 3…er 4 I mean. Then 4, 4…shit. 4, 5 and oh 6 seems really mean so another 5. Ok. 2, 3, 4, 4, 5, 5. phew. Scores spread, particularly around the key window that is going to make the SRO go ballistic.
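One rough way to put numbers on "spread", using the three piles from the exercise above. The summary measures (range, number of distinct scores, size of the biggest tie) are my own choice for illustration; nobody at the NIH computes these as far as I know.

```python
from collections import Counter


def spread_summary(scores):
    """Summarize how spread out a reviewer's pile of scores is."""
    counts = Counter(scores)
    return {
        "range": max(scores) - min(scores),
        "distinct": len(counts),
        "largest_tie": max(counts.values()),
    }


piles = {
    "three grants": [2, 3, 4],
    "six grants, unspread": [2, 3, 3, 3, 4, 4],
    "six grants, spread": [2, 3, 4, 4, 5, 5],
}

for label, pile in piles.items():
    print(label, spread_summary(pile))
```

The three-grant pile has no ties at all without the reviewer doing any work, which is the point: a small pile looks spread by default.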

Wait, what’s that? Why are reviewers working so hard around the 2-4 zone and care less about 5+? Well, surprise surprise that is the place** where it gets serious between probably fund, maybe fund and no way, no how fund. And reviewers are pretty sensitive to that**, even if they do not know precisely what score will mean funded / not funded for any specific application.

That little spreading exercise was for a six grant load. Now imagine throwing three more applications into this mix for the more typical reviewer load.

For today, it is not important to discuss how a reviewer decides one grant comes before the other or that perhaps two grants really do deserve the same score. The point is that grants are assessed against each other. In the individual reviewer’s stack and to some extent across the entire study section. And it matters how many applications the reviewer has to review. This affects that reviewer’s pre-discussion calibration of scores.

Read phase, after the initial scores are nominated and before the study section meets, is another place where re-calibration of scores happens. (I’m not sure if they included that part in the Pier et al studies, it isn’t explicitly mentioned so presumably not?)

If the Forscher study only gave reviewers three grants to review, and did not do the usual exhortation to spread scores, this is a serious flaw. Another serious and I would say fatal flaw in the design. The tendency of real reviewers is to score more compactly. This is, presumably, enhanced by the selection of grants that were funded (either on the version that was used or in revision) which we might think would at least cut off the tail of really bad proposals. The ranges will be from 2-4*** instead of 2-5 or 6. Of course this will obscure differences between grants, making it much much more likely that no effect of sex or ethnicity (the subject of the Forscher et al study) of the PI would emerge.

__
Elizabeth L. Pier, Markus Brauer, Amarette Filut, Anna Kaatz, Joshua Raclaw, Mitchell J. Nathan, Cecilia E. Ford and Molly Carnes, Low agreement among reviewers evaluating the same NIH grant applications. 2018, PNAS: published ahead of print March 5, 2018, https://doi.org/10.1073/pnas.1714379115

Elizabeth L. Pier, Joshua Raclaw, Anna Kaatz, Markus Brauer, Molly Carnes, Mitchell J. Nathan and Cecilia E. Ford. ‘Your comments are meaner than your score’: score calibration talk influences intra- and inter-panel variability during scientific grant peer review, Res Eval. 2017 Jan; 26(1): 1–14. Published online 2017 Feb 14. doi: 10.1093/reseval/rvw025

Patrick Forscher, William Cox, Markus Brauer, and Patricia Devine. No race or gender bias in a randomized experiment of NIH R01 grant reviews. Created on: May 25, 2018 | Last edited: May 25, 2018 https://psyarxiv.com/r2xvb/

*I have related before that when YHN was empaneled on a study section he practiced a radical version of score spreading. Initial scores for his pile were tagged to the extreme ends of the permissible scores (this was under the old system) and even intervals within that were used to place the grants in his pile.

**as are SROs. I cannot imagine an SRO ever getting on your case to spread scores for a pile that comes in at 2, 3, 4, 5, 7, 7, 7, 7, 7.

***Study sections vary a lot in their precise calibration of where the hot zone is and how far apart scores are spread. This is why the more important funding criterion is the percentile, which attempts to adjust for such study section differences. This is the long way of saying I’m not encouraging comments naggling over these specific examples. The point should stand regardless of your pet study sections’ calibration points.

In August of 2011 the Ginther et al. paper published in Science let us know that African-American PIs were disadvantaged in the competition for NIH awards. There was an overall success rate disparity identified as well as a related necessity of funded PIs to revise their proposals more frequently to become funded.

Both of these have significant consequences for what science gets done and how careers unfold.

I have been very unhappy with the NIH response to this finding.

I have recently become aware of a “Transformative” research project (R01 GM111002; 9/2013-6/2018) funded to PIs M. Carnes and P. Devine that is titled EXPLORING THE SCIENCE OF SCIENTIFIC REVIEW. From the description/abstract:

Unexplained disparities in R01 funding outcomes by race and gender have raised concern about bias in NIH peer review. This Transformative R01 will examine if and how implicit (i.e., unintentional) bias might occur in R01 peer review… Specific Aim #2. Determine whether investigator race, gender, or institution causally influences the review of identical proposals. We will conduct a randomized, controlled study in which we manipulate characteristics of a grant principal investigator (PI) to assess their influence on grant review outcomes…The potential impact is threefold; this research will 1) discover whether certain forms of cognitive bias are or are not consequential in R01 peer review… the results of our research could set the stage for transformation in peer review throughout NIH.

It could not be any clearer that this project is a direct NIH response to the Ginther result. So it is fully and completely appropriate to view any resulting studies in this context. (Just to get this out of the way.)

I became aware of this study through a Twitter mention of a pre-print that has been posted on PsyArXiv. The version I have read is:

No race or gender bias in a randomized experiment of NIH R01 grant reviews. Patrick Forscher William Cox Markus Brauer Patricia Devine Created on: May 25, 2018 | Last edited: May 25, 2018

The senior author is one of the Multi-PIs on the aforementioned funded research project and the pre-print makes this even clearer with a statement.

Funding: This research was supported by 5R01GM111002-02, awarded to the last author.

So while yes, the NIH does not dictate the conduct of research under awards that it makes, this effort can be fairly considered part of the NIH response to Ginther. As you can see from comparing the abstract of the funded grant to the pre-print study there is every reason to assume the nature of the study as conducted was actually spelled out in some detail in the grant proposal. Which the NIH selected for funding, apparently with some extra consideration*.

There are many, many, many things wrong with the study as depicted in the pre-print. It is going to take me more than one blog post to get through them all. So consider none of these to be complete. I may also repeat myself on certain aspects.

First up today is the part of the experimental design that was intended to create the impression in the minds of the reviewers that a given application had a PI of certain key characteristics, namely on the spectra of sex (male versus female) and ethnicity (African-American versus Irish-American). This, I will note, is a tried and true design feature for some very useful prior findings. Change the author names to initials and you can reduce apparent sex-based bias in the review of papers. Change the author names to African-American sounding ones and you can change the opinion of the quality of legal briefs. Change sex, apparent ethnicity of the name on job resumes and you can change the proportion called for further interviewing. Etc. You know the literature. I am not objecting to the approach, it is a good one, but I am objecting to its application to NIH grant review and the way they applied it.

The problem with application of this to NIH Grant review is that the Investigator(s) is such a key component of review. It is one of five allegedly co-equal review criteria and the grant proposals include a specific document (Biosketch) which is very detailed about a specific individual and their contributions to science. This differs tremendously from the job of evaluating a legal brief. It varies tremendously from reviewing a large stack of resumes submitted in response to a fairly generic job. It even differs from the job of reviewing a manuscript submitted for potential publication. NIH grant review specifically demands an assessment of the PI in question.

What this means is that it is really difficult to fake the PI and have success in your design. Success absolutely requires that the reviewers who are the subjects in the study both fail to detect the deception and genuinely develop a belief that the PI has the characteristics intended by the manipulation (i.e., man versus woman and black versus white). The authors recognized this, as we see from page 4 of the pre-print:

To avoid arousing suspicion as to the purpose of the study, no reviewer was asked to evaluate more than one proposal written by a non-White-male PI.

They understand that suspicion as to the purpose of the study is deadly to the outcome.

So how did they attempt to manipulate the reviewer’s percept of the PI?

Selecting names that connote identities. We manipulated PI identity by assigning proposals names from which race and sex can be inferred 11,12. We chose the names by consulting tables compiled by Bertrand and Mullainathan 11. Bertrand and Mullainathan compiled the male and female first names that were most commonly associated with Black and White babies born in Massachusetts between 1974 and 1979. A person born in the 1970s would now be in their 40s, which we reasoned was a plausible age for a current Principal Investigator. Bertrand and Mullainathan also asked 30 people to categorize the names as “White”, “African American”, “Other”, or “Cannot tell”. We selected first names from their project that were both associated with and perceived as the race in question (i.e., >60 odds of being associated with the race in question; categorized as the race in question more than 90% of the time). We selected six White male first names (Matthew, Greg, Jay, Brett, Todd, Brad) and three first names for each of the White female (Anne, Laurie, Kristin), Black male (Darnell, Jamal, Tyrone), and Black female (Latoya, Tanisha, Latonya) categories. We also chose nine White last names (Walsh, Baker, Murray, Murphy, O’Brian, McCarthy, Kelly, Ryan, Sullivan) and three Black last names (Jackson, Robinson, Washington) from Bertrand and Mullainathan’s lists. Our grant proposals spanned 12 specific areas of science; each of the 12 scientific topic areas shared a common set of White male, White female, Black male, and Black female names. First names and last names were paired together pseudo-randomly, with the constraints that (1) any given combination of first and last names never occurred more than twice across the 12 scientific topic areas used for the study, and (2) the combination did not duplicate the name of a famous person (i.e., “Latoya Jackson” never appeared as a PI name).

So basically the equivalent of blackface. They selected some highly stereotypical “black” first names and some “white” surnames which are almost all Irish (hence my comment above about Irish-American ethnicity instead of Caucasian-American. This also needs some exploring.).

Sorry, but for me this heightens concern that reviewers deduce what they are up to. Right? Each reviewer had only three grants (which is a problem for another post) and at least one of them practically screams in neon lights “THIS PI IS BLACK! DID WE MENTION BLACK? LIKE REALLY REALLY BLACK!”. As we all know, there are not 33% of applications to the NIH from African-American investigators. Any experienced reviewer would be at risk of noticing something is a bit off. The authors say nay.

A skeptic of our findings might put forward two criticisms: .. As for the second criticism, we put in place careful procedures to screen out reviewers who may have detected our manipulation, and our results were highly robust even to the most conservative of reviewer exclusion criteria.

As far as I can tell their “careful procedures” included only:

We eliminated from analysis 34 of these reviewers who either mentioned that they learned that one of the named personnel was fictitious or who mentioned that they looked up a paper from a PI biosketch, and who were therefore likely to learn that PI names were fictitious.

“who mentioned”.

There was some debriefing which included:

reviewers completed a short survey including a yes-or-no question about whether they had used outside resources. If they reported “yes”, they were prompted to elaborate about what resources they used in a free response box. Contrary to their instructions, 139 reviewers mentioned that they used PubMed or read articles relevant to their assigned proposals. We eliminated the 34 reviewers who either mentioned that they learned of our deception or looked up a paper in the PI’s biosketch and therefore were very likely to learn of our deception. It is ambiguous whether the remaining 105 reviewers also learned of our deception.

and

34 participants turned in reviews without contacting us to say that they noticed the deception, and yet indicated in review submissions that some of the grant personnel were fictitious.

So despite their instructions and discouraging participants from using outside materials, significant numbers of them did. And reviewers turned in reviews without saying they were on to the deception when they clearly were. And the authors did not, apparently, debrief in a way that could definitively say whether all, most or few reviewers were on to their true purpose. Nor does there appear to be any mention of asking reviewers afterwards whether they knew about Ginther, specifically, or disparate grant award outcomes in general terms. That would seem to be important.

Why? Because if you tell most normal decent people that they are to review applications to see if they are biased against black PIs they are going to fight as hard as they can to show that they are not a bigot. The Ginther finding was met with huge and consistent protestation on the part of experienced reviewers that it must be wrong because they themselves were not consciously biased against black PIs and they had never noticed any overt bias during their many rounds of study section. The authors clearly know this. And yet they did not show that the study participants were not on to them. While using those rather interesting names to generate the impression of ethnicity.

The authors make several comments throughout the pre-print about how this is a valid model of NIH grant review. They take a lot of pride in their design choices in many places. I was very struck by:

names that were most commonly associated with Black and White babies born in Massachusetts between 1974 and 1979. A person born in the 1970s would now be in their 40s, which we reasoned was a plausible age for a current Principal Investigator.

because my first thought when reading this design was “gee, most of the African-Americans that I know who have been NIH funded PIs are named things like Cynthia and Carl and Maury and Mike and Jean and…..dude something is wrong here.“. Buuuut, maybe this is just me and I do know of one “Yasmin” and one “Chanda” so maybe this is a perceptual bias on my part. Okay, over to RePORTER to search out the first names. I’ll check all time and for now ignore F- and K-mechs because Ginther focused on research awards, iirc. Darnell (4, none with the last names the authors used); LaTonya (1, ditto); LaToya (2, one with middle / maiden? name of Jones, we’ll allow that and oh, she’s non-contact MultiPI); Tyrone (6; man one of these had so many awards I just had to google and..well, not sure but….) and Tanisha (1, again, not a president surname).

This brings me to “Jamal”. I’m sorry but in science when you see a Jamal you do not think of a black man. And sure enough RePORTER finds a number of PIs named Jamal but their surnames are things like Baig, Farooqui, Ibdah and Islam. Not US Presidents. Some debriefing here to ensure that reviewers presumed “Jamal” was black would seem to be critical but, in any case, it furthers the suspicion that these first names do not map onto typical NIH funded African-Americans. This brings us to the further observation that first names may convey not merely ethnicity but something about subcategories within this subpopulation of the US. It could be that these names cause percepts bound up in geography, age cohort, socioeconomic status and a host of other things. How are they controlling for that? The authors make no mention that I saw.

The authors take pains to brag on their clever deep thinking on using an age range that would correspond to PIs in their 40s (wait, actually 35-40, if the funding of the project in -02 claim is accurate, when the average age of first major NIH award is 42?) to select the names and then they didn’t even bother to see if these names appeared on the NIH database of funded awards?

The takeaway for today is that the study validity rests on the reviewers not knowing the true purpose. And yet they showed that reviewers did not follow their instructions for avoiding outside research and that reviewers did not necessarily volunteer that they’d detected the name deception*** and yet some of them clearly had. Combine this with the nature of how the study created the impression of PI ethnicity via these particular first names and I think this can be considered a fatal flaw in the study.
__

Race, Ethnicity, and NIH Research Awards, Donna K. Ginther, Walter T. Schaffer, Joshua Schnell, Beth Masimore, Faye Liu, Laurel L. Haak, Raynard Kington. Science 19 Aug 2011:Vol. 333, Issue 6045, pp. 1015-1019
DOI: 10.1126/science.1196783

*Notice the late September original funding date combined with the June 30 end date for subsequent years? This almost certainly means it was an end of year pickup** of something that did not score well enough for regular funding. I would love to see the summary statement.

**Given that this is a “Transformative” award, it is not impossible that they save these up for the end of the year to decide. So I could be off base here.

*** As a bit of a sidebar there was a Twitter person who claimed to have been a reviewer in this study and found a Biosketch from a supposedly female PI referring to a sick wife. Maybe the authors intended this but it sure smells like sloppy construction of their materials. What other tells were left? And if they *did* intend to bring in LGBTQ assumptions…well this just seems like throwing random variables into the mix to add noise.

DISCLAIMER: As per usual I encourage you to read my posts on NIH grant matters with the recognition that I am an interested party. The nature of NIH grant review is of specific professional interest to me and to people who are personally and professionally close to me.

I had a thought about Ginther just after hearing a radio piece on the Asian-Americans that are suing Harvard over entrance discrimination.

The charge is that Asian-American students need to have better grades and scores than white students to receive an admissions bid.

The discussion of the Ginther study revolved around the finding that African-American applicant PIs were less likely than PIs of other groups to receive NIH grant funding. This is because Asian-Americans, for example, did as well as white PIs. Our default stance, I assume, is that being a white PI is the best that it gets. So if another group does as well, this is evidence of a lack of bias.

But what if Asian-American PIs submit higher quality applications as a group?

How would we ever know if there was discrimination against them in NIH grant award?

Jeremy Berg made a comment

If you look at the data in the Ginther report, the biggest difference for African-American applicants is the percentage of “not discussed” applications. For African-Americans, 691/1149 =60.0% of the applications were not discussed whereas for Whites, 23,437/58,124 =40% were not discussed (see supplementary material to the paper). The actual funding curves (funding probability as a function of priority score) are quite similar (Supplementary Figure S1). If applications are not discussed, program has very little ability to make a case for funding, even if this were to be deemed good policy.

that irritated me because it sounds like yet another version of the feigned-helpless response of the NIH on this topic. It also made me take a look at some numbers and bench race my proposal that the NIH should, right away, simply pick up enough applications from African American PIs to equalize success rates. Just as they have so clearly done, historically, for Early Stage Investigators and very likely done for women PIs.

Here’s the S1 figure from Ginther et al, 2011:
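[Figure: Supplementary Figure S1 from Ginther et al., 2011, showing award probability as a function of priority score]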

[In the below analysis I am eyeballing the probabilities for illustration’s sake. If I’m off by a point or two this is immaterial to the overall thrust of the argument.]

My knee jerk response to Berg’s comment is that there are plenty of African-American PIs’ applications available for pickup. As in, far more than would be required to make up the aggregate success rate discrepancy (which was about 10% in award probability). So talking about the triage rate is a distraction (but see below for more on that).

There is a risk here of falling into the Privilege-Thinking, i.e. that we cannot possibly countenance any redress of discrimination that, gasp, puts the previously underrepresented group above the well represented groups even by the smallest smidge. But looking at Supplementary Fig. S1 from Ginther, and keeping in mind that the African American PI application number is only 2% of the White applications, we can figure out that a substantial effect on African American PIs’ award probability would cause only an imperceptible change in that for White PI applications. And there’s an amazing sweetener….merit.

Looking at the award probability graph from S1 of Ginther, we note that there are some 15% of the African-American PI’s grants scoring in the 175 bin (old scoring method, youngsters) that were not funded. About 55-56% of all ethnic/racial category grants in the next higher (worse) scoring bin were funded. So if Program picks up more of the better scoring applications from African American PIs (175 bin) at the expense of the worse scoring applications of White PIs (200 bin), we have actually ENHANCED MERIT of the total population of funded grants. Right? Win/Win.

So if we were to follow my suggestion, what would be the relative impact? Well thanks to the 2% ratio of African-American to White PI apps, it works like this:

Take the 175 scoring bin in which about 88% of white PIs and 85% of AA PIs were successful. Take a round number of 1,000 apps in that scoring bin (for didactic purposes, also ignoring the other ethnicities) and you get a 980/20 White/African-American PI ratio of apps. In that 175 bin we’d need 3 more African-American PI apps funded to get to 100%. In the next higher (worse) scoring bin (200 score), about 56% of White PI apps were funded. Taking three from this bin and awarding three more AA PI awards in the next better scoring bin would plunge the White PI award probability from 56% to 55.7%. Whoa, belt up cowboy.

Moving down the curve with the same logic, we find in the 200 score bin that there are about 9 AA PI applications needed to put the 200 score bin to 100%. Looking down to the next worse scoring bin (225) and pulling these 9 apps from white PIs we end up changing the award probability for these apps from 22% to ..wait for it….. 21.1%.

And so on.

(And actually, the percentage changes would be smaller in reality because there is typically not a flat distribution across these bins and there are very likely more applications in each worse-scoring bin compared to the next better-scoring bin. I assumed 1,000 in each bin for my example.)
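For anyone who wants to check the eyeballed arithmetic, here is a minimal sketch of the bin-swap calculation. The function name and parameters are my own illustrative choices, and the inputs are the same rough figures used above (1,000 apps per bin, 2% from African-American PIs), not exact values from Ginther.

```python
def bin_swap(bin_size, aa_share, aa_rate, white_rate_next):
    """Illustrative only: fund every AA PI app in one scoring bin by
    pulling the same number of awards from White PI apps in the next
    worse bin. Assumes both bins hold `bin_size` apps and ignores
    other ethnicities, as in the text."""
    aa_apps = bin_size * aa_share
    white_apps = bin_size - aa_apps
    # AA PI apps in this bin that are currently unfunded
    extra_awards = round(aa_apps * (1 - aa_rate))
    # White PI awards in the next worse bin, before the swap
    white_funded = white_apps * white_rate_next
    new_white_rate = (white_funded - extra_awards) / white_apps
    return extra_awards, new_white_rate


# 175 bin: ~85% of AA PI apps funded; 200 bin: ~56% of White PI apps funded
extra_175, rate_200 = bin_swap(1000, 0.02, 0.85, 0.56)
# 200 bin: ~56% of AA PI apps funded; 225 bin: ~22% of White PI apps funded
extra_200, rate_225 = bin_swap(1000, 0.02, 0.56, 0.22)

print(f"175 bin: {extra_175} pickups, 200-bin White rate -> {rate_200:.1%}")
print(f"200 bin: {extra_200} pickups, 225-bin White rate -> {rate_225:.1%}")
```

Plug in your own eyeballed rates if you read the figure differently; the hit to the White PI award probability stays on the order of a percentage point or less.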

Another way to look at this issue is to take Berg’s triage numbers from above. To move to 40% triage rate for the African-American PI applications, we need to shift 20% (230 applications) into the discussed pile. This represents a whopping 0.4% of the White PI apps being shifted onto the triage pile to keep the numbers discussed the same.
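The same triage arithmetic as a quick sketch, using the application counts from Berg’s comment and the rounded 60% and 40% not-discussed rates. The variable names are mine, for illustration only.

```python
# Application counts quoted from the Ginther supplementary material
aa_total = 1149
white_total = 58124

# Rounded not-discussed (triage) rates, as used in the text
aa_triage_rate = 0.60
target_triage_rate = 0.40

# AA PI apps that would have to move from triage into discussion
apps_to_move = round(aa_total * (aa_triage_rate - target_triage_rate))

# The same number of White PI apps moved into triage, as a share of all White PI apps
share_of_white = apps_to_move / white_total

print(f"{apps_to_move} apps moved, {share_of_white:.2%} of White PI apps")
```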

These are entirely trivial numbers in terms of the “hit” to the chances of White PIs and yet you could easily equalize the success rate or award probability for African-American PIs.

It is even more astounding that this could be done by picking up African-American PI applications that scored better than the White PI applications that would go unfunded to make up the difference.

Tell me how this is not a no-brainer for the NIH?

As you know I am distinctly unimpressed with the NIH’s response to the Ginther report which identified a disparity in the success rate of African-American PIs when submitting grant applications to the NIH.

The NIH response (i.e., where they have placed their hard money investment in change) has been to blame pipeline issues. The efforts are directed at getting more African-American trainees into the pipeline and, somehow, training them better. The subtext here is twofold.

First, it argues that the problem is that the existing African-American PIs submitting to the NIH just kinda suck. They are deserving of lower success rates! Clearly. Otherwise, the NIH would not be looking in the direction of getting new ones. Right? Right.

Second, it argues that there is no actual bias in the review of applications. Nothing to see here. No reason to ask about review bias or anything. No reason to ask whether the system needs to be revamped, right now, to lead to better outcomes.

A journalist has been poking around a bit. The most interesting bits involve Collins’ and Tabak’s initial response to Ginther and the current feigned-helplessness tack that is being followed.

From Paul Basken in the Chronicle of Higher Education:

Regarding the possibility of bias in its own handling of grant applications, the NIH has taken some initial steps, including giving its top leaders bias-awareness training. But a project promised by the NIH’s director, Francis S. Collins, to directly test for bias in the agency’s grant-evaluation systems has stalled, with officials stymied by the legal and scientific challenges of crafting such an experiment.

“The design of the studies has proven to be difficult,” said Richard K. Nakamura, director of the Center for Scientific Review, the NIH division that handles incoming grant applications.

Hmmm. “difficult”, eh? Unlike making scientific advances, hey, that stuff is easy. This, however, just stumps us.

Dr. Collins, in his immediate response to the Ginther study, promised to conduct pilot experiments in which NIH grant-review panels were given identical applications, one using existing protocols and another in which any possible clue to the applicant’s race—such as name or academic institution—had been removed.

“The well-described and insidious possibility of unconscious bias must be assessed,” Dr. Collins and his deputy, Lawrence A. Tabak, wrote at the time.

Oh yes, I remember this editorial distinctly. It seemed very well-intentioned. Good optics. Did we forget that the head of the NIH is a political appointment with all that that entails? I didn’t.

The NIH, however, is still working on the problem, Mr. Nakamura said. It hopes to soon begin taking applications from researchers willing to carry out such a study of possible biases in NIH grant approvals, and the NIH also recently gave Molly Carnes, a professor of medicine, psychiatry, and industrial and systems engineering at the University of Wisconsin at Madison, a grant to conduct her own investigation of the matter, Mr. Nakamura said.

The legal challenges include a requirement that applicants get a full airing of their submission, he said. The scientific challenges include figuring out ways to get an unvarnished assessment from a review panel whose members traditionally expect to know anyone qualified in the field, he said.

What a freaking joke. Applicants have to get a full airing and will have to opt-in, eh? Funny, I don’t recall ever being asked to opt-in to any of the non-traditional review mechanisms that the CSR uses. These include phone-only reviews, video-conference reviews and online chat-room reviews. Heck, they don’t even so much as disclose that this is what happened to your application! So the idea that it is a “legal” hurdle that is solved by applicants volunteering for their little test is clearly bogus.

Second, the notion that a pilot study would prevent “full airing” is nonsense. I see very few alternatives other than taking the same pool of applications and putting them through regular review as the control condition and then trying to do a bias-decreasing review as the experimental condition. The NIH is perfectly free to use the normal, control review as the official review. See? No difference in the “full airing”.

I totally agree it will be scientifically difficult to try to set up PI blind review but hey, since we already have so many geniuses calling for blinded review anyway…this is well worth the effort.

But “blind” review is not the only way to go here. How’s about simply mixing up the review panels a bit? Bring in a panel that is heavy in precisely those individuals who have struggled with lower success rates- based on PI characteristics, University characteristics, training characteristics, etc. See if that changes anything. Take a “normal” panel and provide them with extensive instruction on the Ginther data. Etc. Use your imagination people, this is not hard.

Disappointingly, the CHE piece contains not one single bit of investigation into the real question of interest. Why is this any different from any other area of perceived disparity between interests and study section outcome at the NIH? From topic domain to PI characteristics (sex and relative age) to University characteristics (like aggregate NIH funding, geography, Congressional district, University type/rank, etc) the NIH is fully willing to use Program prerogative to redress the imbalance. They do so by funding grants out of order and, sometimes, by setting up funding mechanisms that limit who can compete for the grants.

2013-FundingByCareerStage

In the recent case of young/recently transitioned investigators they have trumpeted the disparity loudly, hamfistedly and brazenly “corrected” the study section disparity with special paylines and out of order pickups that amount to an affirmative action quota system [PDF]. All with exceptionally poor descriptions of exactly why they need to do so, save “we’re eating our seed corn” and similar platitudes. All without any attempt to address the root problem of why study sections return poorer scores for early stage investigators. All without proving bias, describing the nature of the bias and without clearly demonstrating the feared outcome of any such bias.

“Eating our seed corn” is a nice catch phrase but it is essentially meaningless. Especially when there are always more freshly trained PhD scientists eager and ready to step up. Why would we care if a generation is “lost” to science? The existing greybeards can always be replaced by whatever fresh faces are immediately available, after all. And there was very little crying about the “lost” GenerationX scientists, remember. Actually, none, outside of GenerationX itself.

The point being, the NIH did not wait for overwhelming proof of nefarious bias. They just acted very directly to put a quota system in place. Although, as we’ve seen in recent data this has slipped a bit in the past two Fiscal Years, the point remains.

Why, you might ask yourself, are they not doing the same in response to Ginther?

In case my comment never makes it out of moderation at RockTalk….

Interesting to contrast your Big Data and BRAINI approaches with your one for diversity. Try switching those around…”establish a forum..blah, blah…in partnership…blah, blah..to engage” in Big Data. Can’t you hear the outraged howling about what a joke of an effort that would be? It is embarrassing that the NIH has chosen to kick the can down the road and hide behind fake-helplessness when it comes to enhancing diversity. In the case of BRAINI, BigData and yes, discrimination against a particular class of PI applicants (the young) the NIH fixes things with hard money- awards for research projects. Why does it draw back when it comes to fixing the inequality of grant awards identified in Ginther?

When you face up to the reasons why you are in full cry and issuing real, R01 NGA solutions for the dismal plight of ESIs and doing nothing similar for underrepresented PIs then you will understand why the Ginther report found what it did.

ESIs continue, at least six years on, to benefit from payline breaks and pickups. You trumpet this behavior as a wonderful thing. Why are you not doing the same to redress the discrimination against underrepresented PIs? How is it different?

The Ginther bombshell dropped in August of 2011. There has been plenty of time to put in real, effective fixes. The numbers are such that the NIH would have had to fund mere handfuls of new grants to ensure success rate parity. And they could still do all the can-kicking, ineffectual hand waving stuff as well.

And what about you, o transitioning scientists complaining about an “unfair” NIH system stacked against the young? Is your complaint really about fairness? Or is it really about your own personal success?

If it is a principled stand, you should be name dropping Ginther as often as you do the fabled “42 years before first R01” stat.

As noted recently by Bashir, the NIH response to the Ginther report contrasts with their response to certain other issues of grant disparity:

I want to contrast this with NIH actions regarding other issues. In that same blog post I linked there is also discussion of the ongoing early career investigator issues. Here is a selection of some of the actions directed towards that problem.

NIH plans to increase the funding of awards that encourage independence like the K99/R00 and early independence awards, and increase the initial postdoctoral researcher stipend.

In the past NIH has also taken actions in modifying how grants are awarded. The whole Early Stage Investigator designation is part of that. Grant pickups, etc.

I don’t want to get all Kanye (“NIH doesn’t care about black researchers”), but priorities, be they individual or institutional, really come though not in talk but actions. Now, I don’t have any special knowledge about the source or solution to the racial disparity. But the NIH response here seems more along the lines of adequate than overwhelming.

In writing another post, I ran across this 2002 bit in Science. This part stands out:

It’s not because the peer-review system is biased against younger people, Tilghman argues. When her NRC panel looked into this, she says, “we could find no data at all [supporting the idea] that young people are being discriminated against.”

Although I might take issue with what data they chose to examine and the difficulty of proving “discrimination” in a subjective process like grant review, the point at hand is larger. The NIH had a panel which could find no evidence of discrimination and they nevertheless went straight to work picking up New Investigator grants out of the order of review to guarantee an equal outcome!

Interesting, this is.

are offered by Bashir.

Even if these recommendations were enacted tomorrow, and worked exactly as hoped, the gains would be slow and marginal. #1 seem to more address the problem of under representation.

Go play over there.

You are familiar with the #GintherGap, the disparity of grant award at NIH that leaves the applications with Black PIs at substantial disadvantage. Many have said from the start that it is unlikely that this is unique to the NIH and we only await similar analyses to verify that supposition.

Curiously the NSF has not, to my awareness, done any such study and released it for public consumption.

Well, a group of scientists have recently posted a preprint:

Chen, C. Y., Kahanamoku, S. S., Tripati, A., Alegado, R. A., Morris, V. R., Andrade, K., & Hosbey, J. (2022, July 1). Decades of systemic racial disparities in funding rates at the National Science Foundation. OSF Preprints. doi:10.31219/osf.io/xb57u.

It reviews National Science Foundation awards (from 1996-2019) and uses demographics provided voluntarily by PIs. They found that the applicant PIs were 66% white, 3% Black, 29% Asian and below 1% for each of American Indian/Alaska Native and Native Hawaiian/Pacific Islander groups. They also found that across the reviewed years, the overall funding rate varied from 22%-34%, so the data were represented as the rate for each group relative to the average for each year. In Figure 1, reproduced below, you can see that applications with white PIs enjoy a nice consistent advantage relative to other groups and the applications with Asian PIs suffer a consistent disadvantage. The applications with Black PIs are more variable year over year but are mostly below average except for 5 years when they are right at the average. The authors note this means that in 2019, there were 798 awards with white PIs above expected value, and 460 fewer than expected awarded with Asian PIs. The size of the disparity differs slightly across the directorates of the NSF (there are seven, broken down by discipline such as Biological Sciences, Engineering, Math and Physical Sciences, Education and Human Resources, etc) but the same dis/advantage based on PI race remains.

Fig 1B from Chen et al. 2022 preprint

It gets worse. It turns out that these numbers include both Research and Non-Research (conference, training, equipment, instrumentation, exploratory) awards, which represent 82% and 18% of awards, respectively, with the latter generally being awarded at 1.4-1.9 times the rate for Research awards in a given year. For white PI applications the two types both are funded at higher than the average rate; however, significant differences emerge for Black and Asian PIs, with Research awards having the lower probability of success.

Fig 3 from Chen et al 2022 preprint FY 13 – 19;
open = Non-Research, closed = Research

So why is this the case? Well, the white PI applications get better scores from extramural reviewers. Here, I am not expert in how NSF works. A mewling newbie really. But they solicit peer reviewers who assign merit scores from 1 (Poor) to 5 (Excellent). The preprint shows the distributions of scores for FY15 and FY16 Research applications, by PI race, in Figure 5. Unsurprisingly there is a lot of overlap but the average score for white PI apps is superior to that for either Black or Asian PI apps. Interestingly, average scores are worse for Black PI apps than for Asian PI apps. Interesting because the funding disparity is larger for Asian PIs than for Black PIs. And as you can imagine, there is a relationship between score and chances of being funded but it is variable. Kind of like a Programmatic decision on exception pay or the grey zone function in NIH land.

Not sure exactly how this matches up over at NSF but the first author of the preprint put me onto a 2015 FY report on the Merit Review Process that addresses this. Page 74 of the PDF (NSB-AO-206-11) has a Figure 3.2 showing the success rates by average review score and PI race. As anticipated, proposals in the 4.75 (score midpoint) bin are funded at rates of 80% or better. About 60% for the 4.25 bin, 30% for the 3.75 bin and under 10% for the 3.25 bin. Interestingly, the success rates for Black PI applications are higher than for white PI applications at the same score. The Asian PI success rates are closer to the white PI success rates but still a little bit higher, at comparable scores. So clearly something is going on with funding decision making at NSF to partially counter the poorer scores, on average, from the reviewers. The Asian PI proposals do not have as much of this advantage. This explains why the overall success rates for Black PI applications are closer to the average compared with the Asian PI apps, despite worse average scores.

Fig 5 from Chen et al 2022 preprint

One more curious factor popped out of this study. The authors, obviously, had to use only the applications for which a PI had specified their race. This was about 96% in 1999-2000 when they were able to include these data. However it was down to 90% in 2009, 86% in 2016 and then took a sharp plunge in successive years to land at 76% in 2019. The first author indicated on Twitter that this was down to 70% in 2020, the largest one year decrement. This is very curious to me. It seems obvious that PIs are doing whatever they think is going to help them get funded. For the percentage to be this large it simply has to involve large numbers of white PIs and likely Asian PIs as well. It cannot simply be Black PIs worried that racial identification will disadvantage them (a reasonable fear, given the NIH data reported in Ginther et al.) I suspect a certain type of white academic who has convinced himself (it’s usually a he) that white men are discriminated against, that the URM PIs have an easy ride to funding and the best thing for them to do is not to declare themselves white. Also another variation on the theme, the “we shouldn’t see color so I won’t give em color” type. It is hard not to note that the US has been having a more intensive discussion about systemic racial discrimination, starting somewhere around 2014 with the shooting of Michael Brown in Ferguson MO. This amped up in 2020 with the strangulation murder of George Floyd in Minneapolis. Somewhere in here, scientists finally started paying attention to the Ginther Gap. News started getting around. I think all of this is probably causally related to sharp decreases in the self-identification of race on NSF applications. Perhaps not for all the same reasons for every person or demographic. But if it is not an artifact of the grant submission system, this is the most obvious conclusion.

There is a ton of additional analysis in the preprint. Go read it. Study. Think about it.

Additional: Ginther et al. (2011) Race, ethnicity, and NIH research awards. Science, 2011 Aug 19; 333(6045):1015-9. [PubMed]

The latest blog post over at Open Mike, from the NIH honcho of extramural grant award Mike Lauer, addresses “Discussion Rate”. This is, in his formulation, the percent of applicants (in a given Fiscal Year, FY21 in this case) who are PI on at least one application that reaches discussion. I.e., not triaged. The post presents three Tables, with this Discussion rate (and Funding rate) presented by the Sex of the PI, by race (Asian, Black, White only) or ethnicity (Hispanic or Latino vs non-Hispanic only). The tables further presented these breakdowns by Early Stage Investigator, New Investigator, At Risk and Established. At risk is a category of “researchers that received a prior substantial NIH award but, as best we can tell, will have no funding the following fiscal year if they are not successful in securing a competing award this year.” At this point you may wish to revisit an old blog post by DataHound called “Mind the Gap” which addresses the chances of regaining funding once a PI has lost all NIH grants.

I took the liberty of graphing the By-Race/Ethnicity Discussion rates, because I am a visual thinker.

There seem to be two main things that pop out. First, in the ESI category, the Discussion rate for Black PI apps is a lot lower. Which is interesting. The 60% rate for ESI might be a little odd until you remember that the burden of triage may not fall on ESI applications. At least 50% have to be discussed in each study section, small numbers in study section probably mean that on average it is more than half, and this is NIH wide data for FY 21 (5,410 ESI PIs total). Second, the NI category (New, Not Early on the chart) seems to suffer relative to the other categories.

Then I thought a bit about this per-PI Discussion rate being north of 50% for most categories. And that seemed odd to me. Then I looked at another critical column on the tables in the blog post.

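The Median number of applications per applicant was…. 1. Every applicant counted here has at least one application, so a median of 1 means at least half of them submitted exactly one. That means the mode is 1.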
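If the median-implies-mode step is not obvious, here is a toy illustration. The per-PI counts below are entirely made up for the example; the NIH tables do not report the underlying distribution.

```python
from statistics import median, mode

# Hypothetical per-PI application counts for one fiscal year.
# Every applicant has at least one application by definition.
counts = [1, 1, 1, 1, 1, 1, 2, 2, 3, 5]

# Fraction of applicants who submitted exactly one application
share_single = sum(1 for c in counts if c == 1) / len(counts)

print(f"median = {median(counts)}, mode = {mode(counts)}, "
      f"share with one app = {share_single:.0%}")
```

With a floor of one application, the median cannot sit at 1 unless at least half of the applicants are sitting there too.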

Wow. Just….wow.

I can maybe understand this for ESI applicants, since for many of them this will be their first grant ever submitted.

but for “At Risk”? An investigator who has experience as a PI with NIH funding, is about to have no NIH funding if a grant does not hit, and they are submitting ONE grant application per fiscal year?

I am intensely curious how this stat breaks down by deciles. How many at risk PIs are submitting only one grant proposal? Is it only about half? Two-thirds? More?

As you know, my perspective on the NIH grant getting system is that if you have only put in one grant you are not really trying. The associated implication is that any solutions to the various problems that the NIH grant award system might have that are based on someone not getting their grant after only one try are not likely to be that useful.

I just cannot make this make sense to me. Particularly if the NIH

It is slightly concerning that the NIH is now reporting on this category of investigator. Don’t get me wrong. I believe this NIH system should support a greater expectation of approximately continual funding for investigators who are funded PIs. But it absolutely cannot be 100%. What should it be? I don’t know. It’s debatable. Perhaps more importantly who should be saved? Because after all, what is the purpose of NIH reporting on this category if they do not plan to DO SOMETHING about it? By, presumably, using some sort of exception pay or policy to prevent these at risk PIs from going unfunded.

There was just such a plan bruited about for PIs funded with the ESI designation that were unable to renew or get another grant. They called them Early Established Investigators and described their plans to prioritize these apps in NOT-OD-17-101. This was shelved (NOT-OD-18-214) because “NIH’s strategy for achieving these goals has evolved based on on-going work by an Advisory Committee to the Director (ACD) Next Generation Researchers Initiative Working Group and other stakeholder feedback” and yet asserted “NIH..will use an interim strategy to consider “at risk investigators”..in its funding strategies”. In other words, people screamed bloody murder about how it was not fair to only consider “at risk” those who happened demographically to benefit from the ESI policy.

It is unclear how these “consider” decisions have been made in the subsequent interval. In a way, Program has always “considered” at risk investigators, so it is particularly unclear how this language changes anything. In the early days I had been told directly by POs that my pleas for an exception pay were not as important because “we have to take care of our long funded investigators who will otherwise be out of funding”. This sort of thing came up in study section more than once in my hearing, voiced variously as “this is the last chance for this PI’s one grant” or even “the PI will be out of funding if…”. As you can imagine, at the time I was new and full of beans and found that objectionable. Now….well, I’d be happy to have those sentiments applied to me.

There is a new version of this “at risk” consideration that is tied to the new PAR-22-181 on promoting diversity. In case you are wondering why this differs from the famously rescinded NINDS NOSI, well, NIH has managed to find themselves a lawyered excuse.

Section 404M of the Public Health Service Act (added by Section 2021 in Title II, Subtitle C, of the 21st Century Cures Act, P.L. 114-255, enacted December 13, 2016), entitled, “Investing in the Next Generation of Researchers,” established the Next Generation Researchers Initiative within the Office of the NIH Director.  This initiative is intended to promote and provide opportunities for new researchers and earlier research independence, and to maintain the careers of at-risk investigators.  In particular, subsection (b) requires the Director to “Develop, modify, or prioritize policies, as needed, within the National Institutes of Health to promote opportunities for new researchers and earlier research independence, such as policies to increase opportunities for new researchers to receive funding, enhance training and mentorship programs for researchers, and enhance workforce diversity;

“enacted December 13, 2016”. So yeah, the NOSI was issued after this and they could very well have used this for cover. The NIH chose not to. Now, the NIH chooses to use this aspect of the appropriations language. And keep in mind that when Congress includes something like this NGRI in the appropriations language, NIH has requested it or accepted it or contributed to exactly how it is construed and written. So this is yet more evidence that their prior stance that the “law” or “Congress” was preventing them from acting to close the Ginther Gap was utter horseshit.

Let’s get back to “at risk” as a more explicitly expressed concern of the NIH. What will these policies mean? Well, we do know that none of this comes with any concrete detail like set aside funds (the PAR is not a PAS) or ESI-style relaxation of paylines. We do know that they do this all the damn time, under the radar. So what gives? Who is being empowered by making this “consideration” of at-risk PI applications more explicit? Who will receive exception pay grants purely because they are at risk? How many? Will it be in accordance with distance from payline? How will these “to enhance diversity” considerations be applied? How will these be balanced against regular old “our long term funded majoritarian investigator is at risk omg” sentiments in the Branches and Divisions?

This is one of the reasons I like the aforementioned Datahound analysis, because at least it gave a baseline of actual data for discussion purposes. A framework a given I or C could follow in starting to make intelligent decisions.

What is the best policy for where, who, what to pick up?

I recently fielded a question from a more junior scientist about what, I think, has been termed research colonialism with specificity to the NIH funding disparity known as the Ginther Gap. One of the outcomes of the Hoppe et al 2019 paper, and the following Lauer et al 2021, was a call for a hard look at research on the health issues of communities of color. How successful are grant proposals on those topics, which ICs are funding them, what are the success rates and what are the budget levels appropriated to, e.g. the NIMHD. I am very much at sea trying to answer the question I was asked, which boiled down to “Why is it always majoritarian PIs being funded to do research with communities of color?”. I really don’t know how to answer that or how to begin to address it with NIH funding data that has been generated so far. However, something came across my transom recently that is a place to start.

The NIH issued RFA-MD-21-004 Understanding and Addressing the Impact of Structural Racism and Discrimination on Minority Health and Health Disparities last year and the resulting projects should be on the RePORTER books by now. I was cued into this by a tweet from the Constellation Project which is something doing co-author networks. That may be useful for a related issue, that of collaboration and co-work. For now, I’m curious about what types of PIs have been able to secure funding from this mechanism. According to my RePORTER search for the RFA, there are currently 17 grants funded.

Of the funded grants, there are 4 from NIMHD, 4 from NIDA, 2 from NIA, 1 each from NIMH, NINDS, NINR, NICHD, NIGMS, NIDCD, and NCCIH. In the RFA, NIMHD promised 6-7 awards, NIDA 2, NIA 6, NIGMS 4-6 so obviously NIDA overshot their mark, but the rest are slacking. One each was promised for NIMH, NINDS, NICHD, NIDCD and NCCIH, so all of these are on track. Perhaps we will see a few more grants get funded by the time the FY elapses on Sept 30.

So who is getting funded under this RFA? Doing a quick google on the PIs, and admittedly making some huge assumptions based on the available pictures, I come up with

PI/Multi-PI Contact: White woman (2 NIA; 1 NCCIH; 3 NIDA; 1 NIDCD; 1 NIGMS; 1 NINDS); Black woman (1 NIDA; 1 NICHD; 1 NIMHD); Asian woman (1 NIMHD; 1 NIMHD; 1 NINR); White man (1 NIMHD; 1 NIMH)

Multi-PI, non-contact: Asian woman (1 NIA, 1 NIDA, 1 NIMHD); Black woman (2 NIDA, 1 NIMHD); White woman (1 NIDCD; 1 NIGMS; 1 NINR) Black man (1 NIGMS; 1 NIMH); White man (2 NIMH)

I would say the place I am most likely to be off in terms of someone who appears to me to be white but identifies as a person of color would be white women. Maybe 2-3 I am unsure of. I didn’t bother to keep track of how many of the non-contact PIs are on the proposals with white Contact PIs versus the other way around but….I can’t recall seeing even one where a non-contact white PI was on a proposal with a contact PI who is Black or Asian. (There was one award with three white men and one Black man as PIs and, well, does anyone get away with a four PI list that includes no woman anymore?) Anyway… make of that what you will.

I suspect that this RFA outcome is probably slightly better than the usual? And that if you looked at NIH’s studies that deal with communities of color and/or their health concerns more generally it would be even more skewed towards white PIs?

Ginther et al 2011 reported 69.9% of apps in their sample had white PIs, 16.2% had Asian PIs and 1.4% had Black PIs. Hoppe et al 2019 reported (Table S1) 1.5% of applications had Black PIs and 65.7% had white PIs in their original sample. So the 11 out of 17 grants having white PIs/Contact MultiPIs matches expected distribution, as does 3 Asian PIs. Black PIs are over represented since 1-2% of 17 is..zero grants funded. So this was not an opportunity that NIH took to redress the Ginther Gap.
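A quick sketch of that comparison, for anyone who wants to redo it with different baseline numbers. The shares are the Ginther et al 2011 application percentages quoted above; the observed counts are my own tallies of the Contact PIs from the list above, which rest on my guesses from pictures.

```python
funded_grants = 17

# Share of applications by PI race, as quoted from Ginther et al 2011
application_share = {"white": 0.699, "Asian": 0.162, "Black": 0.014}

# My (assumption-laden) tally of PI / Multi-PI Contact awardees under the RFA
observed = {"white": 11, "Asian": 3, "Black": 3}

# Awards each group would get if awards simply tracked application share
expected = {group: funded_grants * share
            for group, share in application_share.items()}

for group in application_share:
    print(f"{group}: expected {expected[group]:.1f}, observed {observed[group]}")
```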

But should it be? What should be the identity of PIs funded to work on issues related to “racism and discrimination” as it applies to “minority health and health disparities”? The “best” as determined by a study section of peer scientists, regardless of applicant characteristics? Regardless of the by now very well established bias against applications with Black PIs?

Someone on twitter asked about the panel that reviewed these grants. You can see from the funded grants on RePORTER that the study section reviewing these proposals was ZMD1 KNL (J1). Do a little web searching and you find that the roster for the 11/15/2021-11/17/2021 meeting is available. A three day meeting. That must have been painful. There are four chairs and a huge roster listed. I’m not going to search out all of them to figure out how many were white on the review panel. I will note that three of the four chairs were white and one was Asian (three of four were MDs, one was a PhD). This is a good place for a reminder that Hoppe et al reported 2.4% of reviewers were Black and 77.8% white in the study sections reviewing proposals for funding in FY2011-2015. I would be surprised if this study section was anything other than majority white.

NIDA, NIMH, and NINDS have issued a Program Announcement (PAR-22-181) to provide Research Opportunities for New and “At-Risk” Investigators with the intent to Promote Workforce Diversity.

This is issued as a PAR, which is presumably to allow Special Emphasis Panels to be convened. It is not a PAS; however, the announcement includes set-aside funding language familiar to PAS and RFA Funding Opportunity Announcements (FOA).

Funds Available and Anticipated Number of Awards The following NIH components intend to commit the following amounts for the duration of this PAR: NINDS intends to commit up to $10 million per fiscal year, approximately 25 awards, dependent on award amounts; NIDA intends to commit up to $5 million per fiscal year, 12-15 awards, dependent on award amounts; NIMH intends to commit up to $5 million per fiscal year, 12-15 awards, dependent on award amounts; Future year amounts will depend on annual appropriations.

This is a PA-typical 3 year FOA which expires June 7, 2025. Receipt dates are one month ahead of standard, i.e., Sept (new R01) / Oct (Resub, Rev, Renew); Jan/Feb; May/Jun for the respective Cycles.

Eligibility is in the standard categories of concern including A) Underrepresented Racial/Ethnic groups, B) Disability, C) economic disadvantage and D) women. Topics of proposal have to be within the usual scope of the participating ICs. Eligibility of PIs is for the familiar New Investigators (“has not competed successfully for substantial, NIH (sic) independent funding from NIH”) and a relatively new “at risk” category.

At risk is defined as “has had prior support as a Principal Investigator on a substantial independent research award and, unless successful in securing a substantial research grant award in the current fiscal year, will have no substantial research grant funding in the following fiscal year.”

So. We have an offset deadline (at least for new proposals), set aside funds, SEPs for review and inclusion of NI (instead of merely ESI) and the potential for the more experienced investigator who is out of funding to get help as well. Pretty good! Thumbs up. Can’t wait to see other ICs jump on board this one.

To answer your first question, no, I have no idea how this differs from the NINDS/NIDA/NIAAA NOSI debacle. As a reminder:

Notice NOT-NS-21-049 Notice of Special Interest (NOSI): NIH Research Project Grant (R01) Applications from Individuals from Diverse Backgrounds, Including Under-Represented Minorities was released on May 3, 2021.

The “debacle” part is that right after NIDA and NIAAA joined NINDS in this NOSI, the Office of the Director put it about that no more ICs could join in and forced a rescinding of the NOSI on October 25, 2021 while claiming that their standard issue statement on diversity accomplished the same goals.

I see nothing in this new PAR that addresses either of the two real reasons that may have prompted the Office of the Director to rescind the original NOSI. The first and most likely reason is NIH’s fear of right wing, anti-affirmative action, pro-white supremacy forces in Congress attacking them. The second reason would be people in high places* in the NIH that are themselves right wing, anti-affirmative action and pro-white supremacy. If anything, the NOSI was much less triggering since it came with no specific plans of action or guarantees of funding. The PAR, with the notification of intended awards, is much more specific and would seemingly be even more offensive to right wingers.

I do have two concerns with this approach, as much as I like the idea.

First, URM-only opportunities have a tendency to put minority applicants in competition with each other. Conceptually, suppose there is an excellent URM qualified proposal that gets really high priority scores from study section and presume it would have also done so in an open, representation-blind study section. This one now displaces another URM proposal in the special call and *fails to displace* a lesser proposal from (statistically probable) a majoritarian PI. That’s less good than fixing the bias in the first place so that all open competitions are actually open and fair. I mentioned this before:

These special FOA have the tendency to put all the URM in competition with each other. This is true whether they would be competitive against the biased review of the regular FOA or, more subtly, whether they would be competitive for funding in a regular FOA review that had been made bias-free(r). […] The extreme example here is the highly competitive K99 application from a URM postdoc. If it goes in to the regular competition, it is so good that it wins an award and displaces, statistically, a less-meritorious one that happens to have a white PI. If it goes in to the MOSAIC competition, it also gets selected, but in this case by displacing a less-meritorious one that happens to have a URM PI. Guaranteed.

The second concern is one I’ve also described before.

In a news piece by Jocelyn Kaiser, the prior NIH Director Elias Zerhouni was quoted saying that study sections responded to his 2006/2007 ESI push by “punishing the young investigators with bad scores”. As I have tried to explain numerous times, phrasing this as a matter of malign intent on the part of study section members is a mistake. While it may be true that many reviewers opposed the idea that ESI applicants should get special breaks, adjusting scores to keep the ESI application at the same chances as before Zerhouni’s policies took effect is just a special case of a more general phenomenon.

So, while this PAR is a great tactical act, we must be very vigilant for the strategic, long term concerns. It seems to me very unlikely that there will be enthusiasm for enshrining this approach for decades (forever?) like the ESI breaks on merit scores/percentiles/paylines. And this approach means it will not be applied by default to all qualifying applications, as is the case for ESI.

Then we get to the Oppression Olympics, an unfortunate pitting of the crabs in the barrel against each other. The A-D categories of under-representation and diversity span quite a range of PIs. People in each category, or those who are concerned about specific categories, are going to have different views on who should be prioritized. As you are well aware, Dear Reader, my primary concern is with the Ginther gap. As you are aware, the “antis” and some pro-diversity types are very concerned to establish that a specific person who identifies as African-American has been discriminated against and are vewwwwy angweee to see any help being extended to anyone of apparent socio-economic privileges who just so happens to be Black. Such as the Obama daughters. None of us are clean on this. Take Category C. I have relatively recently realized that I qualify under Category C since I tick three of the elements and only two are required. I do not think that there is any possible way that my qualification on these three items affects my grant success in the least. To do so would require a lot of supposing and handwaving. I don’t personally think that anyone like me who qualifies technically under Category C really should be prioritized against, say, the demonstrated issue with the Ginther gap. These are but examples of the sort of “who is most disadvantaged and therefore most deserving” disagreement that I think may be a problem for this approach.

Why? Because reviewers will know that this is the FOA they are reviewing under. Opinions on the relative representation of categories A-D, Oppression Olympics and the pernicious stanning of “two-fers” will be front and present. Probably explicit in some reviews. And I think this is a problem in the broader goals of improving equity of opportunity and in playing for robust retention of individuals in the NIH funded research game.

__

*This is going to have really ugly implications for the prior head of the NIH, Francis Collins, if the PAR is not rescinded from the top and the only obvious difference here is his departure from NIH.

In a prior post, A pants leg can only accommodate so many Jack Russells, I had elucidated my affection for applying Vince Lombardi’s advice to science careers.

Run to Daylight.

Seek out ways to decrease the competition, not to increase it, if you want to have an easier career path in academic science. Take your considerable skills to a place where they are not just expected value, but represent near miraculous advance. This can be in topic, in geography, in institution type or in any other dimension. Work in an area where there are fewer of you.

This came up today in a discussion of “scooping” and whether it is more or less your own fault if you are continually scooped, scientifically speaking.

He’s not wrong. I, obviously, was talking a similar line in that prior post. It is advisable, in a career environment where things like independence, creativity, discovery, novelty and the like are valued, for you NOT to work on topics that lots and lots of other people are working on. In the extreme, if you are the only one working on some topic that others who sit in evaluation of you see as valuable, this is awesome! You are doing highly novel and creative stuff.

The trouble is, that despite the conceits in study section review, the NIH system does NOT tend to reward investigators who are highly novel solo artists. It is seemingly obligatory for Nobel Laureates to complain about how some study section panel or other passed on their grant which described the plans to pursue what became the Nobel-worthy work. Year after year a lot of me-too grants get funded while genuinely new stuff flounders. The NIH has a whole system (RFAs, PAs, now NOSI) set up to beg investigators to submit proposals on topics that are seemingly important but nobody can get fundable scores to work on.

In 2019 the Hoppe et al. study put a finer and more quantitatively backed point on this. One of the main messages was the degree to which grant proposals on some topics had a higher success rate and some on other topics had lower success rates. You can focus on the trees if you want, but the forest is all-critical. This has pointed a spotlight on what I have taken to calling the inherent structural conservatism of NIH grant review. The peers are making entirely subjective decisions, particularly right at the might-fund/might-not-fund threshold of scoring, based on gut feelings. Those peers are selected from the ranks of the already-successful when it comes to getting grants. Their subjective judgments, therefore, tend to reinforce the prior subjective judgments. And of course, tend to reinforce an orthodoxy at any present time.

NIH grant review has many pseudo-objective components to it which do play into the peer review outcome. There is a sense of fair-play, sauce for the goose logic which can come into effect. Seemingly objective evaluative comments are often used selectively to shore up subjective, Gestalt reviewer opinions, but this is in part because doing so has credibility when an assigned reviewer is trying to convince the other panelists of their judgment. One of these areas of seemingly objective evaluation is the PI’s scientific productivity, impact and influence, which often touches on publication metrics. Directly or indirectly. Descriptions of productivity of the investigator. Evidence of the “impact” of the journals they publish in. The resulting impact on the field. Citations of key papers….yeah it happens.

Consideration of the Hoppe results and the Lauer et al. (2021) description of the NIH “funding ecology”, in the light of some of the original Ginther et al. (2011, 2018) investigation into the relationship of PI publication metrics to grant success, is relevant here.

Publication metrics are a function of funding. The number of publications a lab generates depends on having grant support. More papers is generally considered better, fewer papers worse. More funding means an investigator has the freedom to make papers meatier. Bigger in scope or deeper in converging evidence. More papers means, at the very least, a trickle of self-cites to those papers. More funding means more collaborations with other labs…which leads to them citing both of you at once. More funding means more trainees who write papers, write reviews (great for h-index and total cites) and eventually go off to start their own publication records…and cite their trainee papers with the PI.

So when the NIH-generated publications say that publication metrics “explain” a gap in application success rates, they are wrong. They use this language, generally, in a way that says Black PIs (the topic of most of the reports, but this generalizes) have inferior publication metrics so this causes a lower success rate. With the further implication that this is a justified outcome. This totally ignores the inherent circularity of grant funding and publication measures of awesomeness. Donna Ginther has written a recent reflection on her work on NIH grant funding disparity, which doubles down on her lack of understanding on this issue.

Publication metrics are also a function of funding to the related sub-field. If a lot of people are working on the same topic, they tend to generate a lot of publications with a lot of available citations. Citations which buoy up the metrics of investigators who happen to work in those fields. Did you know, my biomedical friends, that a JIF of 1.0 is awesome in some fields of science? This is where the Hoppe and Lauer papers are critical. They show that not all fields get the same amount of NIH funding, and do not get that funding as easily. This affects the available pool of citations. It affects the JIF of journals in those fields. It affects the competition for limited space in the “best” journals. It affects the perceived authority of some individuals in the field to prosecute their personal opinions about the “most impactful” science.

That funding to a sub-field, or to certain approaches (technical, theoretical, model, etc, etc) has a very broad and lasting impact on what is funded, what is viewed as the best science, etc.

So is it good advice to “Run to daylight”? If you are getting “scooped” on the regular is it your fault for wanting to work in a crowded subfield?

It really isn’t. I wish it were so but it is bad advice.

Better advice is to work in areas that are well populated and well-funded, using methods and approaches and theoretical structures that everyone else prefers and bray as hard as you can that your tiny incremental twist is “novel”.