The Endgame

June 16, 2020

Hoppe et al 2019 reported that R01 applications submitted by Black PIs for possible funding in Fiscal Years 2011-2015 were awarded at a rate of 10.7%. At the same time, R01 applications submitted by white PIs enjoyed an award rate of 17.7%.

Black PIs submitted 2,403 R01 applications in that interval. White PIs had 18,315 applications funded.

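If you take the unawarded applications submitted by Black PIs (2,147 of them) and swap these out for applications funded to white PIs, this would reduce the funding rate of the applications submitted by white PIs to about 15.6% (16,168 funded out of the 103,620 white PI applications listed in Table S1 of the Hoppe supplement).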


Which is still 46% higher than the award rate the applications from Black PIs actually achieved.
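The swap arithmetic can be checked with a few lines of Python. This is a rough sketch, not anything from Hoppe et al.: it assumes the 103,620 white PI application total from Table S1 of the supplement and the 256 funded Black PI applications, and the variable and function names are mine.

```python
# Counts from Hoppe et al. 2019 (R01 applications, FY2011-2015).
BLACK_APPS = 2_403      # applications submitted by Black PIs
BLACK_FUNDED = 256      # of those, funded
WHITE_APPS = 103_620    # applications submitted by white PIs (Table S1)
WHITE_FUNDED = 18_315   # of those, funded


def white_rate_after_swap(n_swapped: int) -> float:
    """White PI award rate if n_swapped awards went to Black PI applications instead."""
    return (WHITE_FUNDED - n_swapped) / WHITE_APPS


unawarded_black = BLACK_APPS - BLACK_FUNDED
new_white_rate = white_rate_after_swap(unawarded_black)
black_rate = BLACK_FUNDED / BLACK_APPS
relative_gap = new_white_rate / black_rate - 1

print(f"Unawarded Black PI applications: {unawarded_black}")
print(f"White PI award rate after the swap: {new_white_rate:.1%}")
print(f"Still higher than the actual Black PI rate by: {relative_gap:.0%}")
```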

I want you to really think deeply about fairness.

The NIH rules on empaneling reviewers on study sections say right at the top under General Requirements:

There must be diversity with respect to the geographic distribution, gender, race, and ethnicity of the membership.

You will notice that it does not give any specific diversity targets. One handy older report that I had long ago, lost, and then found again is called the CSR Data Book, FY 2004, and it is dated 5/23/2005. Among other details, Table 16 shows that from 2000-2004 the percent of female reviewers appointed to panels went 27.0%, 25.8%, 28.2%, 31.1%, 32.9%. The percent for non-standing reviewers (ad hocs and SEP participation) went 24.5%, 25.7%, 25.2%, 24.4%, 24.9%. That’s good enough for now; feel free to chase down any more recent stats, I’m sure they are on the NIH site somewhere.

My dumb little twitter poll showed that 35.3% of people that had an opinion thought that the NIH’s apparent female reviewer target was about right. I assert that they probably arrive at their target based on what they think is the fraction of their target population (STEM profs? Biomed profs? NIH applicants?). Who knows but I bet whatever it is, it is below the population representation. Some 59.9% of those that offered an opinion thought that the ~population target was about right.

It isn’t in that older document, but Hoppe et al do report in Table S10 that 2.4% of reviewers for all study sections that evaluated R01s were African-American while 77.8% were white. As a reminder, about 14% of Americans are Black if you include those that check other boxes as multi-racial, 12.4% if you do not.

We can see from this that of the responses offered, 12.7% thought there should be fewer Black reviewers than there are (or roughly the same), some 19% thought it should be about the proportion of Black Professors in STEM fields and 68.3% thought it should more or less match the population level.

There is a serious disconnect between the opinion of the dumb little twitter poll of those that follow me on Twitter and what CSR is targeting as being “diverse with respect to…gender, race”.

Now, admittedly I have been preparing the field of battle for two weeks at this point, years by some reckonings. Softening them up. Carpet bombing with Ginther napalm and Hoppe munitions. So this is by no means a random sample. This is a sample groomed to be at least aware of NIH funding disparity and a sample subjected to an awful lot of my viewpoint that this is a massive failure of the NIH that needs to be corrected.

But still, I think some direct questions are in order. So next time you are talking to your favorite SRO, maybe ask them about this.

See if you can get them to admit to the targets that are discussed inside CSR.

Offer your own opinion on what target they should be using.

One interesting little point. I posted these polls only an hour apart and flogged both of them a couple of times later in the day. I actually pinned the second one which should give it slightly more visibility, if anything.

405 people offered an opinion on the question about African-American reviewers and 689 on the second one. The gender one got 4 RTs (which might boost reach) and the racial one got 2. The “no opinion” vote was 98 for the racial question and 107 for the gender poll so apparently the looky-loo portion of the samples is ~the same number of people.

I find this to be pertinent to the miasma of institutional injustice that we are discussing of late.

I’ve been tweeting a lot of stuff lately that is related to Ginther et al., 2011, Ginther et al., 2018 and most especially the Hoppe et al. 2019 publication. This has been somewhat related to the national conversation we’re having about racial disparity in the wake of the white woman dogwalker attempted murder by cop, the George Floyd murder by cop and the ensuing peaceful protests and cop counter-protest violence.

I’ve had quite a bit to say about the original Ginther and the dismal NIH response to it. I was particularly unhappy with the NIH (ok, Director Francis Collins’) response to the Hoppe et al paper.

These papers and findings are tops on my mind, especially as I fielded reactions both direct and indirect from my colleagues. Everybody is really dismayed by the George Floyd murder. Everybody is taking a moment, maybe because they are home with the coronavirus restrictions, but taking a moment to be really bothered. And really keen to UNDERSTAND. And to DO something. Well, doing things kinda starts in our own house, eh?

The Hoppe paper is mostly about topic words and the way that the types of research interests that Black PIs have may set them at a disadvantage. Never mind the fact that even within topic word clusters Black PIs still are at a disadvantage, the NIH is really keen to discuss the glass being half not-racist instead of the fact it’s also still half racist. But for me this was an opportunity to grapple with the numbers and revisit my old topics about how few grants it would actually take to even up the hit rate for Black PIs. This is because Black applicants are only in the low single digits in terms of percentages. The Hoppe data looks at R01 applications submitted for FY2011-2015, taking only the ones with identified Black or white PIs. We’re going to jump into the middle a bit here so that I can download my recent tweet storm into a post. First, a poll I put up.

The question came from my thinking about Hoppe, but I waited to see the votes before returning to a theme I’d been on before. It will help you to open both Hoppe and the Supplement and look at Figure 1 from the former and Table S1 of the latter. Figure 1 confuses applicants (left side) with applications (top right) so it can be good to refer to the Table S1.

There were 2403 R01 applications from Black PIs. 1346 (or 56%) were triaged and 1057 (44%) were discussed. Of the discussed applications 256 (10.7%) were funded and 801 of the discussed apps were not funded. (Note there’s some rounding error here so don’t hold me to one app one way or the other. That 10.7% was rounded up because 10.7% of 2403 is 257, not 256.) This was for applications submitted across five Fiscal Years, so we’re talking ~269 apps triaged (not discussed) per year and ~160 discussed but not funded per year. There are 25 NIH ICs that fund grants, if I have it right. (I’m pulling the relative allocation per-IC below from a spreadsheet that lists 25.)

So that’s 11 (triaged) and 6 (discussed) Black PI applications per year per IC that do not get funded. For reference, NIMH (which is the 9th biggest IC by budget) has 256 new R01 and 37 Type 2 renewal R01s on the books right now. That’s right, you say, ICs are different in size and so therefore we need to adjust the unfunded applications from Black PIs to the size of the IC. Yes, I realize we probably have large differences in % Black PIs seeking funding across the ICs but it’s all we have to go on without better information. OK, so let’s look at the unfunded apps by IC share. The analysis to follow covers selected ICs.
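For anyone who wants to check the per-year and per-IC averages, here is a minimal sketch. It assumes the counts quoted above and the 25-IC figure (which I flagged as uncertain), and it spreads applications evenly across years and ICs, which is obviously a simplification. The names are just for illustration.

```python
TRIAGED = 1_346              # Black PI applications not discussed
DISCUSSED_UNFUNDED = 801     # discussed but not funded
FISCAL_YEARS = 5             # FY2011-2015
N_ICS = 25                   # assumed number of funding ICs

triaged_per_year = TRIAGED / FISCAL_YEARS
unfunded_per_year = DISCUSSED_UNFUNDED / FISCAL_YEARS

# Even split across ICs, ignoring differences in IC size.
triaged_per_ic = triaged_per_year / N_ICS
unfunded_per_ic = unfunded_per_year / N_ICS

print(f"Triaged per year: ~{round(triaged_per_year)}")
print(f"Discussed-unfunded per year: ~{round(unfunded_per_year)}")
print(f"Per IC per year: ~{round(triaged_per_ic)} triaged, ~{round(unfunded_per_ic)} discussed-unfunded")
```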

The biggest NIH institute, NCI, receives 15.5% of the entire NIH allocation (which is $41.64 Billion). If we allocate the unfunded applications from Black PIs proportionally then NCI applications account for 42 NDs and 25 discussed-unfunded. But that institute is so large it is hard to really grasp. Let’s look at NIGMS (5th by $): 19 NDs and 11 unfunded. MH? 13/8; DA? 10/6; AA? 4/2, and I’m rounding up for the last two ICs. So, what percentage of their funded grants (Type 1, Type 2) would this be? I’m basing this off current FY Type 1 and 2 because we’re talking forward policy. If these ICs picked up the discussed-not-funded by %NIH$ share? NIGMS- 2%, NIMH- 2.7%, NIDA- 5.2%, NIAAA- 2.5%.
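The proportional allocation is just the per-year totals multiplied by an IC’s share of the NIH budget. A rough sketch for NCI, the only IC whose budget share is stated above; the function name is mine, and the per-year totals are the unrounded 1346/5 and 801/5:

```python
TRIAGED_PER_YEAR = 1_346 / 5     # ~269 Black PI applications triaged per year
UNFUNDED_PER_YEAR = 801 / 5      # ~160 discussed but unfunded per year


def allocate_by_share(budget_share: float) -> tuple[int, int]:
    """Split the yearly unfunded Black PI applications by an IC's budget share.

    Returns (triaged, discussed_unfunded), each rounded to the nearest whole application.
    """
    return (
        round(TRIAGED_PER_YEAR * budget_share),
        round(UNFUNDED_PER_YEAR * budget_share),
    )


nci_triaged, nci_unfunded = allocate_by_share(0.155)   # NCI: 15.5% of NIH $
print(f"NCI: {nci_triaged} NDs, {nci_unfunded} discussed-unfunded per year")
```

The other ICs work the same way once you plug in their budget shares from whatever allocation spreadsheet you trust.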

For completeness the share of the triaged/ND apps would be: NIGMS- 3.3%, NIMH- 4.5%, NIDA- 8.7%, NIAAA- 4.2%, again as a fraction of their current new grants. I mention this because one of the consistent findings of Ginther et al 2011 and Hoppe et al. 2019 is that applications from Black PIs are more likely to be triaged. The difference in the Hoppe data set was 56% of applications from Black PIs went un-discussed versus only 42.6% of white PI applications.

So. Those numbers of discussed-but-unfunded applications from Black PIs are low, but they seem high enough to be relevant. A couple to five percent of the portfolio for a year? This is not unimportant to the IC portfolio. But to YOU, my friend… remember the population size. If we took those 801 apps from the Hoppe data set and funded them, while subtracting 801 apps funded to white PIs (remember, they ignored all other categories of PI race), this would make the success rate for white PI applications go from 17.7% to…wait for it…16.9%. Recall, the funding rate for Black PI applications was 10.7%. So yes, that would push the success rate for Black PI applications to 44% if NIH funded all of the discussed applications. Which sounds totally unfair. But before you get too amped about that, recall your history.
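Here is the fund-every-discussed-application scenario as a quick sketch. It assumes the white PI totals from the Hoppe supplement (103,620 applications, 18,315 funded) and moves awards one-for-one; the variable names are mine.

```python
BLACK_APPS, BLACK_FUNDED = 2_403, 256
WHITE_APPS, WHITE_FUNDED = 103_620, 18_315
DISCUSSED_UNFUNDED = 801     # Black PI apps discussed but not funded

# Fund all 801 and remove the same number of awards from the white PI pool.
black_rate_after = (BLACK_FUNDED + DISCUSSED_UNFUNDED) / BLACK_APPS
white_rate_before = WHITE_FUNDED / WHITE_APPS
white_rate_after = (WHITE_FUNDED - DISCUSSED_UNFUNDED) / WHITE_APPS

print(f"Black PI success rate: {BLACK_FUNDED / BLACK_APPS:.1%} -> {black_rate_after:.1%}")
print(f"White PI success rate: {white_rate_before:.1%} -> {white_rate_after:.1%}")
```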

Those people we think of as the current luminaries spent a good chunk of the middle of their careers enjoying >30% success. Look at those rates in the 1980s…you may not be aware of this but the early 80s was a time remembered as simply terrible in the grant getting. Oh, the older folks would tell me tales of their woes even in the mid 2000s. Well I eventually realized why. Some of them had a few years in there, prior to the 1980s, of 40% or better. And this particular data set (it’s RPG, not just R01 btw) isn’t even broken out by established/new PI or continuation/new grant! So I’m sure the hit rate for established PI applications was higher as was the rate for competing renewal applications.

Why yes, we ARE coming back to the establishment of generational accumulated wealth. From a certain point of view. but not right now. we’re not ready to talk about the R word.

Instead, let’s come at this the other way. We kinda got into this a few days ago talking about the white PI grants that were funded at lower scores than *any* funded app with a Black PI (this is in Table 1 of Hoppe et al). There were 2403 Black PI applications in the dataset used in Hoppe et al. 17.7% of this is 425. Subtract the 256 that were funded and we are at 169 applications (as a reminder this is NIH wide, over 5 years) to reach parity with the white PI rate. Of course subtracting those 169 from the white PI pool would plunge their success. *plunge* I tell you.

From 17.7% to…..17.5%. which would obviously be totally unfair so I’ll let you do the math to get them to meet in the middle. Just remember NIH prefers if the Black PI apps are juuuuuust under. Statistically indistinguishable tho. Like for gender. Getting this to meet in the middle means that something less than a 0.2% change in the success rate of grants submitted by white PIs would fix the 7.0% deficit in success rates that applications from Black PIs suffer.
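If you’d rather not do the meet-in-the-middle math by hand, here is a sketch. It assumes awards move one-for-one from the white PI pool to the Black PI pool and uses the white PI totals from the Hoppe supplement (103,620 applications, 18,315 funded). Setting (B + x)/N_b = (W - x)/N_w and solving gives x = (W*N_b - B*N_w)/(N_b + N_w). The names are mine and this is my arithmetic, not anything NIH or Hoppe et al. computed.

```python
BLACK_APPS, BLACK_FUNDED = 2_403, 256
WHITE_APPS, WHITE_FUNDED = 103_620, 18_315

# Scenario 1: bring Black PI apps up to the current 17.7% white PI rate.
apps_to_parity = round(0.177 * BLACK_APPS) - BLACK_FUNDED
white_rate_parity = (WHITE_FUNDED - apps_to_parity) / WHITE_APPS

# Scenario 2: meet in the middle, i.e. shift x awards so both rates are equal.
shift = (WHITE_FUNDED * BLACK_APPS - BLACK_FUNDED * WHITE_APPS) / (BLACK_APPS + WHITE_APPS)
common_rate = (WHITE_FUNDED - shift) / WHITE_APPS
white_drop_points = (WHITE_FUNDED / WHITE_APPS - common_rate) * 100

print(f"Apps needed to hit 17.7%: {apps_to_parity}")
print(f"White PI rate after giving those up: {white_rate_parity:.1%}")
print(f"Meet in the middle: shift ~{shift:.0f} awards, both rates at {common_rate:.2%}")
print(f"White PI rate change: {white_drop_points:.2f} percentage points")
```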

If instead of just matching success rate, NIH were to fund every single discussed application submitted by Black PIs, this would only change white PI success rates by 0.8%, down from 17.7% to 16.9% as outlined above. Again, we need to compare that 0.8% drop to the 7% deficit suffered by applications with Black PIs that is currently NBD according to the NIH. And many of our science peers.

I feel confident there are many who are contemplating these analyses and the implied questions thinking “wait, I’m not exchanging my grant for their grant”. But that’s not the right way to think about this. You would be exchanging your current 17.7% success rate for a 17.5% success rate.

I was just noticing something that I hadn’t really focused on before in the Hoppe et al 2019 report on the success of grant applications based on topic choices. This is on me because I’d done an entire blog post on a similar feature of this situation back when Ginther et al 2011 emerged. The earlier blog post focused on the quite well established reality that almost all apps are funded up to a payline (or virtual payline for ICs that claim, disingenuously, that they don’t have one) and that the further an application scores from (i.e., worse than) that payline, the lower its odds of being funded. Supplemental Figure S1 in Ginther showed that these general trends were true for all racial groups.

My blog post was essentially focused on the idea that some apps from African-American PIs were not being funded at a given near-miss score while some apps from white PIs were being funded at worse scores.

It’s worth taking a look at this in Hoppe et al. because it is a more recent dataset from applications scored using the new nine point scale.

I was alerted to Table 1 of Hoppe et al. which shows the percentage of the total funded pool of applications from Black and white PIs by the voted percentile rank, binned into 5 percentile ranges (0-4 is good, 85-89%ile bad).

As you would expect, almost all applications in the top two bins (0-9%ile) were funded regardless of PI race. And the chances of an app being funded at a given percentile bin decrease the further they are away from the very top scores. Where it gets interesting is after the 34%ile mark where no Black PI apps were funded. In any score bin. And there was at least one application in each bin save for 65-69, 75-79 and 80-84 which are not worth talking about anyway.

The pinch is observing that at least some applications of white PIs were funded from 35-59th percentile. I.e., at scores that are worse than the score of any app funded with a Black PI. On Twitter I originally screwed up the count because I stupidly applied the bin percentages to the entire population of funded awards. Not so. In fact I need to calculate it per bin.

Now if my current thinking is right, and it may not be, those bonus bins for white PIs represent 25% of the distribution (5 bins, 5%ile points per bin). The supplement Table S1 tells us there were 103,620 applications submitted by white PIs so that leaves us with 25,905 applications, 5,181 in each bin.

This is very rough.

Percentiling of applications is within a rolling three rounds of each standing study section. Special Emphasis Panels are variously percentiled- sometimes against an associated parent study section, sometimes against the total CSR pool.

But let’s take this as the aggregate for discussion.

Multiplying each of the bin success rates by the applications per bin, I end up with a total of 119 applications of white PIs funded from 35-59th percentile. A score range at which ZERO applications were funded to Black PIs.
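The per-bin calculation looks roughly like this. It is a sketch of my very rough approach, assuming white PI applications are spread evenly across percentiles so that each 5-point bin holds the same number of applications. The per-bin funding rates are deliberately left as an input (they come from Table 1 of Hoppe et al. and are not reproduced here), and the function name is mine.

```python
WHITE_APPS = 103_620     # white PI applications (Table S1)
BIN_WIDTH_PCT = 5        # each bin spans 5 percentile points

# Even-spread assumption: 5% of all applications land in each bin.
apps_per_bin = WHITE_APPS * BIN_WIDTH_PCT // 100


def funded_in_bins(bin_rates: list[float]) -> float:
    """Estimated funded applications, given one funding rate (0-1) per bin."""
    return sum(rate * apps_per_bin for rate in bin_rates)


n_bins = 5   # 35-39, 40-44, 45-49, 50-54, 55-59
print(f"Applications per bin: {apps_per_bin}")
print(f"Applications across the {n_bins} bins: {apps_per_bin * n_bins}")
```

Call `funded_in_bins` with the five white PI rates for the 35-59 percentile bins from Table 1 to get the estimated count of applications funded in that range.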

So, in essence, you could replace all of those applications funded to white PIs with more meritorious (well? that’s how they use the rankings. percentile = merit) unfunded applications submitted by Black PIs. Even by some distance as only 74% of 10-14%ile scoring applications with Black PIs were funded for example.

I was curious why Hoppe et al included the Table and what use they made of it. I could find only one mention of Table 1 and it was in the section titled “IC decisions do not contribute to funding gap”.

However, below the 15th percentile, there was no difference in the average rate at which ICs funded each group (Table 1); applications from AA/B and WH scientists that scored in the 15th to 24th percentile range, which was just above the nominal payline for FY 2011–2015, were funded at similar rates (AA/B 25.2% versus WH 26.6%, P = 0.76; Table 1). The differences we observe at narrower percentile ranges (15 to 19, 20 to 24, 25 to 29, and 30 to 34) slightly favored either AA/B or WH applicants alternately but were in no case statistically significant (P ≥ 0.13 for all ranges). These results suggest that final funding decisions by ICs, whether based on impact scores or discretionary funding decisions, do not contribute to the funding gap.

This is more than a little annoying. Sure, they sliced and diced the analysis down to where it is not statistically resolvable as a difference. But real world? Isn’t it a matter of constant anger for any PI who has a near miss score and gets wind of anyone being funded at a worse score? Sure it is.

And that last statement is just plain false. 119 white PI applications funded at worse scores is 46.5% of the total number of applications funded with Black PIs. If all of those discretionary funding decisions had gone to Black PIs, that would raise the hit rate from 10.7% to 15.6% for Black PIs. Whereas the white PI hit rate would plunge from 17.7% to…17.56%.
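Those three numbers can be reproduced with a few lines. A sketch, assuming the 119 estimate from my rough bin math and the white PI totals from the Hoppe supplement (103,620 applications, 18,315 funded); the names are mine.

```python
BLACK_APPS, BLACK_FUNDED = 2_403, 256
WHITE_APPS, WHITE_FUNDED = 103_620, 18_315
WORSE_SCORE_AWARDS = 119     # est. white PI awards at 35-59th percentile

share_of_black_funded = WORSE_SCORE_AWARDS / BLACK_FUNDED
black_rate_after = (BLACK_FUNDED + WORSE_SCORE_AWARDS) / BLACK_APPS
white_rate_after = (WHITE_FUNDED - WORSE_SCORE_AWARDS) / WHITE_APPS

print(f"119 as a share of funded Black PI apps: {share_of_black_funded:.1%}")
print(f"Black PI hit rate: {BLACK_FUNDED / BLACK_APPS:.1%} -> {black_rate_after:.1%}")
print(f"White PI hit rate: {WHITE_FUNDED / WHITE_APPS:.1%} -> {white_rate_after:.2%}")
```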

So this analysis they are referring to supports quite the opposite conclusion. Discretionary funding decisions, i.e. outside of percentile ranks where nearly every application is funded, do in fact contribute substantially to the disparity.

And correcting this to give Black PIs a fair hit rate, by selecting applications of HIGHER MERIT, would cause an entirely imperceptible change in the chances for white PIs.