More thoughts on the dismal NIH response to Ginther

January 15, 2014

Jeremy Berg made a comment

If you look at the data in the Ginther report, the biggest difference for African-American applicants is the percentage of “not discussed” applications. For African-Americans, 691/1149 =60.0% of the applications were not discussed whereas for Whites, 23,437/58,124 =40% were not discussed (see supplementary material to the paper). The actual funding curves (funding probability as a function of priority score) are quite similar (Supplementary Figure S1). If applications are not discussed, program has very little ability to make a case for funding, even if this were to be deemed good policy.

that irritated me because it sounds like yet another version of the feigned-helpless response of the NIH on this topic. It also made me take a look at some numbers and bench race my proposal that the NIH should, right away, simply pick up enough applications from African American PIs to equalize success rates. Just as they have so clearly done, historically, for Early Stage Investigators and very likely done for women PIs.

Here’s the S1 figure from Ginther et al, 2011:
[Figure: Ginther et al., 2011, Supplementary Figure S1. Award probability as a function of priority score.]

[In the below analysis I am eyeballing the probabilities for illustration’s sake. If I’m off by a point or two this is immaterial to the overall thrust of the argument.]

My knee jerk response to Berg’s comment is that there are plenty of applications from African-American PIs available for pickup. As in, far more than would be required to make up the aggregate success rate discrepancy (which was about 10 percentage points in award probability). So talking about the triage rate is a distraction (but see below for more on that).

There is a risk here of falling into Privilege-Thinking, i.e. that we cannot possibly countenance any redress of discrimination that, gasp, puts the previously underrepresented group above the well represented groups even by the smallest smidge. But looking at Supplementary Figure S1 from Ginther, and keeping in mind that the number of African American PI applications is only 2% of the number of White PI applications, we can figure out that a substantial effect on African American PIs’ award probability would cause only an imperceptible change in that for White PI applications. And there’s an amazing sweetener….merit.

Looking at the award probability graph from S1 of Ginther, we note that there are some 15% of the African-American PI’s grants scoring in the 175 bin (old scoring method, youngsters) that were not funded. About 55-56% of all ethnic/racial category grants in the next higher (worse) scoring bin were funded. So if Program picks up more of the better scoring applications from African American PIs (175 bin) at the expense of the worse scoring applications of White PIs (200 bin), we have actually ENHANCED MERIT of the total population of funded grants. Right? Win/Win.

So if we were to follow my suggestion, what would be the relative impact? Well thanks to the 2% ratio of African-American to White PI apps, it works like this:

Take the 175 scoring bin in which about 88% of White PIs and 85% of AA PIs were successful. Take a round number of 1,000 apps in that scoring bin (for didactic purposes, also ignoring the other ethnicities) and you get a 980/20 White/African-American PI ratio of apps. In that 175 bin we’d need 3 more African-American PI apps funded to get to 100%. In the next higher (worse) scoring bin (200 score), about 56% of White PI apps were funded. Taking three from this bin and awarding three more AA PI awards in the next better scoring bin would plunge the White PI award probability from 56% to 55.7%. Whoa, belt up cowboy.

Moving down the curve with the same logic, we find in the 200 score bin that there are about 9 AA PI applications needed to put the 200 score bin to 100%. Looking down to the next worse scoring bin (225) and pulling these 9 apps from White PIs we end up changing the award probability for these apps from 22% to ..wait for it….. 21.1%.

And so on.

(And actually, the percentage changes would be smaller in reality because there is typically not a flat distribution across these bins and there are very likely more applications in each worse-scoring bin compared to the next better-scoring bin. I assumed 1,000 in each bin for my example.)
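
For anyone who wants to check the bin arithmetic, here is a minimal sketch. It assumes the same eyeballed rates and the flat 1,000-applications-per-bin simplification used in this example, with a 2% African-American share; the function name and parameters are made up for illustration and are not from Ginther.

```python
def shifted_white_rate(white_rate_worse_bin, aa_rate_better_bin,
                       apps_per_bin=1000, aa_share=0.02):
    """Return (extra AA awards needed, new White award rate in the worse bin).

    Assumes every bin holds apps_per_bin applications, split between
    White and African-American PIs only (other groups ignored).
    """
    aa_apps = round(apps_per_bin * aa_share)       # 20 AA apps per bin
    white_apps = apps_per_bin - aa_apps            # 980 White apps per bin
    # AA apps in the better-scoring bin that went unfunded
    extra = round(aa_apps * (1.0 - aa_rate_better_bin))
    # Fund those by taking the same number of awards from White apps
    # in the next worse-scoring bin
    white_funded = white_rate_worse_bin * white_apps
    new_rate = (white_funded - extra) / white_apps
    return extra, new_rate


# 175 bin topped up from the 200 bin, then 200 bin topped up from the 225 bin
for worse, better in [(0.56, 0.85), (0.22, 0.56)]:
    extra, new_rate = shifted_white_rate(worse, better)
    print(f"{extra} awards moved: White rate {worse:.1%} -> {new_rate:.1%}")
```

With larger worse-scoring bins, as noted above, the change in the White PI rate would be smaller still.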

Another way to look at this issue is to take Berg’s triage numbers from above. To move to a 40% triage rate for the African-American PI applications, we need to shift 20% (230 applications) into the discussed pile. This represents a whopping 0.4% of the White PI apps being shifted onto the triage pile to keep the numbers discussed the same.
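
The triage arithmetic can be checked the same way. This is a rough sketch using only the counts quoted from Berg's comment; the 20-point shift is the rounded 60% to 40% gap, and the variable names are just for illustration.

```python
# Counts quoted from the Ginther supplementary material (via Berg's comment)
aa_total, aa_not_discussed = 1149, 691
white_total, white_not_discussed = 58124, 23437

aa_triage_rate = aa_not_discussed / aa_total            # about 60%
white_triage_rate = white_not_discussed / white_total   # about 40%

# Rounded gap of 20 percentage points, as in the text
apps_to_move = round(0.20 * aa_total)

# Same number of White PI apps moved onto the triage pile,
# so the total number discussed stays constant
share_of_white = apps_to_move / white_total

print(f"AA triage rate:    {aa_triage_rate:.1%}")
print(f"White triage rate: {white_triage_rate:.1%}")
print(f"Apps to move:      {apps_to_move}")
print(f"Share of White PI apps affected: {share_of_white:.1%}")
```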

These are entirely trivial numbers in terms of the “hit” to the chances of White PIs and yet you could easily equalize the success rate or award probability for African-American PIs.

It is even more astounding that this could be done by picking up African-American PI applications that scored better than the White PI applications that would go unfunded to make up the difference.

Tell me how this is not a no-brainer for the NIH?

34 Responses to “More thoughts on the dismal NIH response to Ginther”

  1. Jeremy Berg Says:

    DM: Your analysis is correct and this approach would correct the success rate disparity. The point of my comment was not to apologize or feign helplessness on behalf of my former employer. Rather it was to point out an important aspect of the Ginther study that is a potentially important clue with regard to the causes of the disparity. The 60% versus 40% disparity in triage rates needs to be explained (through bias or worse applications or a combination of the two). As you demonstrate, the funding curves are nearly identical, indicating that, in aggregate, applications from African Americans and Whites have similar award probabilities with the same score. However, given NIH’s stated and real need for more diversity in the scientific workforce, considering the impact on racial diversity as a factor when deciding between funding of two applications of indistinguishable scientific merit (or, as you point out, perhaps even more scientific merit) would seem defensible. Indeed, one must recall that these curves represent the aggregated behavior of all ICs over a fairly long period of time so that this may be occurring, but not enough to shift the curve. Recall that no one had looked at these data until the Ginther study. This is due, in part, to the noise in the data due to the small numbers of African Americans in the system.

  2. drugmonkey Says:

    Your point about triage has another highly important practical implication. One of the important concepts of grant review is that the burden of triage must not fall disproportionally on X applications.

    Where X is 1) early stage / New investigator applications and 2) non-R01 mechanisms.

    We can argue the functional significance* of discussing ESI or R21 apps where the preliminary scores are over what would be the triage line for an R01 but the point is highly relevant to applications from African-American PIs. By what justice is there this special triage protection for ESI apps (were there data? presumably) but not for African-American PI apps?

    *dubious**

    **still, a discussed app, no matter how poor the score, is more likely to be picked up than one that is triaged.

  3. Jeremy Berg Says:

    With regard to ESIs and New Investigators, there were data examined supporting the higher triage rates before any of the policies about discussion order or equalizing success rates were implemented. Furthermore, there were data that showed that when some ICs publicly stated that they were giving New Investigators numerical benefits (different payline) then study sections, in aggregate, gave New Investigators worse scores.

    I am not trying to be defensive here, but there are considerable logistical and legal differences between ESIs and racial groups. As you know, demographic data, including racial self-identification, are collected voluntarily and separately from the applications themselves. Relinking these data was allowed under the terms of the contract that led to the Ginther study but, in general, the racial identity of the applicant must be deduced indirectly, with all that that entails. In addition, racial preferences in government grants and contracts are subject to legal precedents, such as the 1995 Adarand vs Pena Supreme Court case, that place limits on programs and processes. Finally, the potential to do harm via stereotype threat or other mechanisms needs to be considered with any new policy. Again, I am not trying to be overly apologetic. It is quite disappointing to me that NIH has been so slow to try to understand the potential bias in the system, particularly after such a public display of commitment to this goal.

  4. sciencedude Says:

    I love it DM. I’m checking the AA box as I write. I am still looking for the star-bellied Sneetch box. Or…….perhaps the NIH could award grants based simply on priority score without jumping between bins. Merit is enhanced and no one is punished for applying while white. Win/win. 230 is a small number unless you are one of them.

  5. drugmonkey Says:

    JB- I apologize for the extent to which my frustrations seem like I am singling you out as the problem. I think you are one of the good guys in all of this grant business.

  6. drugmonkey Says:

    “Punished for applying while white”. Interesting. No possibility in your mind of any bias in the merit scoring eh?

  7. Jeremy Berg Says:

    You have to be careful in interpreting Figure S1 in terms of the behavior of any IC or even NIH overall, since it is the aggregate over all ICs over a 7-year period (2000-2006) in which success rates changed dramatically from year to year. The apparent funding of grants out of priority score order is greatly enhanced by the time aggregation. Suppose all grants up to 240 were funded in 2000, and all grants up to 150 were funded in 2006, with intermediate paylines for the in-between years. The curve would look like Figure S1 even though not a single application was funded out of order. The different paylines for the ICs would have the same effect.
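
A toy sketch of this aggregation effect, under loudly stated assumptions: the payline is taken to drift linearly from 240 to 150 across the seven years, and every score bin is assumed to receive the same number of applications each year. Neither assumption comes from the Ginther data; they only illustrate how strict in-order funding can still produce a graded aggregate curve.

```python
def aggregate_funded_fraction(paylines, score_bins):
    """Fraction of applications funded per score bin, pooled over years.

    Each year funds strictly in score order: everything at or better
    (lower) than that year's payline, nothing worse. Equal application
    volume per bin per year is assumed, so the pooled fraction is just
    the share of years in which the bin fell inside the payline.
    """
    return {
        score: sum(score <= payline for payline in paylines) / len(paylines)
        for score in score_bins
    }


# Hypothetical paylines for 2000..2006: 240, 225, 210, 195, 180, 165, 150
paylines = [240 - 15 * i for i in range(7)]
score_bins = [150, 175, 200, 225, 250]

curve = aggregate_funded_fraction(paylines, score_bins)
for score, fraction in curve.items():
    print(f"score {score}: {fraction:.0%} funded in aggregate")
```

The pooled fractions fall off gradually with score even though no single year funds anything out of order.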

  8. Jeremy Berg Says:

    DM-No apology needed. It is important to keep pushing on these issues. They are important but complicated. Without folks asking questions and pushing for and at the data, no insights or actions are likely to ensue.

  9. Jeremy Berg Says:

    DM: The more I think about this, the more I think your interpretation of Figure S1 is problematic based on the nature of the chart. The facts that the graph (1) is aggregated over years with quite different success rates (as I noted above) and (2) is based on priority scores rather than percentiles mean that the fraction funded within any given bin is very difficult to interpret. I believe the graph is consistent with the possibility that no applications from White applicants were ever funded in preference to applications from African American applicants with better percentile scores. I am not saying that this did or did not occur, but rather that the Figure does not bear on this issue, despite its appearance.

  10. writedit Says:

    DM: the Ginther study had applications with optional PI personal data forms (from way old PHS 398 packages) on which the PI could mark his or her sex, race/ethnicity, and date of birth (which is how I knew all my PIs’ birthdays). Now, to the best of my knowledge, there is no indication on the application of the PI’s race/ethnicity/age. How do current reviewers or extramural staff know which bin to use?

  11. drugmonkey Says:

    yeah, the old head goes a little deeper in the sand at NIH.

    as far as reviewers go, as I’ve observed repeatedly, they often KNOW who the PI is and if they do not, they quite frequently go Googling to figure it out. I argue, in fact, that this is *good* simply to even out the fact that the more famous a PI is, the more reviewers are going to know her. so if a little judicious Googling makes the reviewer think ‘oh, THAT person I just spoke to at her poster last year’ this is of benefit to the lesser-known applicant.

    For today, the point is that keeping personal characteristic information off the application may let the NIH think their hands are cleaner but it by no means keeps this information out of the heads of the review panel members. And really, when a PO is trying to decide to go out on a limb and make a plea for an out-of-order pickup do you really think he or she doesn’t likewise go googling? of course they do. and they *should* for a similar rationale. They already know all about the experienced investigators who have had grants in that PO’s portfolio before.

  12. The Other Dave Says:

    DM: You and I both know that when a reviewer needs to go Googling, it’s not good for the PI.

    I agree that the only way to change the status quo might be a little affirmative action. But it needs to be done carefully, lest there be a backlash.

    And anyway, from the data I’ve seen, the biggest racial disparities are pre-NIH proposal. Once underrepresented groups make it to applying, they are almost as successful as SOWD (standard old white dudes). Trouble is, most don’t make it that far.

    How many trainees from underrepresented groups do you have in your lab, DM? How hard do you seek them out?

  13. Former Technician Says:

    Having recently gone through one of our summary statements, I can completely agree with the comment that the reviewers may know the PI. There were comments about the PI and the team that were not included in this particular application. My PI is WELL known in his field and somewhat well known in the periphery of the field. Part of this is his work and part is his longevity. He is over 70 with a body of over 40 years of publications and work some of which was considered groundbreaking.

    On the other hand, we could also name two of the reviewers based on their comments. The ones who gave the better scores *grin* This proposal had two favorable reviewers and one who didn’t like it. 15th percentile. The payline for established investigators is 6th percentile. The PI has already contacted the Program Officer to try to move things.

    I admire this particular PO, who does not seem to fall at his feet. The recommendation was to resubmit although it did come with an offer of suggestions for changes.

  14. Professa Says:

    TOD-
    I think you missed the point that triage was 60% for AA vs 40% for whites. So even once an AA PI has made it, it’s still an uphill climb.
    I agree that there needs to be effort made at the trainee level as a long term strategy but short term a little effort from IC would go a long way.
    I must admit difficulty in attracting AA trainees to my lab. I’ve had only one postdoc and one undergrad over seven years. I can make more effort to host more undergrads but honestly the grad student and postdoc AA pool is tiny.

  15. Erickttr Says:

    I think the triage disparity has to do with structural and institutional biases that are at least a decade in the making before the AA PI even writes an R01, as much as study section biases. AA PIs are less likely to have had a bigwig PhD mentor and are therefore more likely to be in a lab/institute with fewer resources, publish in lower-tier journals, have fewer connections in the ole boys network, and have letters of support and collaborators from outside the ole boys network. All this gives the perception of lesser merit to Prof Oleboy reviewing the grant. I have had summary statements that said, “he should have had Dr Greybeard at his institution as a collaborator,” as a weakness in the Approach. Did this ole boy not consider that maybe Greybeard is too busy for a peon like me and his salary would unnecessarily eat up my R21 budget and there’s really no scientific reason to include him?

  16. drugmonkey Says:

    Point being that those are not traditionally thought of as genuine merit concerns and are pointed to by all and sundry non-ole-boy connected as biases. Regardless of approximate PI skin reflectance.

  17. drugmonkey Says:

    A 50% higher rate of triage really needs more consideration. Stare at that number. Think of *any* small (2%) collection of people in your subfield. That you know personally and professionally*. Think about those individuals facing a 50% higher triage risk than you.

    *it happens that I know the work of a good number of NIH funded African-American PIs pretty well in my subfield. When there aren’t that many of them, this ain’t hard. Naturally I don’t know the ones who *haven’t* gotten funding as well.**

    **less likely to see at meetings, on study section, review their papers, etc.

  18. Erin Jonaitis Says:

    Thanks for continuing to write about this, drugmonkey.
    I admit I don’t actually understand how triage works. Who does the triaging? Is it a different person, or group of people, than the ones who actually do the reviewing & give the score?

    I understand from an argument I saw on Twitter that the idea of blinding reviewers as to PI identity is controversial and perhaps impractical, mostly because of feasibility concerns. But if triaging is done in a separate stage, could that part alone be done blinded? If triaging is mostly about whether the project is valuable enough to spend reviewer time on, feasibility seems like it really shouldn’t be the major concern, and so the value of knowing PI names seems much less. And names very often carry strong racial implications. It seems worth piloting such a process, at least, to explore whether it equalizes the triage rate or, even better, the funding rate.

  19. drugmonkey Says:

    All grants are reviewed in depth by ~3 people prior to the meeting. The assigned reviewers give a preliminary score. The average preliminary score is used to rank the proposals across the entire panel prior to the meeting. Approximately half are designated for discussion, the remaining ones are what we refer to as triaged. Formally “Not Discussed”.

    There are some additional niceties such as any panel member not in conflict of interest can request an application designated for triage be discussed.
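
The procedure described above can be sketched roughly as follows. This is an illustration only, not NIH's actual software: the function name, the `rescued` parameter (standing in for a panel member's request to discuss), and the sample scores are all hypothetical. Lower scores are better.

```python
from statistics import mean


def triage(prelim_scores, discuss_fraction=0.5, rescued=()):
    """Split applications into (discussed, not_discussed), best score first.

    prelim_scores maps an application id to the preliminary scores from
    its assigned reviewers. Applications are ranked by average score and
    roughly the top discuss_fraction are discussed; any application named
    in `rescued` is discussed regardless of rank.
    """
    avg = {app: mean(scores) for app, scores in prelim_scores.items()}
    ranked = sorted(avg, key=avg.get)              # lower score is better
    cutoff = round(len(ranked) * discuss_fraction)
    keep = set(ranked[:cutoff]) | set(rescued)
    discussed = [app for app in ranked if app in keep]
    not_discussed = [app for app in ranked if app not in keep]
    return discussed, not_discussed


# Hypothetical panel of four applications, three reviewers each
scores = {"A": [2, 3, 2], "B": [5, 6, 5], "C": [3, 3, 4], "D": [7, 6, 8]}
print(triage(scores))                  # (['A', 'C'], ['B', 'D'])
print(triage(scores, rescued=["B"]))   # (['A', 'C', 'B'], ['D'])
```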

  20. Joe Says:

    The triage line is recommended and decided on at the beginning of the meeting. The SRO will usually point out where the 50% line is, but often you can’t evenly divide the group since a number of applications may have the same initial score. The chair and the members discuss where they would like to get to in the list. As you get near the end of the list on day two, you may revise how far you want to go. Arguing over applications that are in the 3-4 score range is exhausting and depressing, so apart from the one gem you may find that was misunderstood and will get a much better score, mostly these applications are not going to get fundable scores and you just want to be done.
    If you knew which applications were from AA PIs, you could ask that they be pulled up for discussion. Maybe study section chairs should be told who the AA PIs are, and they could request those applications be reviewed.

  21. drugmonkey Says:

    This is my point about triage burden not falling on ESI or nonR01 mechs…

  22. qaz Says:

    Concentrating on triage is a red herring. In my study section experience, triage is just a reflection of poor scores. If we really want to address this point, we need to be comparing scores. (Either by determining why minority scores are worse than non or by finding a way for program to fund minority applications at poorer scores than non [as they do for youth].) I don’t think discussing normally triaged minority applications is going to make any difference at all.

    A lot of the discussion above has been based on the hypothesis that the disparity is due to pedigree and networking. If this is true, then minority applications should track similarly positioned non-minority applications. This opens a number of interesting questions. (I have no idea if this data is available.) Do minority applications fare more poorly than non-minority applications from the same institutions? Do minority applications fare more poorly than non-minority applications with similar pedigrees?

  23. Joe Says:

    @qaz “Concentrating on triage is a red herring.”
    I was under the impression from the discussion above that triage was a big part of the problem. Earlier discussions on this site concluded that it might not even be worthwhile to resubmit a triaged application. If more of the AA PIs could not get the black mark of a triage and could get a full discussion of the application and a score, they might do better on resubmission. Also, this is something that is do-able, even if it would have to be done quietly.

  24. Joe Says:

    Also, is it the case that a PO could not or would not try to get a pick-up for a triaged application but could or would try if the application was scored?

  25. DJMH Says:

    I think it’s interesting that a major difference is in triage rates because those reflect, in essence, “private” estimations of quality (ie reviewers read the apps on their own and scores are not discussed) whereas scored grants are “publicly” discussed…and the disparity seems to vanish. Does this mean that people are more likely to allow bias to affect them in private? Seems plausible…

  26. qaz Says:

    My understanding is that program cannot pick up a triaged application. So, in that sense, getting a minority application out of triage can help it because a PO could pick it up “out of order”. (Which might be the simplest solution.)

    Also, as I noted in my comments on the triage discussions, my experiences on study section do not corroborate the idea that triage is a black mark that hurts resubmission.

    It is true that NI/ESI applications get discussed separately, which means they have their own triage line and their own scoring ranks. (In practice, they are supposed to be “scored on the same scale as non-NI/ESI”, but treating them separately makes that hard. Whether that helps or hurts NI/ESI is a data question.) It is certainly doable to say that we should always discuss all minority applications, but my point is that I don’t think it would help. [In fact, that’s something NIH doesn’t have to do – individual members of study section could make that happen. I’m not suggesting it, just noting it. Most study sections are very negative about pulling grants out of triage unless there’s a good likelihood of getting a “fundable score”.]

    And I would really like to know whether the low funding rate of minority applicants is fully explicable as lack of pedigree and history or whether it is something else. If it is history and pedigree, then the question is (a) how to fix pedigree pipelines and (b) how to help minorities catch up to pedigree. (How well do minorities who have been through specialized programs like SFN’s NSP program do?) If it is something else, then other fixes will be necessary.

  27. drugmonkey Says:

    First of all, everyone go read the Ginther report and the supplement. There were many covariate type analyses conducted which ruled out several of the most obvious hypotheses.

    Second, there is very little doubt in my mind that we are talking about a death-of-a-thousand-cuts scenario where many small effects add up to a big whopping and observable disparity. It is in my estimation going to be an error to try to find “THE” problem.

    DJMH: Does this mean that people are more likely to allow bias to affect them in private?

    It could also mean a more bimodal distribution, perhaps being driven by a failure to extend the benefit of the doubt to the more marginal cases.

    The co-variate analyses in Ginther break down as we narrow down to a more limited slice of the distribution. It makes it really hard to assess the fate of the one? two? five? individual African-American PIs who may have successfully “made it” into the system for a given subfield submitting to a narrow set of study sections.

    I’m envisioning those individuals who, for whatever reason, enjoy most of the benefits we otherwise wring our hands over in grant review: pedigree, current University, age/job tenure, history of success. Those types of African-American individuals may account for much of the success. I am starting to venture beyond my ability to grapple with the numbers and the covariate analyses though so perhaps I am not correct that we are at the interface of not having enough power to ever demonstrate anything in this question.

    What we can’t know is that for even a high-ish level of success, would those limited numbers of individuals have been even more successful if they hadn’t been black? Have they labored under the ever-so-slightly-enhanced triage burden all along? Have they been kept from that third or fourth award that the next PI over was awarded? The effect on career and accomplishment would be HUGE, even if the African-American PI has managed to be reasonably successful anyway. Since we really only convincingly assess merit by accomplishment in these matters, we will never, ever know.

  28. drugmonkey Says:

    my experiences on study section do not corroborate the idea that triage is a black mark that hurts resubmission.

    Be that as it may, Rockey’s numbers were pretty convincing that this is not a general trend.

    Most study sections are very negative about pulling grants out of triage unless there’s a good likelihood of getting a “fundable score”.

    This has been hardened into stone by the entirely misguided move to discuss grants in the order of preliminary score. It is a tragedy what this has done for score movement. (I have had a LOT of summary statements in the past couple of years where extremely disparate criterion scores did not lead to discussion.) As qaz identifies here, it really constrains the possible “fixes” that can be applied.

    And I would really like to know whether the low funding rate of minority applicants is fully explicable as lack of pedigree and history or whether it is something else.

    Errr…from the Ginther abstract, you don’t even have to get past PubMed

    After controlling for the applicant’s educational background, country of origin, training, previous research awards, publication record, and employer characteristics, we find that black applicants remain 10 percentage points less likely than whites to be awarded NIH research funding.

    At the end of the text, they emphasize this further with more variables:

    We find it troubling that the typical measures of scientific achievement—NIH training, previous grants, publications, and citations—do not translate to the same level of application success across race and ethnic groups. Our models controlled for demographics, education and training, employer characteristics, NIH experience, and research productivity, yet they did not explain why blacks are 10 percentage points less likely to receive R01 funding compared with whites.

    -i think they mean NIH review experience and citation count for those last two points

  29. drugmonkey Says:

    More key text from Ginther, in the event readers of this blog do not have access:

    Applications from black and Asian investigators were significantly less likely to receive R01 funding compared with whites for grants submitted once or twice. For grants submitted three or more times, we found no significant difference in award probability between blacks and whites; however, Asians remained almost 4 percentage points less likely to receive an R01 award (P < .05). Together, these data indicate that black and Asian investigators are less likely to be awarded an R01 on the first or second attempt, blacks and Hispanics are less likely to resubmit a revised application, and black investigators that do resubmit have to do so more often to receive an award.

    You know, DearReader, that it is a longterm blog exhortation of mine to keep at it. I have in various ways encouraged my readers to put their nose to the NIH Grant Game grindstone. Keep submitting, revise and resubmit every damn thing. Suck up your disadvantages (mostly in the context of junior PIs) and keep on trying. That is a good bit of my overall advice over the past….jesus is it really seven…years.

    I am at a crossroads, Readers, I really am.
    This comment in Ginther
    Assistance with the grants submission and resubmission process may provide a policy lever for diversifying the scientific workforce.

    resonates with my career advice. Doesn’t it? And some of the noise coming from NIH about fixing this problem is similar- let’s train the AA PIs to play the game better. Sounds good, right?

    I struggle hard enough to keep my lab funded. I know many of my fellow PI Readers do too. Can you imagine this scenario? Oh, look you’ll get your money. You just get triaged more often, have to revise more often and are overall less likely to be funded. But hey, keep at it!

    Starting to sound like blaming the victim to me.

  30. iamnobody Says:

    pretty depressing to read all of these after my last 3 grants were triaged (Asian Woman ESI) and reviewers made nasty comments. Keep up your flame, hope something good will come out.

  31. drugmonkey Says:

    I realize I am possibly stoking stereotype threat and ripples of doubt. Individually, I am sorry for that. I just think this fear should not keep us from addressing the issues.

  32. […] DrugMonkey picked up on and then picked apart an NIH report that he says shows that if more applications by African-American PIs were discussed during the review process, it would go a long way to addressing the discrepancy between funding rates for African-Americans and whites. To my read, it appears that implicit or explicit biases are reducing the number of African-American applications that get scores that make the discussion cutoff, but that once an application is discussed it has equal likelihood of getting funded regardless of the color of the applicant’s skin. The suggestion DrugMonkey makes is that NIH should ensure that equitable numbers of African-American applications get discussed during the review process. Instead, it appears that NIH is focused exclusively on blaming the victims for getting poor mentoring rather than doing what is under their direct control in reforming their review process. As a geoscientist, I don’t speak NIH, so I can’t follow all the details of the discussion at DrugMonkey’s blog, but there’s a striking parallel here. At Nature, they note an under-representation of women in their reviewer and author pools, and blame the victim by publishing an article that alleges that women are having too many babies to be equitably represented. At NIH, they note an under-representation of African-American PIs getting funded and focus the blame on mentoring. […]

  33. BugDoc Says:

    These are complex issues that go beyond the peer review process. One concern I have about the Ginther report is that it may have unintended effects on hiring excellent URM faculty candidates. Search committees evaluate the likelihood of a candidate getting external funding for their research program, among other things. In the current funding environment, the Ginther report may introduce more negative unconscious bias into that evaluation process, since it is clear that there are currently more barriers for minority faculty to get funded than for others.

  34. drugmonkey Says:

    Jesus BugDoc, my blood pressure! I had not yet thought of this but you are correct. Damn.
