Jocelyn Kaiser at ScienceInsider has obtained data on PI numbers from the NIH.

[Graphic: number of NIH R01-equivalent PIs by fiscal year]

Nice.

I think this graph should be pinned up right next to Sally Rockey’s desk. It is absolutely essential to any attempts to understand and fix grant application success rates and submission churning.

UPDATE 03/12/14: I should have noted that this graph depicts PIs who hold R01-equivalent grants (R01, R23, R29, R37 with ARRA excluded). The Science piece has this to say about the differential from RPG:

NIH shared these data for two sets of grants: research project grants (RPGs), which include all research grants, and R01 equivalents, a slightly smaller category that includes the bread-and-butter R01 grants that support most independent labs.

[Figure: NIH PIs, RPG vs R01-equivalent]

But if you read carefully, they’ve posted the Excel files for both the R01-equivalent and RPG datasets. Woo-hoo! Let’s get to graphing, shall we? There is nothing like a good comparison graph to make summary language a little more useful. Don’t you think? I know I do….

A “slightly smaller category,” eh? Well, I spy some trends in this direct comparison. Let’s try another way to look at it. How about we express the difference between the RPG and R01-equivalent PI numbers to see how many folks have been supported on non-R01/equivalent Research Project Grants over the years…
[Figure: differential between RPG and R01-equivalent PI numbers by fiscal year]

Well, I’ll be hornswoggled. All this invention of DP-this and RC-that and RL-whatsit and all the various U-mechs and P01s (Center components seem to be excluded) in recent years seemingly has had an effect. Sure, the number of R01-equivalent PIs only drifted down slightly from the end of the doubling until now (relieved briefly by the stimulus). So those in NIH land could say “Look, we’re not sacrificing R01s, our BreadNButter(TM) Mech!” But in the context of the growth of non-R01 RPG projects, well….hmmm.
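For anyone who wants to play along at home, the differential is a single subtraction once the two posted spreadsheets are loaded. A minimal Python/pandas sketch, with hypothetical filenames and column names standing in for whatever the actual NIH files use:

```python
import pandas as pd

# Hypothetical filenames/columns standing in for the two Excel files NIH posted.
rpg = pd.read_excel("nih_rpg_pis.xlsx")       # columns: "Fiscal Year", "PIs"
r01eq = pd.read_excel("nih_r01eq_pis.xlsx")   # same layout

merged = rpg.merge(r01eq, on="Fiscal Year", suffixes=("_rpg", "_r01eq"))
merged["non_r01_pis"] = merged["PIs_rpg"] - merged["PIs_r01eq"]

# PIs supported on non-R01/equivalent Research Project Grants, year by year.
merged.plot(x="Fiscal Year", y="non_r01_pis", legend=False)
```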

While I’m getting all irate about the pathetic non-response to the Ginther report, I have been neglecting to think about the intramural research at NIH.

From Biochemme Belle:

The takeaway message from the report of Ginther and colleagues (2011) on Race, Ethnicity and NIH Research Awards can be summed up by this passage from the end of the article:

Applications from black and Asian investigators were significantly less likely to receive R01 funding compared with whites for grants submitted once or twice. For grants submitted three or more times, we found no significant difference in award probability between blacks and whites; however, Asians remained almost 4 percentage points less likely to receive an R01 award (P < .05). Together, these data indicate that black and Asian investigators are less likely to be awarded an R01 on the first or second attempt, blacks and Hispanics are less likely to resubmit a revised application, and black investigators that do resubmit have to do so more often to receive an award.

Recall that these data reflect applications received for Fiscal Years 2000 to 2006.

Interestingly, we were just discussing the most recent funding data from the NIH with a particular focus on the triaged applications. A comment on the Rock Talk blog of the OER at NIH was key.

I received a table of data covering A0 R01s received between FY 2010 and FY2012 (ARRA funds and solicited applications were excluded). Overall at NIH, 2.3% of new R01s that were “not scored” as A0s were funded as A1s (range at different ICs was 0.0% to 8.4%), and 8.7% of renewals that were unscored as A0s were funded as A1s (range 0.0% to 25.7%).

I noted the following key distinction between new and competing-continuation applications.

The mean and selected ICs I checked tell the same tale, i.e., that Type 2 apps have a much better shot at getting funded after triage on the A0. NIDA is actually pretty extreme from what I can tell- 2.8% versus 15.2%. So if there is a difference in the A1 resubmission rate for Type 1 and Type 2 (and I bet Type 2 apps that get triaged on A0 are much more likely to be amended and resubmitted) apps, the above analysis doesn’t move the relative disadvantage around all that much. However for NIAAA the Type 1 and Type 2 numbers are closer- 4.7% versus 9.8%. So for NIAAA supplicants, a halving of the resubmission rate for Type 1 might bring the odds for Type 1 and Type 2 much closer.
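The reason the resubmission rate matters: the funded-as-A1 rate among triaged A0s is just the resubmission rate multiplied by the success rate of the apps that actually come back. A back-of-the-envelope sketch with the NIAAA figures from above; the resubmission rates themselves are pure assumption, since NIH didn’t provide them:

```python
# P(funded as A1 | triaged A0) = P(resubmit | triaged A0) * P(funded | A1 submitted)
funded_type1 = 0.047  # NIAAA: new (Type 1) apps triaged at A0, later funded as A1
funded_type2 = 0.098  # NIAAA: renewals (Type 2) triaged at A0, later funded as A1

resub_type2 = 0.8              # assumed Type 2 resubmission rate (illustrative)
resub_type1 = resub_type2 / 2  # suppose triaged Type 1 apps resubmit half as often

print(funded_type1 / resub_type1)  # ~0.118: success of Type 1 apps that do resubmit
print(funded_type2 / resub_type2)  # ~0.123: success of Type 2 apps that do resubmit
# Under the halved-resubmission assumption, the NIAAA Type 1 / Type 2 gap nearly closes.
```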

So look. If you were going to try to really screw over some category of investigators you would make sure they were more likely to be triaged and then make it really unlikely that a triaged application could be revised into the fundable range. You could stoke this by giving an extra boost to triaged applications that had already been funded for a prior interval….because your process has already screened your target population to decrease representation in the first place. It’s a feed-forward acceleration.

What else could you do? Oh yes. About those revisions, poorer chances on the first 1-2 attempts and the need for Asian and black PIs to submit more often to get funded. Hey I know, you could prevent everybody from submitting too many revised versions of the grant! That would provide another amplification of the screening procedure.

So yeah. The NIH halved the number of permitted revisions to previously unfunded applications for those submitted after January 25, 2009.

Think we’re ever going to see an extension of the Ginther analysis to applications submitted from FY2007 onward? I mean, we’re seeing evidence in this time of pronounced budgetary grimness that the NIH is slipping on its rather overt efforts to keep early stage investigator success rates similar to those of experienced investigators and to keep women’s success rates similar to men’s.

The odds are good that the plight of African-American and possibly even Asian/Asian-American applicants to the NIH has gotten even worse than it was for Fiscal Years 2000-2006.

NIH Blames the Victim

January 16, 2014

Just look at this text from RFA-RM-13-017:

The overarching goal of the Diversity Program Consortium is to enhance the diversity of well-trained biomedical research scientists who can successfully compete for NIH research funding and/or otherwise contribute to the NIH-funded workforce. The BUILD and NRMN initiatives are not intended to support replication or expansion of existing programs at applicant institutions (for example, simply increasing the number of participants in current NIH-funded research training or mentoring programs would not be responsive to this funding announcement).

The three foregoing major initiatives share one thing in common: make the black PIs better in the future.

The disparity we’ve been talking about? That is clearly all the fault of the current black PIs….they just aren’t up to snuff.

The specifics? Also revealing.


Goals for the NRMN include the following:

  • Working with the Diversity Program Consortium to establish core competencies and hallmarks of success at each stage of biomedical research careers (i.e., undergraduate, graduate, postdoctoral, early career faculty).

  • Developing standards and metrics for effective face-to-face and online mentoring.

  • Connecting students, postdoctoral fellows, and faculty in the biomedical research workforce with experienced mentors, including those with NIH funding, both in person and through online networks.

  • Developing innovative strategies for mentoring and testing efficacy of these approaches.

  • Active outreach is expected to be required to draw mentees into the network who otherwise would have limited access to research mentors.

  • Developing innovative and novel methods to teach effective mentoring skills and providing training to individuals who participate as mentors in the NRMN.

  • Providing professional development activities (grant writing seminars, mock study sections, etc.) and biomedical research career “survival” strategies, and/or facilitating participation in existing development opportunities outside the NRMN.

  • Enhancing mentee access to information and perceptions about biomedical research careers and funding opportunities at the NIH and increasing understanding of the requirements and strategies for success in biomedical careers through mentorship.

  • Creating effective networking opportunities for students, postdoctoral fellows, and early career faculty from diverse backgrounds with the larger biomedical research community.

  • Enhancing ability of mentees to attain NIH funding.

To my eye, only one of these comes even slightly close to recognizing that there are biases in the NIH system that work unfairly against underrepresented PIs.

Jeremy Berg made a comment

If you look at the data in the Ginther report, the biggest difference for African-American applicants is the percentage of “not discussed” applications. For African-Americans, 691/1,149 = 60.0% of the applications were not discussed whereas for Whites, 23,437/58,124 = 40% were not discussed (see supplementary material to the paper). The actual funding curves (funding probability as a function of priority score) are quite similar (Supplementary Figure S1). If applications are not discussed, program has very little ability to make a case for funding, even if this were to be deemed good policy.

that irritated me because it sounds like yet another version of the feigned-helpless response of the NIH on this topic. It also made me take a look at some numbers and bench race my proposal that the NIH should, right away, simply pick up enough applications from African American PIs to equalize success rates. Just as they have so clearly done, historically, for Early Stage Investigators and very likely done for woman PIs.

Here’s the S1 figure from Ginther et al, 2011:
[Figure: Ginther et al. (2011) Supplementary Figure S1, award probability as a function of priority score]

[In the below analysis I am eyeballing the probabilities for illustration’s sake. If I’m off by a point or two, this is immaterial to the overall thrust of the argument.]

My knee-jerk response to Berg’s comment is that there are plenty of African-American PIs’ applications available for pickup. As in, far more than would be required to make up the aggregate success rate discrepancy (which was about 10% in award probability). So talking about the triage rate is a distraction (but see below for more on that).

There is a risk here of falling into Privilege-Thinking, i.e., that we cannot possibly countenance any redress of discrimination that, gasp, puts the previously underrepresented group above the well-represented groups even by the smallest smidge. But looking at Supplementary Figure S1 from Ginther, and keeping in mind that the African-American PI application number is only 2% of the White applications, we can figure out that a substantial effect on African-American PIs’ award probability would cause only an imperceptible change in that for White PI applications. And there’s an amazing sweetener….merit.

Looking at the award probability graph from S1 of Ginther, we note that some 15% of the African-American PIs’ grants scoring in the 175 bin (old scoring method, youngsters) were not funded. About 55-56% of grants from all ethnic/racial categories in the next higher (worse) scoring bin were funded. So if Program picks up more of the better-scoring applications from African-American PIs (175 bin) at the expense of the worse-scoring applications of White PIs (200 bin), we have actually ENHANCED MERIT of the total population of funded grants. Right? Win/win.

So if we were to follow my suggestion, what would be the relative impact? Well, thanks to the 2% ratio of African-American to White PI apps, it works like this:

Take the 175 scoring bin, in which about 88% of White PI and 85% of AA PI apps were successful. Take a round number of 1,000 apps in that scoring bin (for didactic purposes, also ignoring the other ethnicities) and you get a 980/20 White/African-American PI ratio of apps. In that 175 bin we’d need 3 more African-American PI apps funded to get to 100%. In the next higher (worse) scoring bin (200 score), about 56% of White PI apps were funded. Taking three from this bin and awarding three more AA PI awards in the next better scoring bin would plunge the White PI award probability from 56% to 55.7%. Whoa, belt up, cowboy.

Moving down the curve with the same logic, we find that about 9 more AA PI applications would be needed to put the 200 score bin at 100%. Looking down to the next worse scoring bin (225) and pulling these 9 apps from White PIs, we end up changing the award probability for those apps from 22% to…wait for it….21.1%.

And so on.

(And actually, the percentage changes would be smaller in reality because there is typically not a flat distribution across these bins and there are very likely more applications in each worse-scoring bin compared to the next better-scoring bin. I assumed 1,000 in each bin for my example.)
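For anyone who wants to check the arithmetic, here is the bin-swap calculation as a short sketch under exactly those didactic assumptions: 1,000 apps per bin, a 980/20 White/African-American split, and award probabilities eyeballed from Figure S1:

```python
def swap_effect(aa_funded_prob, white_next_bin_prob, n_white=980, n_aa=20):
    """Fund every AA app left in the better bin; pull that many awards from
    White apps in the next (worse) bin. Returns (pickups needed, new White
    award probability in the worse bin)."""
    pickups = round(n_aa * (1 - aa_funded_prob))  # unfunded AA apps in better bin
    new_white_prob = (n_white * white_next_bin_prob - pickups) / n_white
    return pickups, new_white_prob

# 175 bin (85% of AA apps funded) balanced against the 200 bin (56% White):
print(swap_effect(0.85, 0.56))  # (3, ~0.557): 56% drops to ~55.7%
# 200 bin (56% of AA apps funded) balanced against the 225 bin (22% White):
print(swap_effect(0.56, 0.22))  # (9, ~0.211): 22% drops to ~21.1%
```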

Another way to look at this issue is to take Berg’s triage numbers from above. To move the African-American PI applications to a 40% triage rate, we need to shift 20% of them (about 230 applications) into the discussed pile. Keeping the number discussed constant, that represents a whopping 0.4% of the White PI apps being shifted onto the triage pile.
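Again as arithmetic, using the application counts quoted from Berg’s comment:

```python
aa_apps, aa_triaged = 1149, 691  # African-American apps; ~60% not discussed
white_apps = 58124

target_triage_rate = 0.40        # roughly the White triage rate
shift = aa_triaged - round(aa_apps * target_triage_rate)
print(shift)               # ~231 apps must move into the discussed pile
print(shift / white_apps)  # ~0.004: a 0.4% displacement of White apps
```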

These are entirely trivial numbers in terms of the “hit” to the chances of White PIs and yet you could easily equalize the success rate or award probability for African-American PIs.

It is even more astounding that this could be done by picking up African-American PI applications that scored better than the White PI applications that would go unfunded to make up the difference.

Tell me how this is not a no-brainer for the NIH?

As you know I am distinctly unimpressed with the NIH’s response to the Ginther report which identified a disparity in the success rate of African-American PIs when submitting grant applications to the NIH.

The NIH response (i.e., where they have placed their hard money investment in change) has been to blame pipeline issues. The efforts are directed at getting more African-American trainees into the pipeline and, somehow, training them better. The subtext here is twofold.

First, it argues that the problem is that the existing African-American PIs submitting to the NIH just kinda suck. They are deserving of lower success rates! Clearly. Otherwise, the NIH would not be looking in the direction of getting new ones. Right? Right.

Second, it argues that there is no actual bias in the review of applications. Nothing to see here. No reason to ask about review bias or anything. No reason to ask whether the system needs to be revamped, right now, to lead to better outcome.

A journalist has been poking around a bit. The most interesting bits involve Collins’ and Tabak’s initial response to Ginther and the current feigned-helplessness tack that is being followed.

From Paul Basken in the Chronicle of Higher Education:

Regarding the possibility of bias in its own handling of grant applications, the NIH has taken some initial steps, including giving its top leaders bias-awareness training. But a project promised by the NIH’s director, Francis S. Collins, to directly test for bias in the agency’s grant-evaluation systems has stalled, with officials stymied by the legal and scientific challenges of crafting such an experiment.

“The design of the studies has proven to be difficult,” said Richard K. Nakamura, director of the Center for Scientific Review, the NIH division that handles incoming grant applications.

Hmmm. “difficult”, eh? Unlike making scientific advances, hey, that stuff is easy. This, however, just stumps us.

Dr. Collins, in his immediate response to the Ginther study, promised to conduct pilot experiments in which NIH grant-review panels were given identical applications, one using existing protocols and another in which any possible clue to the applicant’s race—such as name or academic institution—had been removed.

“The well-described and insidious possibility of unconscious bias must be assessed,” Dr. Collins and his deputy, Lawrence A. Tabak, wrote at the time.

Oh yes, I remember this editorial distinctly. It seemed very well-intentioned. Good optics. Did we forget that the head of the NIH is a political appointment with all that that entails? I didn’t.

The NIH, however, is still working on the problem, Mr. Nakamura said. It hopes to soon begin taking applications from researchers willing to carry out such a study of possible biases in NIH grant approvals, and the NIH also recently gave Molly Carnes, a professor of medicine, psychiatry, and industrial and systems engineering at the University of Wisconsin at Madison, a grant to conduct her own investigation of the matter, Mr. Nakamura said.

The legal challenges include a requirement that applicants get a full airing of their submission, he said. The scientific challenges include figuring out ways to get an unvarnished assessment from a review panel whose members traditionally expect to know anyone qualified in the field, he said.

What a freaking joke. Applicants have to get a full airing and will have to opt in, eh? Funny, I don’t recall ever being asked to opt in to any of the non-traditional review mechanisms that the CSR uses. These include phone-only reviews, video-conference reviews and online chat-room reviews. Heck, they don’t even so much as disclose that this is what happened to your application! So the idea that it is a “legal” hurdle that is solved by applicants volunteering for their little test is clearly bogus.

Second, the notion that a pilot study would prevent “full airing” is nonsense. I see very few alternatives other than taking the same pool of applications and putting them through regular review as the control condition and then trying to do a bias-decreasing review as the experimental condition. The NIH is perfectly free to use the normal, control review as the official review. See? No difference in the “full airing”.

I totally agree it will be scientifically difficult to set up PI-blind review, but hey, since we already have so many geniuses calling for blinded review anyway…this is well worth the effort.

But “blind” review is not the only way to go here. How’s about simply mixing up the review panels a bit? Bring in a panel that is heavy in precisely those individuals who have struggled with lower success rates- based on PI characteristics, University characteristics, training characteristics, etc. See if that changes anything. Take a “normal” panel and provide them with extensive instruction on the Ginther data. Etc. Use your imagination people, this is not hard.

Disappointingly, the CHE piece contains not one single bit of investigation into the real question of interest. Why is this any different from any other area of perceived disparity between interests and study section outcome at the NIH? From topic domain to PI characteristics (sex and relative age) to University characteristics (like aggregate NIH funding, geography, Congressional district, University type/rank, etc.), the NIH is fully willing to use Program prerogative to redress the imbalance. They do so by funding grants out of order and, sometimes, by setting up funding mechanisms that limit who can compete for the grants.

[Figure: 2013 funding by career stage]

In the recent case of young/recently transitioned investigators they have trumpeted the disparity loudly, and hamfistedly and brazenly “corrected” the study section disparity with special paylines and out-of-order pickups that amount to an affirmative action quota system [PDF].
All with exceptionally poor descriptions of exactly why they need to do so, save “we’re eating our seed corn” and similar platitudes. All without any attempt to address the root problem of why study sections return poorer scores for early stage investigators. All without proving bias, describing the nature of the bias or clearly demonstrating the feared outcome of any such bias.

“Eating our seed corn” is a nice catchphrase but it is essentially meaningless. Especially when there are always more freshly trained PhD scientists eager and ready to step up. Why would we care if a generation is “lost” to science? The existing greybeards can always be replaced by whatever fresh faces are immediately available, after all. And there was very little crying about the “lost” Generation X scientists, remember. Actually, none, outside of Generation X itself.

The point being, the NIH did not wait for overwhelming proof of nefarious bias. They just acted very directly to put a quota system in place. Although, as we’ve seen in recent data, this has slipped a bit in the past two Fiscal Years, the point remains.

Why, you might ask yourself, are they not doing the same in response to Ginther?

Our longtime blog commenter dsks is always insightful. This time, the proposal is such a doozy that it is worth dragging up as a new post.

… just make it official and block all triaged applications from subsequent resubmission. Maybe then use the extra reviewer time and money to bring back the A2, perhaps restricting it to A1 proposals that come in under ~30%ile or something.

Hell, I think any proposal that consistently scores better than 20%ile should be allowed to be resubmitted ad infinitum until it gets funded. Having to completely restructure a proposal because it couldn’t quite make the last yard over what is accepted to be a rather arbitrary pay-line is insane.

At first blush that first one sounds pretty good. Not so sure about the endless queuing of an above-payline, below-20%ile grant, personally. (I mean, isn’t this where Program steps in and just picks it up already?)

This reminds me of something, though. Unlike in times past, the applicant now has some information on just how strong the rejection really was because of the criterion scores. This gives some specific quantification in contrast to only being able to parse the language of the review.

One would hope that there would be some correlation between the criterion scores and the choice of the PI to resubmit. As in, if you get 4s and 5s on Approach or Significance, maybe it is worth it. 7s and 8s mean you’d really better not bother.
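Reduced to a crude rule of thumb, it might look like the sketch below. The cutoff is purely illustrative, not anything official; recall that criterion scores run from 1 (best) to 9 (worst):

```python
def worth_resubmitting(criterion_scores):
    """Illustrative only: resubmit if no criterion scored worse than 5."""
    return max(criterion_scores.values()) <= 5

print(worth_resubmitting({"Significance": 4, "Approach": 5}))  # True: maybe worth it
print(worth_resubmitting({"Significance": 7, "Approach": 8}))  # False: better not bother
```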