Lake Wobegon effect in NIH grant review

March 8, 2011

All the Investigators are strong….and the Environments are above-average.
The “Investigator” and “Environment” criteria have been an explicit part of NIH grant review since forever, and have been given approximately equal weight with Approach, Significance and Innovation.
The blurbs in the official NIH notice on the current scheme read:

Investigator(s). Are the PD/PIs, collaborators, and other researchers well suited to the project? If Early Stage Investigators or New Investigators, do they have appropriate experience and training? If established, have they demonstrated an ongoing record of accomplishments that have advanced their field(s)? If the project is collaborative or multi-PD/PI, do the investigators have complementary and integrated expertise; are their leadership approach, governance and organizational structure appropriate for the project?

Environment. Will the scientific environment in which the work will be done contribute to the probability of success? Are the institutional support, equipment and other physical resources available to the investigators adequate for the project proposed? Will the project benefit from unique features of the scientific environment, subject populations, or collaborative arrangements?

I always had the distinct impression these were essentially throwaway criteria because they were almost always rated very highly. Sometimes the “Investigator” criterion would be a place to ding a more-junior PI’s career status or lack of productivity, but for the most part it was treated very politely.
Sally Rockey has recently posted the verification of this impression on the OER blog.

[Figure: distribution of criterion scores for FY2010 applications, from the OER blog (source)]

the data presented represents 54,727 research grant applications submitted for funding in fiscal year 2010. Of these, 32,546 applications were discussed and received final overall impact scores.

So for those grants that survive triage and are discussed, almost all of them are given a 3 or better for Investigator and all are given a 3 or better for Environment. Not a lot of range there. Remember the study section is supposed to be clearly telling the Program Staff why the proposal is strong or weak on these different criteria. Telling them “yeah, good” for these two allegedly major criteria doesn’t seem to be that helpful.
Do note, I’m not one who thinks this is incorrect. For the most part, Investigators are well qualified. And the Environments are generally supportive of the work. If there is a serious problem with these…well, it is not atypical that the rest of the application has even more serious problems. So perhaps all the moderately questionable PIs and Environments are linked to really, really bad proposals and are thus triaged? (Hmm, I’d like to see these scores for the triaged grants; the alternative hypothesis is that even for the lesser proposals the Investigator and Environment scores still show little variation.)
The analysis of correlations between criterion scores and the voted overall impact scores is the main point of the post, however. In this case the analysis conducted for all research grants reviewed for FY2010 funding echoes the results of the prior analysis of criterion and overall impact scores published by Director Berg for his NIGMS applications in FY2010. Approach is still king.
Small differences emerge, however. The Rockey dataset shows that the correlations of the Innovation and Significance criterion scores with the overall impact score are 0.62 and 0.69, respectively. In the Berg/NIGMS dataset they were 0.59 and 0.70. Same relationship, but converging.
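For the curious, the correlation being reported here is (presumably) a garden-variety Pearson r between each criterion score and the voted overall impact score, computed across the discussed applications. A minimal sketch of that calculation in Python, with made-up file and column names since the per-application scores are not distributed in this form:

    import pandas as pd

    # Hypothetical file and column names -- placeholders, not the actual
    # format of the Rockey or Berg data.
    scores = pd.read_csv("criterion_scores_fy2010.csv")

    criteria = ["significance", "investigator", "innovation", "approach", "environment"]
    for criterion in criteria:
        # Series.corr() defaults to the Pearson correlation coefficient
        r = scores[criterion].corr(scores["overall_impact"])
        print(f"{criterion:>12} vs overall impact: r = {r:.2f}")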
This is why I’m interested in both differences between institutes and changes over time. Also, between different grant award mechanisms. The NIGMS post makes it clear they are analyzing scores for 654 R01 applications that were discussed and received a voted score. Rockey’s post just says 32,546 discussed “research grant applications” so this could be a big mixture of mechanisms.
But recall that the big push in the years leading up to these review outcomes was for Innovation. Reviewers are supposed to be prioritizing this criterion above “Approach”. Prioritizing “Significance” as well. So the really interesting question is whether this relationship budges over the next several FYs of grant reviewing. Given all the bleating from the direction of NIH/CSR about how we “should” be prioritizing grant proposals, it would be nothing less than a major failure if the “Approach” criterion does not quickly fall behind “Innovation” and, especially, “Significance” in terms of correlation with the overall impact score. Right?

39 Responses to “Lake Wobegon effect in NIH grant review”

  1. whimple Says:

    I don’t think reviewers really pay any attention to the criterion sub-scores, and all this analysis is a waste of time. The sub-scores are just ex post facto justification of the overall impression. For example, lots of people get their grants effectively DQ’d on the basis of “lack of investigator productivity”, which pretty much goes unreflected in the “investigator” criterion. Maybe it would just be adding insult to injury to hand out a lukewarm priority score AND whomp someone up with a 5+ for “investigator”, but everyone’s getting 2s and 3s for this sub-category regardless.

  2. drdrA Says:

    Yeah- I saw that analysis w/ investigator and environment and I kind of went- meh, tell me something I don’t know.
    Love your title though. 🙂

  3. miko Says:

    yeah, whimple’s dead on… it’s all crap. A PCA of the whole dataset would show there is only one factor for any application/reviewer: did I like it?

  4. DrugMonkey Says:

    The OER analysis pulled out two factors, miko.

  5. mygraduationday Says:

    Overall impact is the only thing that matters, and CSR says that it is not the arithmetic mean of the other criteria, so why not give “feel good” scores on investigator and environment? Innovation is in the eye of the individual reviewer, and I have seen this vary from 2-8 during review. Some reviewers rely on technical innovation, others conceptual. Hard to say where innovation is heading; a real crapshoot now.

  7. whomPhD Says:

    Not in my study section. I was on the receiving end of two fives and a six with the explanation that after four years of having my lab, I was not productive enough (only 2 published / 1 in press papers). Institutional scores were 2-3. The grant was, of course, triaged and I’m on my way out. Peace.

  8. SciGuy Says:

    This whole new scoring system is a joke. Study section committees do whatever they as a group want to do regardless of the recommendations of blue ribbon panels. Increasingly, you may get money whether your science is steeped in turn-of-the-century (19th) measurements of blood pressure or you have published only in lightly peer-reviewed, low-impact journals. Some committees actively ignore science in order to fund members of the cabal.
    A key feature of the new system over the old is the single reviewer veto. Since grants are ranked on initial scores – scores generated in the dark of the night for personal reasons from the privacy of the reviewer’s home – many proposals never see the light of day. They are not discussed before the committee face to face. Differences of opinion are not weighed. The number of disparate scores – 1’s and 2’s from two of three reviewers against the lone dissenter with 5’s and 6’s – is on the rise, and they most often do not get resolved on the basis of science. The dissenting score is averaged with the positive reviews, and the average can put a proposal in the dreaded second day before the committee, when the momentum of the committee rolls over the application and the veto is complete. Many are not even discussed, ignoring the positive evaluations in favor of the lone dissenter.
    In the old deliberative system, the assassin would have to defend the flaws in person, face to face with other reviewers. In the old system, unfounded dissent would wither before the committee and miscreants would have little credibility. This was the scientific sunshine that is lacking in the current protocol that authorities like to claim is “efficient”: cursory reviews of lots of short applications in a short period of time, or, better yet, web-based or email-based. We are going to a system that is cheap and ineffective. It will take some time for the trends to develop, but I predict that we will see a concentration of funding at a more limited number of labs. Labs that use the one trendy approach on multiple systems will clean up on the money game and crank out thoughtless results in pay-per-view publications. NIH has shot itself in the head with the reinvention of peer review, taking the peer out of the review and casting a shadow where the light should be brightest.

  9. DrugMonkey Says:

    Many are not even discussed, ignoring the positive evaluations in favor of the lone dissenter. In the old deliberative system, the assassin would have to defend the flaws in person, face to face with other reviewers. In the old system, unfounded dissent would wither before the committee and miscreants would have little credibility.
    Any reviewer can bring any application up for discussion. The lone assassin theory only holds water if somehow those two favorable reviewers are wilting violets and fail to ask to discuss the app. I have rarely seen that happen. More likely the two adoring reviewers have been convinced by the “assassin” that their initial read was a little too optimistic.

  10. SciGuy Says:

    “Any reviewer can bring any application up for discussion.”
    Most study sections are woefully thin in expertise today. Although the “any reviewer” clause can be invoked, in practice it is little used in the new regime. The common scenario is that it is day 2 when the arithmetic dictates that these split votes get attention, and to “reconsider” a late-breaking mistake or bias would upset the order of the previous day’s funding established on the first pass through. This is momentum. People are checking their airline tickets and have little energy for either the attention or the engagement. Even factual mistakes in the new electronic – especially asynchronous – formats are difficult to correct. That happened twice last year in my personal experience as an NIH reviewer. Despite my attempts to get it resolved, in the fuzzy electronic back and forth, it was not clear in either case that those data on brain tissues misread as kidney samples ever got corrected in the minds of the voting reviewers. This rush to judgment has been engineered into the new NIH peer review format (best-to-worst reading, short applications, vague bullets, imprecise critiques and fewer face-to-face meetings). As everyone admits, NIH peer review was effective at 25% paylines. Now, the prize for the best review process is speed. Re-thinking to protect the good idea against the lone outlier – assassin or incompetent (I have witnessed both) – just does not come about as often as it should. At quality journals, peer review takes the time it needs. NIH review is now engineered to rush the judgment.


  11. Most study sections are woefully thin in expertise today.

    That’s a pretty bold assertion! Any evidence?

  12. Cashmoney Says:

    That’s a pretty bold assertion! Any evidence?
    I’m guessing the evidence is that SciGuy’s grant scored outside of the fundable range…

  13. DrugMonkey Says:

    Speaking of criterion scores, a tiny vignette

  14. Pinko Punko Says:

    CPP, don’t be a whatever you would call everyone else. There’s a ton of expertise on study sections, but the difference between “wev” and “yay” could be whether someone in one’s own field is there to explain the significance of the work in case the section has evolved away from topics it traditionally has trafficked in. So, some study sections might be thin on appropriate expertise.
    Also, Cashmoney, it is of course incredibly easy for you to write off a comment concerning negative aspects of grant review as sour grapes. It is so predictable that you can leave it home next time!

  15. DK Says:

    That’s a pretty bold assertion! Any evidence?
    Does it matter whether it is the potential or the execution? Fact is, most reviews are lousy, bordering on incompetent. In large part it is simply because reviewers never bother to read the thing thoroughly and think it through (who has time for this?). Evidence? Only anecdotal. 90% of grant reviews that I’ve seen, regardless of the score, are like this.
    Exactly the same goes for manuscripts in journals, BTW.

  16. Pinko Punko Says:

    Actually, I do wonder if much of the correlation with approach comes from study section conservatism being subconsciously folded into that score. For example, an innovative proposal that doesn’t have enough prelim data wouldn’t get dinged on innovation, but would on approach, even though the approach may very well be sound; the more conservative the reviewers get, the more they are going to focus all of the dings onto approach. Also, what reviewer is going to give an amazing Approach score and shit on the investigator or environment? They may or may not reflect their actual thoughts in investigator or environment, but they’ll move those over to approach if they don’t feel convinced.

  17. qaz Says:

    Do we really want to get into the fight of whether BigNameSchool is a better environment for science than BigStateResearchU or vice versa? I’ve always taken environment to mean “is it a good place for this research to be done?” People seem to use the environment score as a way of flagging problems with the environment; otherwise, they just give it a 1. Most places *are* good places to do science.
    Approach is where people address the science of the grant, so obviously it’s where the range is.


  18. There’s a ton of expertise on study sections, but the difference between “wev” and “yay” could be whether someone in one’s own field is there to explain the significance of the work in case the section has evolved away from topics it traditionally has trafficked in.

    If you are not aware of the trends of expertise on the various study sections that could conceivably be a relevant home for your grant, are not involved in ongoing discussions with SROs about this issue and providing input concerning appropriate areas for necessary bolstering with ad hoc members, and are not writing your grants in a very targeted fashion to deal with the reality on the ground in the study section you have decided to have your grant assigned to, then you are fuckeing uppe, bigge tyme.

  19. DrugMonkey Says:

    So you are saying you write down for the rubes, PP?


  20. Write what down for which rubes?

  21. DrugMonkey Says:

    spot on, qaz. I think it is just vanishingly rare for PIs to propose some research that their environment can’t support.
    In R01s anyway. I wonder about the Big Mechs. Think it is the same? Smaller institutions don’t bother?

  22. Pinko Punko Says:

    No way do I write down for the rubes, but I think some reviewers clearly do, DM. But that is merely anecdata.
    CPP- you overestimate the magical powers of the SRO to discern appropriate venue for your grant. Of course they are helpful, and you do what you can, but sometimes you get “fucckkkkeddde”


  23. you overestimate the magical powers of the SRO to discern appropriate venue for your grant.

    You are misreading. I am not saying any SRO is going to be able to discern the best venue for your grant. *You* need to discern that, but discussions with the SROs of various study sections are a key input to your discernment process.

  24. Cashmoney Says:

    Pinko Punko are you denying that most people raving about incompetent review 1) do not have a reasonably recent term of service on a study section and 2) have not been able to compete successfully for funding?


  25. I have received dozens of summary statements of my own grants and served numerous times on study sections, and even the conclusions and scores I have disagreed with vehemently, I have never attributed to reviewer *incompetence*. I have attributed them to reviewer shortsightedness and to reviewers weighing matters of scientific judgment differently than I would and to reviewers having goals for review outcome that differ from my own, but I have never experienced what I would consider to be true *incompetence*.
    In this context, I define incompetence as an inability to comprehend the approach, significance, innovation, investigator, and environment of an application that is written clearly and understandably for an audience defined as the study section it has been assigned to. And this is where I think most disgruntled applicants are delusional: they attribute the failures of poorly written and improperly targeted grants to reviewer incompetence.

  26. Pinko Punko Says:

    I have seen poor reviews. Nobody can dispute that reviews can be poor. These would be categorized as shortsighted, possibly not inappropriately, but I don’t necessarily agree that this means the grant was poorly written (many grants are poorly written; nobody would deny this either). For example, cases where the reviewer says things that are directly contradicted by the proposal in an extreme enough way that the evidence is on the side of the reviewer not reading the grant carefully enough. Many times this can be because of the grant. Many times this is on the reviewer. (I know, argument from assertion of “many times”.)
    For example, a grant that has gone through a large number of helpful eyes, including colleagues that have served on the exact study section in years past, and colleagues not in the immediate field, providing coverage for grantsmanship and level of sophistication for the proposal, but receives one out of three reviews that says:
    Weakness: “I don’t think this technique will work”
    Strength: “Working with collaborator x”
    Grant: “We are working with collaborator x (see letter of support) who has successfully developed technique y to accomplish method z on genome-wide scale.”
    In manuscript reviews, many poor reviews are hidden because they are very short positive reviews. Many negative reviews can be poor due to lack of scientific skill of the reviewer or a misunderstanding of what was written. Many negative reviews can be highly skillful and spot on. Evidence from manuscript reviews indicates that there is a massively wide level of reviewer skill in the pool. NIH panels are a lot narrower on this range and generally quite perceptive. When the funding level drops, defects in the process become magnified because they are more important.
    I merely claim that there are some poor reviews that are independent of how the proposal was written. These may relate to scientific disagreements between the reviewer and the grantwriter that cannot be overcome by how well the grant was written (the “I don’t care what you say, I don’t believe it” argument), or they might be due to the reviewer refusing to accept the importance of the work, especially if it is outside his/her field. These types of reviews/disagreements are more philosophical in nature and they may end up couched in dings that are not necessarily relevant to how the grant was written. I just think this isn’t a controversial statement, and I think the points I am trying to make are constructive to the discussion, unlike “hey whiners, I bet your funding level sucks; side note: why won’t anyone nominate me for the comment originality club?”

  27. Neuro-conservative Says:

    In my experience, nearly all reviewers are scientifically competent and well-qualified, even highly so (although I do have one truly empty-suit colleague whom I can only imagine has left a trail of destruction through his/her years of service).
    However, this does not mean that all reviews are competently executed. All too often, I have seen reviewers say: “Application fails to consider X” when there is a clearly labelled subsection on X in the application (yes, this just happened to me, but I have seen it many times on both sides of the coin).

  28. whimple Says:

    It’s not really the job of the study section panel to provide you with competently worded feedback; rather, it is the job of the reviewer to provide an informed priority score opinion to the panel. Particularly if the score isn’t that great, you can understand the reviewers’ lack of enthusiasm for going to a lot of effort in the written review.

    Weakness: “I don’t think this technique will work”
    Strength: “Working with collaborator x”
    Grant: “We are working with collaborator x (see letter of support) who has successfully developed technique y to accomplish method z on genome-wide scale.”

    Interpretation: I don’t consider this technique well-established enough, and certainly not in your hands, to provide meaningful data so I have dinged your application with respect to “approach” accordingly. I think you’d have no chance of getting this to work on your own, so it’s good that you are working with Collaborator X, although even with this collaboration, I don’t think you can do it. This is going to have to be one of those, “show me it is working in your hands” situations for me to consider it as a viable approach in your application.

  29. DrugMonkey Says:

    What whimple said.
    Also, “you might be able to get this working but it is going to take 9 months on this alone. It isn’t the slam dunk you make out and therefore your experimental plans cannot possibly be accomplished in any reasonable timescale consistent with the overall proposal. Oh, and good luck getting your letter writer to actually help to the degree you are actually going to need it”

  30. Pinko Punko Says:

    Perhaps.
    But it really isn’t that sort of technique. I checked on the more accurate wording and it was “even with collaborator’s help, I don’t think this will work/is possible”
    When the wording was “the collaborator has this working and it is possible”
    So there is an interpretable disconnect. Of course we can always interpret these things with the standard boilerplate takes, such as “this proposal has not yet been completed, and this is the only evidence I would accept” which runs into “this proposal IS completed, so why should it be funded.”

  31. whimple Says:

    Good point! Be sure to point out the “interpretable disconnect” in your resubmission. I’m sure they’ll appreciate that and give it a fundable score next time. *giggle*

  32. Pinko Punko Says:

    It’s fundable now. Why are you being such a “troll” as the kids say on the internet?
    As a reviewer of papers I see other reviews and sometimes they are clearly and demonstrably wrong, and I wouldn’t fault the manuscript. Why is it simply not possible for a grant review to be less than optimal?
    Many people read DM because they learn a ton about the process, and they commiserate about some things. We’re all in the same boat, except there is also another boat: a boat called the “enjoys the pain of strangers”, with Captain Whimple at the helm, lately promoted to insufferable pr……[this comment has been triaged]

  33. whimple Says:

    I thought from what you wrote that you genuinely didn’t understand how such superficially contradictory comments in review of your application could be perfectly consistent with competent review. After this was explained to you, you seemingly went into the common denial exhibited by a large fraction of the 80% to 90% of applicants that don’t get a fundable score: the reviewers didn’t get it, or didn’t read it, or were logically inconsistent in their interpretation. Instead you might as well consider it your fault, because complaining about the perceived quality of the review is certain to be unproductive.

  34. CD0 Says:

    One thing that I have given up trying to understand is the obsession of some scientists with making their personal opinions prevail against the judgment of the study section. You have nothing to gain by refusing even to consider that they may have a point.
    Everybody can write a bad review and, theoretically, even 3 bad reviews at the same time. But if they tell you that a new technique is not established and may work only in the hands of a recognized expert in the field, that sounds like a very reasonable comment. You either show that you have made it work in your lab (you must have done it before, actually), or you change the approach.
    The system may be fair or not but, if you ever want to get funded, it is easier to adapt your proposal to the culture of the study section than to expect the study section to adapt to your personal culture.


  35. This is a good time to repeat the mantra that this kind of unpredictability of review is exactly why you need to have multiple grant applications in the system all the time, targeting different study sections for review. Seeing young investigators moaning and groaning on the Internet about the fate of “my R01 application” is heartbreaking, because it is so foolish. And anyone who complains that they “can’t write multiple R01s” because they lack the resources, ideas, preliminary data, whatever, is simply not thinking creatively. The same exact preliminary data can form the basis for an infinite number of different grant applications.
    If your career hinges on the review of a single grant application, you are doing it wrong. Even the most successful grant writers submit multiple competing applications (including resubmissions) for every one that gets funded. Over the years, I have averaged two competing NIH submissions (including resubmissions) per year, with only about 1/3 getting funded.
    Grantsmanship is a stochastic process. Deny this fact at your peril.

  36. Pinko Punko Says:

    Whimple, you are arguing with the straw Pinko Punko and not the actual one. Yes, your experience with commenters allows you to generate a profile or type with which to address common themes. Those are not relevant here.
    CPP, of course you are correct with your multiple grants in play at all times. Nothing I have said contradicts this wise statement.
    Additionally, it is easily conceivable to recognize that reviews can be poor and yet work creatively to sway or bring the panel over to your side, winning a majority or all of the reviewers even in the face of knee-jerk presumptions or generic, poorly considered criticism from a minority of reviewers. I really don’t think any of this is controversial or deserving of abuse.
    I’m on your side, dudes. When I look at colleagues’ grants, I ask myself what CPP would say, then I tell them what DM would say, because they likely won’t respond to being called fuckinge shitteass dumfukkes.


  37. I think the overall point here is that handwringing about the “quality” or “competence” of NIH peer review doesn’t help anyone get funded.

  38. Eli Rabett Says:

    OK, so Eli reviews NASA and NSF grants, and not NIH, but the bullshit about multiple grants in the system, which is what we have today especially among the soft-money folk, is killing the science. Even with computers (and yes, Eli is that old), putting together multiples in a way that they address different programs (study sections) and don’t allow the program manager to say “my program doesn’t have to do this, it belongs over there” chews up huge amounts of time and effort.
    God do we need birth control

  39. DrugMonkey Says:

    Eli,
    During the early to mid 80s, the success rate for experienced applicants was around 40%. Seriously.
    In my subfields, true, a lot of progress was made. However, it was made by a limited number of laboratories…even more sharply limited if you examine training pedigrees. And in retrospect there was one heck of a lot of streetlight problem going on. *Diversity* of approach and topic was stifled. To make it relatively specific, we learned a lot about the reinforcing properties of cocaine and heroin. What got choked out? From the standpoint of the diversity that exists post-doubling, a LOT.
    A highly competitive environment does not exactly disfavor the same-old, same-old. But at least there is a chance that panels (and POs) will be thinking “hey, we can only afford ONE of these, not five on the exact same topic”. And that will be good for the progress of science…

