Lake Wobegon effect in NIH grant review
March 8, 2011
All the Investigators are strong….and the Environments are above-average.
The “Investigator” and “Environment” criteria have been an explicit part of NIH grant review since forever, and have been given approximately equal weight with Approach, Significance and Innovation.
The blurbs in the official NIH notice on the current scheme read:
Investigator(s). Are the PD/PIs, collaborators, and other researchers well suited to the project? If Early Stage Investigators or New Investigators, do they have appropriate experience and training? If established, have they demonstrated an ongoing record of accomplishments that have advanced their field(s)? If the project is collaborative or multi-PD/PI, do the investigators have complementary and integrated expertise; are their leadership approach, governance and organizational structure appropriate for the project?
…
Environment. Will the scientific environment in which the work will be done contribute to the probability of success? Are the institutional support, equipment and other physical resources available to the investigators adequate for the project proposed? Will the project benefit from unique features of the scientific environment, subject populations, or collaborative arrangements?
I always had the distinct impression these were essentially throwaway criteria because they were almost always rated very highly. Sometimes the “Investigator” criterion would be a place to cap on the more-junior career status or lack of productivity but for the most part it was treated very politely.
Sally Rockey has recently posted the verification of this impression on the OER blog.
the data presented represents 54,727 research grant applications submitted for funding in fiscal year 2010. Of these, 32,546 applications were discussed and received final overall impact scores.
So for those grants that survive triage and are discussed, almost all of them are given a 3 or better for Investigator and all are given a 3 or better for Environment. Not a lot of range there. Remember the study section is supposed to be clearly telling the Program Staff why the proposal is strong or weak on these different criteria. Telling them “yeah, good” for these two allegedly major criteria doesn’t seem to be that helpful.
Do note, I’m not one that thinks this is incorrect. For the most part, Investigators are well qualified. And the Environments are generally supportive of the work. If there is a serious problem with these…well, it is not atypical that the rest of the application has even more serious problems. So perhaps all the moderately questionable PIs and Environments are linked to really, really bad proposals and are thus triaged? (Hmm, I’d like to see these for the triaged grants, the alternate hypothesis is that even for the lesser proposals the Investigator and Environment scores still see little variation?)
The analysis of correlations between criterion scores and the voted overall-impact scores is the main point of the post, however. In this case the analysis conducted for all research grants reviewed for FY2010 funding echoes the results of the prior analysis of criterion and overall impact scores published by Director Berg for his NIGMS applications in FY2010. Approach is still king.
Small differences emerge, however. The Rockey dataset shows that the correlations of the Innovation and Significance criterion scores with overall impact scores are 0.62 and 0.69, respectively. In the Berg/NIGMS dataset they were 0.59 and 0.70. Same relationship, but converging.
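For the curious, the correlations being quoted here are just Pearson r between each criterion's score and the voted overall impact score across the pool of discussed applications. A minimal sketch of that computation, using invented scores for illustration (these are NOT NIH data; the numbers and variable names are hypothetical):

```python
# Sketch of a criterion-vs-impact correlation like those in the
# Rockey and Berg posts. All scores below are invented examples.
import statistics

def pearson(x, y):
    """Pearson correlation coefficient of two equal-length score lists."""
    mx, my = statistics.fmean(x), statistics.fmean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sum((a - mx) ** 2 for a in x) ** 0.5
    sy = sum((b - my) ** 2 for b in y) ** 0.5
    return cov / (sx * sy)

# Hypothetical criterion scores (1 = best, 9 = worst) for six applications,
# paired with voted overall impact scores on the 10-90 scale.
approach = [2, 3, 5, 4, 2, 6]
impact   = [20, 30, 50, 42, 25, 55]

r = pearson(approach, impact)
# r is positive because a better (lower) criterion score tends to go
# with a better (lower) overall impact score.
```

Note the sign convention: since 1 is best on both scales, a strong positive correlation means the criterion tracks the overall outcome, which is exactly what the ~0.8 figure for Approach is showing.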
This is why I’m interested in both differences between institutes and changes over time. Also, between different grant award mechanisms. The NIGMS post makes it clear they are analyzing scores for 654 R01 applications that were discussed and received a voted score. Rockey’s post just says 32,546 discussed “research grant applications” so this could be a big mixture of mechanisms.
But recall that the big push in the years leading up to these review outcomes was for Innovation. Reviewers are supposed to be prioritizing this criterion above “Approach”. Prioritizing “Significance” as well. So the really interesting question is whether this relationship budges in the next several FYs of grant reviewing. From all the bleating from the direction of NIH/CSR about how we “should” be prioritizing grant proposals, it would be nothing less than a major failure if the “Approach” criterion does not quickly fall behind “Innovation” and, especially, “Significance” in terms of correlating with overall impact score. Right?
March 8, 2011 at 6:26 pm
I don’t think reviewers really pay any attention to the criteria sub-scores and all this analysis is a waste of time. The sub-scores are just ex post facto justification of the overall impression. For example, lots of people get their grants effectively DQ’d on the basis of “lack of investigator productivity,” which pretty much goes unreflected in the “investigator” criterion. Maybe it would just be adding insult to injury to hand out a lukewarm priority score AND whomp someone up with a 5+ for “investigator”, but everyone’s getting 2s and 3s for this sub-category regardless.
March 8, 2011 at 7:16 pm
Yeah- I saw that analysis w/ investigator and environment and I kind of went- meh, tell me something I don’t know.
Love your title though. 🙂
March 8, 2011 at 10:28 pm
yeah, whimple’s dead on… it’s all crap. A PCA of the whole dataset would show there is only one factor for any application/reviewer: did I like it?
March 9, 2011 at 8:29 am
The OER analysis pulled out two factors, miko.
March 9, 2011 at 8:35 am
overall impact is the only thing that matters and CSR says that it is not the arithmetic mean of the other criteria, so why not give “feel good” scores on investigator and environment? Innovation is in the eye of the individual reviewer, and I have seen this vary from 2-8 during review. Some reviewers rely on technical innovation, others on conceptual. Hard to say where innovation is heading, a real crapshoot now.
March 9, 2011 at 10:09 am
Not in my study section. I was on the receiving end of two fives and a six with the explanation that after four years of having my lab, I was not productive enough (only 2 published / 1 in press papers). Institutional scores were 2-3. The grant was, of course, triaged and I’m on my way out. Peace.
March 9, 2011 at 12:50 pm
This whole new scoring system is a joke. Study section committees do whatever they as a group want to do regardless of the recommendations of blue ribbon panels. Increasingly, you may get money whether your science is steeped in turn of the century (19th) measurements of blood pressure or you have published only in lightly peer-reviewed, low-impact journals. Some committees actively ignore science in order to fund members of the cabal.
A key feature of the new system over the old is the single reviewer veto. Since grants are ranked on initial scores – scores generated in the dark of the night for personal reasons from the privacy of the reviewer’s home – many proposals never see the light of day. They are not discussed before the committee face to face. Differences of opinion are not weighed. The number of disparate scores – 1’s and 2’s from two of three reviewers against the lone dissenter with 5’s and 6’s – is on the rise and they most often do not get resolved on the basis of science. The dissenting score is averaged with the positive reviews, and the average can put a proposal in the dreaded second day before the committee, when the momentum of the committee rolls over the application and the veto is complete. Many are not even discussed, ignoring the positive evaluations in favor of the lone dissenter.
In the old deliberative system, the assassin would have to defend the flaws in person, face to face with other reviewers. In the old system, unfounded dissent would wither before the committee and miscreants would have little credibility. This was the scientific sunshine that is lacking in the current protocol that authorities like to claim is “efficient”: cursory reviews of lots of short applications in a short period of time, or better yet web-based or email-based. We are going to a system that is cheap and ineffective. It will take some time for the trends to develop, but I predict that we will see a concentration of funding at a more limited number of labs. Labs that use the one trendy approach on multiple systems will clean up on the money game and crank out thoughtless results in pay-per-view publications. NIH has shot itself in the head with the reinvention of peer review, taking the peer out of the review, and has cast a shadow where the light should be brightest.
March 9, 2011 at 1:41 pm
Many are not even discussed, ignoring the positive evaluations in favor of the lone dissenter. In the old deliberative system, the assassin would have to defend the flaws in person, face to face with other reviewers. In the old system, unfounded dissent would wither before the committee and miscreants would have little credibility.
Any reviewer can bring any application up for discussion. The lone assassin theory only holds water if somehow those two favorable reviewers are wilting violets and fail to ask to discuss the app. I have rarely seen that happen. More likely the two adoring reviewers have been convinced by the “assassin” that their initial read was a little too optimistic.
March 9, 2011 at 2:38 pm
“Any reviewer can bring any application up for discussion.”
Most study sections are woefully thin in expertise today. Although the “any reviewer” clause can be invoked, in practice it is little used in the new regime. The common scenario is that it is day 2 when the arithmetic dictates that these split votes get attention, and to “reconsider” a late-breaking mistake or bias would upset the order of the previous day’s funding established in the first roll through. This is momentum. People are checking their airline tickets and have little energy for either the attention or the engagement. Even factual mistakes in the new electronic – especially asynchronous – formats are difficult to correct. That happened twice last year in my personal experience as an NIH reviewer. Despite my attempts to get it resolved, in the fuzzy electronic back and forth, it was not clear in either case that those data on brain tissues misread as kidney samples ever got corrected in the minds of the voting reviewers. This rush to judgment has been engineered into the new NIH peer review format (best-to-worst reading, short applications, vague bullets, imprecise critiques and fewer face to face meetings). As everyone admits, NIH peer review was effective at 25% paylines. Now, the prize for the best review process is speed. Re-thinking to protect the good idea against the lone outlier – assassin or incompetent (I have witnessed both) – just does not come about as often as it should. At quality journals, peer review takes time. NIH review is now engineered to rush the judgment.
March 9, 2011 at 5:43 pm
That’s a pretty bold assertion! Any evidence?
March 9, 2011 at 6:22 pm
That’s a pretty bold assertion! Any evidence?
I’m guessing the evidence is that SciGuy’s grant scored outside of the fundable range…
March 9, 2011 at 7:14 pm
Speaking of criterion scores, a tiny vignette
March 9, 2011 at 10:08 pm
CPP, don’t be a whatever you would call everyone else. There’s a ton of expertise on study sections, but the difference between “wev” and “yay” could be whether someone in one’s own field is there to explain the significance of the work in case the section has evolved away from topics it traditionally has trafficked in. So, some study sections might be thin on appropriate expertise.
Also, Cashmoney, it is of course incredibly easy for you to type a comment concerning negative aspects of grant review as sour grapes. It is so predictable that you can leave it home next time!
March 9, 2011 at 10:29 pm
That’s a pretty bold assertion! Any evidence?
Does it matter whether it is the potential or the execution? Fact is, most reviews are lousy, bordering on incompetent. In large part it is simply because reviewers never bother to read the thing thoroughly and think it through (who has time for this?). Evidence? Only anecdotal. 90% of grant reviews that I’ve seen, regardless of the score, are like this.
Exactly the same goes for manuscripts in journals, BTW.
March 9, 2011 at 10:37 pm
Actually, I do wonder if much of the correlation with approach comes from subconscious inclusion into that score of study section conservatism. For example an innovative proposal that doesn’t have enough prelim data wouldn’t get dinged on innovation, but would on approach, even though the approach may very well be sound, just that the more conservative the reviewers get, the more they are going to focus all of the dings onto approach. Also, what reviewer is going to give an amazing Approach score and shit on the investigator or environment. They may or may not reflect their actual thoughts in investigator or environment, but they’ll move those over to approach if they don’t feel convinced.
March 10, 2011 at 8:28 am
Do we really want to get into the fight of whether BigNameSchool is a better environment for science than BigStateResearchU or vice versa? I’ve always taken environment to mean “is it a good place for this research to be done?” People seem to use the environment score as a way of marking problems in environment; otherwise, just give it a 1. Most places *are* good places to do science.
Approach is where people address the science of the grant, so obviously it’s where the range is.
March 10, 2011 at 9:41 am
If you are not aware of the trends of expertise on the various study sections that could conceivably be a relevant home for your grant, are not involved in ongoing discussions with SROs about this issue and to provide input concerning appropriate areas for necessary bolstering with ad hoc members, and are not writing your grants in a very targeted fashion to deal with the reality on the ground in the study section you have decided to have your grant assigned to, then you are fuckeing uppe, bigge tyme.
March 10, 2011 at 10:43 am
So you are saying you write down for the rubes, PP?
March 10, 2011 at 10:46 am
Write what down for which rubes?
March 10, 2011 at 10:46 am
spot on, qaz. I think it is just vanishingly rare for PIs to propose some research that their environment can’t support.
In R01s anyway. I wonder about the Big Mechs. Think it is the same? Smaller institutions don’t bother?
March 10, 2011 at 10:49 am
No way do I write down for the rubes, but I think some reviewers clearly do, DM. But that is merely anecdata.
CPP- you overestimate the magical powers of the SRO to discern appropriate venue for your grant. Of course they are helpful, and you do what you can, but sometimes you get “fucckkkkeddde”
March 10, 2011 at 11:07 am
You are misreading. I am not saying any SRO is going to be able to discern the best venue for your grant. *You* need to discern that, but discussions with SROs of various study sections is a key input to your discernment process.
March 11, 2011 at 5:53 am
Pinko Punko are you denying that most people raving about incompetent review 1) do not have a reasonably recent term of service on a study section and 2) have not been able to compete successfully for funding?
March 11, 2011 at 9:47 am
I have received dozens of summary statements of my own grants and served numerous times on study sections, and even the conclusions and scores I have disagreed with vehemently, I have never attributed to reviewer *incompetence*. I have attributed them to reviewer shortsightedness and to reviewers weighing matters of scientific judgment differently than I would and to reviewers having goals for review outcome that differ from my own, but I have never experienced what I would consider to be true *incompetence*.
In this context, I define incompetence as an inability to comprehend the approach, significance, innovation, investigator, and environment of an application that is written clearly and understandably for an audience defined as the study section it has been assigned to. And this is where I think most disgruntled applicants are delusional: they attribute the failures of poorly written and improperly targeted grants to reviewer incompetence.
March 12, 2011 at 12:28 am
I have seen poor reviews. Nobody can dispute reviews can be poor. These would be categorized as shortsighted, possibly not inappropriately, but I don’t necessarily agree that this means the grant was poorly written (many grants are poorly written; nobody would deny this either). For example, cases where the reviewer says things that are directly contradicted by the proposal in an extreme enough way that the evidence is on the side of the reviewer not having read the grant carefully enough. Many times this can be because of the grant. Many times this is on the reviewer. (I know, argument from assertion of “many times”.)
For example, a grant that has gone through a large number of helpful eyes, including colleagues that have served on the exact study section in years past, and colleagues not in the immediate field, providing coverage for grantsmanship and level of sophistication for the proposal, but receives one out of three reviews that says:
Weakness: “I don’t think this technique will work”
Strength: “Working with collaborator x”
Grant: “We are working with collaborator x (see letter of support) who has successfully developed technique y to accomplish method z on genome-wide scale.”
In manuscript reviews, many poor reviews are hidden because they are very short positive reviews. Many negative reviews can be poor due to lack of scientific skill of the reviewer or a misunderstanding of what was written. Many negative reviews can be highly skillful and spot on. Evidence from manuscript reviews indicates that there is a massively wide level of reviewer skill in the pool. NIH panels are a lot narrower on this range and generally quite perceptive. When the funding level drops, defects in the process become magnified because they are more important.
I merely claim that there are some poor reviews that are independent of how the proposal was written. These may relate to scientific disagreements between the reviewer and the grantwriter that cannot be overcome by how well the grant was written (the “I don’t care what you say, I don’t believe it” argument), or they might be due to the reviewer refusing to accept the importance of the work, especially if it is outside his/her field. These types of reviews/disagreements are more philosophical in nature and they may end up couched in dings that are not necessarily relevant to how the grant was written. I just think this isn’t a controversial statement, and I think the points I am trying to make are constructive to the discussion, unlike “hey whiners, I bet your funding level sucks, side note: why won’t anyone nominate me for the comment originality club?”
March 12, 2011 at 2:36 am
In my experience, nearly all reviewers are scientifically competent and well-qualified, even highly so (although I do have one truly empty-suit colleague whom I can only imagine has left a trail of destruction through his/her years of service).
However, this does not mean that all reviews are competently executed. All too often, I have seen reviewers say: “Application fails to consider X” when there is a clearly labelled subsection on X in the application (yes, this just happened to me, but I have seen it many times on both sides of the coin).
March 12, 2011 at 12:00 pm
It’s not really the job of the study section panel to provide you with competently worded feedback; rather, it is the job of the reviewer to provide an informed priority score opinion to the panel. Particularly if the score isn’t that great, you can understand the reviewers’ lack of enthusiasm for going to a lot of effort in the written review.
Weakness: “I don’t think this technique will work”
Strength: “Working with collaborator x”
Grant: “We are working with collaborator x (see letter of support) who has successfully developed technique y to accomplish method z on genome-wide scale.”
Interpretation: I don’t consider this technique well-established enough, and certainly not in your hands, to provide meaningful data so I have dinged your application with respect to “approach” accordingly. I think you’d have no chance of getting this to work on your own, so it’s good that you are working with Collaborator X, although even with this collaboration, I don’t think you can do it. This is going to have to be one of those, “show me it is working in your hands” situations for me to consider it as a viable approach in your application.
March 12, 2011 at 12:45 pm
What whimple said.
Also, “you might be able to get this working but it is going to take 9 months on this alone. It isn’t the slam dunk you make out and therefore your experimental plans cannot possibly be accomplished in any reasonable timescale consistent with the overall proposal. Oh, and good luck getting your letter writer to actually help to the degree you are actually going to need it”
March 12, 2011 at 6:01 pm
Perhaps.
But it really isn’t that sort of technique. I checked on the more accurate wording and it was “even with collaborator’s help, I don’t think this will work/is possible”
When the wording was “the collaborator has this working and it is possible”
So there is an interpretable disconnect. Of course we can always interpret these things with the standard boilerplate takes, such as “this proposal has not yet been completed, and this is the only evidence I would accept” which runs into “this proposal IS completed, so why should it be funded.”
March 12, 2011 at 8:21 pm
Good point! Be sure to point out the “interpretable disconnect” in your resubmission. I’m sure they’ll appreciate that and give it a fundable score next time. *giggle*
March 12, 2011 at 11:49 pm
It’s fundable now. Why are you being such a “troll” as the kids say on the internet?
As a reviewer of papers I see other reviews and sometimes they are clearly and demonstrably wrong, and I wouldn’t fault the manuscript. Why is it simply not possible for a grant review to be less than optimal?
Many people read DM because they learn a ton about the process, and they commiserate about some things. We’re all in the same boat, except there is also another boat: a boat called the “enjoys the pain of strangers” with Captain Whimple at the helm, lately promoted to insufferable pr……[this comment has been triaged]
March 13, 2011 at 12:31 am
I thought from what you wrote that you genuinely didn’t understand how such superficially contradictory comments in review of your application could be perfectly consistent with competent review. After this was explained to you, you seemingly went into the common denial exhibited by a large fraction of the 80% to 90% of applicants that don’t get a fundable score: the reviewers didn’t get it, or didn’t read it, or were logically inconsistent in their interpretation. Instead you might as well consider that as being your fault, because complaining about your perceived quality of the review is certain to be unproductive.
March 13, 2011 at 9:00 am
One thing I have given up trying to understand is the obsession of some scientists with making their personal opinions prevail against the judgment of the study section. You have nothing to gain by refusing even to consider that they may have a point.
Anybody can write a bad review and, theoretically, even three bad reviews at the same time. But if they tell you that a new technique is not established and may work only in the hands of a recognized expert in the field, this sounds like a very reasonable comment. You either show that you have made it work in your lab (you must have done it before, actually), or you change the approach.
The system may be fair or not but, if you ever want to get funded, it is easier to adapt your proposal to the culture of the study section than to expect the study section to adapt to your personal culture.
March 13, 2011 at 9:33 am
This is a good time to repeat the mantra that this kind of unpredictability of review is exactly why you need to have multiple grant applications in the system all the time, targeting different study sections for review. Seeing young investigators moaning and groaning on the Internet about the fate of “my R01 application” is heartbreaking, because it is so foolish. And anyone who complains that they “can’t write multiple R01s” because they lack the resources, ideas, preliminary data, whatever are simply not thinking creatively. The same exact preliminary data can form the basis for an infinite number of different grant applications.
If your career hinges on the review of a single grant application, you are doing it wrong. Even the most successful grant writers submit multiple competing applications (including resubmissions) for every one that gets funded. Over the years, I have averaged two competing NIH submissions (including resubmissions) per year, with only about 1/3 getting funded.
Grantsmanship is a stochastic process. Deny this fact at your peril.
March 13, 2011 at 12:59 pm
Whimple, you are arguing with the straw Pinko Punko and not the actual one. Yes, your experience with commenters allows you to generate a profile or type with which to address common themes. Those are not relevant here.
CPP, of course you are correct with your multiple grants in play at all times. Nothing I have said contradicts this wise statement.
Additionally, it is easily conceivable to recognize that reviews can be poor yet work within the framework of creativity to sway or bring the panel over to your side, whether in winning a majority or all of the reviewers even in the face of knee-jerk presumptions or generic poorly considered criticism from a minority of reviewers. I really don’t think any of this is controversial or deserving of abuse.
I’m on your side, dudes. When I look at colleagues’ grants, I ask myself what CPP would say, then I tell them what DM would say, because they likely won’t respond to being called fuckinge shitteass dumfukkes.
March 13, 2011 at 1:44 pm
I think the overall point here is that handwringing about the “quality” or “competence” of NIH peer review doesn’t help anyone get funded.
March 14, 2011 at 10:06 am
OK, so Eli reviews NASA and NSF grants, and not NIH, but the bullshit about multiple grants in the system, which is what we have today especially among the soft money folk, is killing the science. Even with computers (and yes, Eli is that old), putting together multiples in a way that they address different programs (study sections) and don’t allow the program manager to say “my program doesn’t have to do this, it belongs over there” chews up huge amounts of time and effort.
God do we need birth control
March 14, 2011 at 12:54 pm
Eli,
During the early to mid 80s, the success rate for experienced applicants was around 40%. Seriously.
In my subfields, true, a lot of progress was made. However, it was made by a limited number of laboratories…even more sharply limited if you examine training pedigrees. And in retrospect there was one heck of a lot of street-light problem going on. *Diversity* of approach and topic was stifled. To make it relatively specific, we learned a lot about the reinforcing properties of cocaine and heroin. Choked out? From the standpoint of the diversity that exists post-doubling, a LOT.
A highly competitive environment does not exactly disfavor the same-old, same-old. But at least there is a chance that panels (and POs) will be thinking “hey, we can only afford ONE of these, not five on the exact same topic”. And that will be good for the progress of science…