More NIH Grant criterion score fun, writ small

March 9, 2011

As we were just discussing on the Sb blog, the Approach, Significance and Innovation criterion scores are the biggest drivers of the Overall Impact score. Approach remains the king. Or, at least, the Approach score correlates best with the Overall Impact scores voted for the roughly 32,000 research grant applications that made it to discussion in Fiscal Year 2010.
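
For the curious, here is a minimal sketch of what “correlates best” means, with invented scores (the real tally covered some 32,000 discussed applications; every number below is hypothetical):

```python
# Minimal sketch with invented data: which criterion score tracks the voted
# Overall Impact score most closely? The `apps` rows are hypothetical; the
# real analysis used actual CSR criterion and Overall Impact scores.
from statistics import correlation  # Python 3.10+

CRITERIA = ["Significance", "Investigator", "Innovation", "Approach", "Environment"]

# Each row: the five criterion scores (1-9 scale), then the Overall Impact score.
apps = [
    (2, 1, 3, 2, 1, 20),
    (5, 2, 4, 4, 2, 35),
    (3, 2, 3, 5, 1, 42),
    (2, 3, 2, 3, 2, 25),
    (4, 2, 5, 6, 2, 51),
    (3, 1, 4, 3, 1, 30),
]

impact = [row[5] for row in apps]
for i, name in enumerate(CRITERIA):
    r = correlation([row[i] for row in apps], impact)
    print(f"{name:12s} r = {r:+.2f}")
```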

A good friend of the blog submits the following outcome of a recent grant review. The grant was triaged, thus there was no discussion and no overall score. However, each of the three reviewers issued a putatively non-triage score (2-3) for one of the three big criteria. (As per our aforementioned discussion, the Investigator and Environment criterion scores were 2s or better. I told you they matter very little to the outcome!) As you might anticipate, each reviewer also bagged on (4-6s) the remaining two important criteria.

But here’s the funny part. The three reviewers each picked a different one of the Approach, Innovation and Significance criteria to laud.
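
To make that concrete, here is a hypothetical reconstruction (the actual scores were not shared; these numbers simply follow the 2-3-on-one, 4-6-on-the-rest pattern described above):

```python
# Hypothetical reconstruction of the review described above: each reviewer
# gives a near-fundable score (2-3) on ONE of the big three criteria and
# bags on the other two (4-6) -- but each picks a different one to laud.
reviews = {
    "Reviewer 1": {"Approach": 2, "Innovation": 5, "Significance": 6},
    "Reviewer 2": {"Approach": 4, "Innovation": 3, "Significance": 5},
    "Reviewer 3": {"Approach": 6, "Innovation": 4, "Significance": 2},
}

for reviewer, scores in reviews.items():
    lauded = min(scores, key=scores.get)
    print(f"{reviewer} lauded {lauded} ({scores[lauded]})")
```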

So I should advise her to get Scarpa on the phone pronto to complain about the clearly erroneous review, right?

32 Responses to “More NIH Grant criterion score fun, writ small”


  1. If you are getting wildly different scores from the different reviewers on the same criteria, it is clearly a sign that the applicant doesn’t know how to properly write a fucken grant application.


  2. drugmonkey Says:

    That’s another hypothesis, I suppose. You would prefer consistently bad scores as a sign of proper grantsmithing?


  3. bethann Says:

    Not a chance. If it’s scoring well in several domains, I agree that this is a writing problem, not a reviewer problem, and, more importantly, it is likely a fixable problem: 4-6s can be fixed. Get someone evil to read and take apart the grant, then resubmit, thanking them profusely.


  4. drugmonkey Says:

    Get someone evil

    um…are you suggesting that I am “evil”?


  5. Lorax Says:

    I’ll add one more to your list. Today I received the scores on a triaged A1.

    Reviewer 1: 3 2 3 3 1
    Reviewer 2: 3 2 3 3 2
    Reviewer 3: 5 2 4 4 1

    For those not in the know, the scores, in order, are: Significance, Investigator, Innovation, Approach, Environment.

    First, reviewer #3 is a douchebag.
    Second, Investigator and Environment are all 1s and 2s, as noted by DM.
    Third, looking at these scores, this proposal seems to have been close to the discussion cut-off (again, fuck you, reviewer #3), which suggests the proposal has merit. Sadly, as an A1, I must rewrite it as a “new” proposal. Based on the recent guidelines, this means I must fundamentally change a proposal that to me looks to have a great deal of strength. I’m sure NIH and my employer think that is the most effective use of my time.
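
    A quick sketch pairing each column with its criterion and averaging across the three reviewers makes the pattern explicit; the only assumption is the score order given in the legend above:

```python
# Lorax's scores, labeled per the legend: Significance, Investigator,
# Innovation, Approach, Environment.
criteria = ["Significance", "Investigator", "Innovation", "Approach", "Environment"]
reviewers = [
    (3, 2, 3, 3, 1),  # Reviewer 1
    (3, 2, 3, 3, 2),  # Reviewer 2
    (5, 2, 4, 4, 1),  # Reviewer 3, the outlier
]

for i, name in enumerate(criteria):
    col = [r[i] for r in reviewers]
    print(f"{name:12s} {col} mean = {sum(col) / len(col):.2f}")
```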


  6. drugmonkey Says:

    ouch. sorry to hear that, Lorax. It’s always Reviewer #3…


  7. Pinko Punko Says:

    This last round the triage rate was at least 60%, I have been told. I await CPP telling Lorax that the grant was a “POS.”

    Lorax- do you know what the historical spread of this study section is?

    For example, this set of scores is on the edge of funding for an ESI in a study section with which I am familiar:

    Reviewer 1: 2 1 2 3 2
    Reviewer 2: 2 3 3 3 3
    Reviewer 3: 4 3 3 5 4

    Note that reviewer three is the usual shithead. And note the slap for the “environment” – Environment, I think, can’t really help you. Only the TOP places will get a 1, and anything less than a 3 is probably CPP takinge shittes on the redneckes.

    However, if this study section rated most grants 1-5 and 60% were being triaged, the 3s-5s wouldn’t get discussed.


  8. Fucked Grant Hunter Well Fucked Says:

    Fucken Scarpa is fucking busy hiring fucking upper managers to make fucking creative decisions fucking quickly! He doesn’t fucking waste his fucken time grabbing his fucking phone to hear fuckers fucking around with fucking complaints about fucking scores. What the fuck!



  9. Look, the bottom line is that your task is to get all three reviewers at least moderately on board with your grant to have a chance. If you’re routinely having problems with at least one reviewer, then you need to up your grantwriting chops. You also need to be submitting multiple R01s to a variety of different study sections.


  10. qaz Says:

    All this discussion of scores assumes that the average score is halfway between 1 and 9 (i.e., a 5). Although this is what it was supposed to be, in my experience with study section (both being on it and getting grants trashed by it), this is simply wrong. People are very bad at readjusting scales based on what’s in front of them. We want a canonical scale to work with. When I say that’s a 2, I mean that’s a 2 relative to all the grants I’ve seen. (*) What that means is that it is very possible in some study sections for a 3.5 to be triaged. (This varies greatly from study section to study section. Some study sections spread the scores out for the grants in question, and triage cuts off at 5. Others don’t, and triage starts at 3.0.)
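
    A minimal sketch of that point, with invented preliminary scores: both hypothetical panels below triage the bottom half, and the cutoff simply falls out of the local distribution, so the same 3.5 dies in one and gets discussed in the other.

```python
# Two hypothetical study sections, each triaging the bottom half of its
# pile. All preliminary scores are invented for illustration.
def worst_discussed(prelim, discuss_fraction=0.5):
    """Return the worst preliminary score that still gets discussed."""
    ranked = sorted(prelim)
    n_discussed = max(1, int(len(ranked) * discuss_fraction))
    return ranked[n_discussed - 1]

compressed = [2.0, 2.2, 2.5, 2.7, 2.8, 3.0, 3.1, 3.3, 3.5, 4.0]  # scores bunched low
spread = [1.5, 2.0, 2.5, 3.0, 3.5, 4.0, 4.5, 5.0, 5.5, 6.0]      # scores spread out

print(worst_discussed(compressed))  # 2.8 -- here a 3.5 is triaged
print(worst_discussed(spread))      # 3.5 -- the same 3.5 squeaks into discussion
```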

    Also, in recent study sections I’ve seen, people are only encouraged to bring a grant out of triage if there is some remote possibility of it being funded. (This is not explicitly said, but I’ve heard grumbling of “what’s the point?” in study sections in answer to the “anything else people want to discuss?” question.) Because grants are reviewed in order of preliminary score, by the time you reach the “pull it out of triage” question, everyone is exhausted and grumpy, and no one wants to spend another 20 minutes discussing a grant that is going to end up with a score of 4.0.

    * Yes, DM, I know that’s not what it is supposed to be, but it’s a lot easier to find a score relative to the hundreds of grants that I’ve seen over the years than to find a score relative to the five other grants I’m actually reading for this study section.


  11. drugmonkey Says:

    I am definitely *not* a fan of the move to review in order of prelim scores, qaz. If it hasn’t been discussed by lunch break of Day 1, really, what’s the point? Right?

    It has a suppressing effect on score movement. A cynical Monkey might conclude this is part of Scarpa’s long game to eliminate the face-to-face meeting.


  12. Pinko Punko Says:

    Reviewing in order is just terrible, especially if Reviewer 3 is an outlier and says a bunch of crap. No perfection of writing can save you from a certain kind of review.


  13. drugmonkey Says:

    I don’t see where “especially if Rev 3 is an outlier” has any special status. Grants where all three agree can move during discussion as well. It is not only the case that major movement occurs because two favorable reviewers “correct” the bad one. Sometimes all three talk each other into improvement. Sometimes, albeit rarely, the panel revolts and says “we don’t get why your scores are so low after you all said such glowing things. We’re going outside the range.” It happens; I’ve seen it.


  14. Dr Becca Says:

    If you are getting wildly different scores from the different reviewers on the same criteria, it is clearly a sign that the applicant doesn’t know how to properly write a fucken grant application.

    Is it out of the question that different members of a study section simply differ in their opinions, or in the standards they use to judge some of the criteria? I’m asking honestly, here.


  15. drugmonkey Says:

    Of course, Dr. Becca.

    It is also the case that the written review is an exercise in post-hoc justification. The reviewer is trying to express for herself why the Gestalt impression was what it was, and so will tend to focus on what seemed to be the big baddie. Then, since you feel some obligation to talk about the good things (not least because if you just look like a total critical ass all the time, your ability to persuade people to your way of thinking goes down tremendously), you try to look for some part to like.

    It is far from an exact science, which is why outcomes like the one I’m discussing are not evidence of FLAWED REVIEW!!!!



  16. Sometimes all three talk each other into improvement. Sometimes, albeit rarely, the panel revolts and says “we don’t get why your scores are so low after you all said such glowing things. We’re going outside the range.” It happens; I’ve seen it.

    What is even more common–and more effective–is for one of the more experienced members of the study section to say something like, “The discussion I’m hearing sounds a lot more like an X than a Y.” When used judiciously, this kind of rhetorical approach can be a lot more effective at moving the scoring than detailed argumentation about the substance of the review.

    I have used this technique a number of times to induce assigned reviewers to change their scores in the direction I favor.


  17. drugmonkey Says:

    …with that said, however, PP does have a bit of a point. If at least one person liked your proposal on a major criterion, then you can tentatively* assume that there is something good there. Your proposal is not total crap. Given that it is not crap and the other reviewer marked you down, you need to communicate better to a broader number of potential reviewers. So yes, “grantsmithing better” is a possible solution here.

    *tentatively because you could just have a reviewer that doesn’t give a darn about “Innovation” and routinely gives top marks, for example.


  18. drugmonkey Says:

    is for one of the more experienced members of the study section to say something like, “The discussion I’m hearing sounds a lot more like an X than a Y.”

    ahh, yes. The infamous score-calibration gambit. Definitely a favored tactic of other panel members not assigned to the app under discussion.

    I guess that opinions would vary on whether this represents “fairness” or whether it is perpetuation of a sort of conservatism of a specific study section. I lean towards “fairness” myself. But I can see where an applicant would look at criterion scores against an apparently re-calibrated overall impact and go nutz.


  19. becca Says:

    What if, instead of strictly reviewing in order, every reviewer got one ‘early slot’ among the first grants discussed? That way you’ve got a shot at everyone being fresh for a grant you believe needs the discussion. Would that make any sense?


  20. drugmonkey Says:

    Practically difficult. The review order needs to be set early so POs know when to call in/show up. There can be issues with phone reviewers’ schedules too.

    Then, there are 20-30 reviewers. This would not be a minor tweak; it would replace the existing schema.


  21. qaz Says:

    In one study section I was in years ago, the chair of the study section (I assume in consultation with the SRO, but I was just a kid, so what did I know) started the meeting with two or three applications that scored well and two or three that scored poorly, to set the stage. They picked applications where the reviews and scores were very consistent. We then went randomly through the grants after that. Because we’d had the first few grants to set our scale, we seemed to be pretty consistent. I thought it worked much better than the current system, which seems to keep getting recalibrated every hour or so through the day as we work our way down. (In the recent study sections I’ve seen, hour-2 grants get pushed up because they’re obviously on the edge and people want to get them funded. Hour-5 grants get pushed down because we’re being told to spread the scores. Of course, by hour 5 it doesn’t matter anymore, so it’s a 5 instead of a 4. And by hour 12, we’re all just like F%$^$% it, give it a 10.)


  22. Pinko Punko Says:

    The argument in favor of order is that you have to start somewhere. As long as the “we’re all tired so scores won’t move towards the end of the section” effect is minimized, I guess it is OK.

    Giving everyone a slot doesn’t necessarily work because the distribution of grants to each reviewer is different. My colleague stated she had most of her pile discussed, while another panel member had only two of hers discussed.

    What I would love to know is how many grants in particular sections really are obviously bad – meaning not just triaged, but bad. Grants, especially renewals, can be excellent, yet there could be issues with productivity that really lower the assessment. And where does “productivity” go in a criterion score?

    I like qaz’s calibration scheme.

    It is much easier to see the strengths of peer review when funding is at 20-30%, not 10-15%. Then it sucks.


  23. Neuro-conservative Says:

    qaz — I like that calibration approach and very much dislike the new system. On the other hand, I concede that the new system does promote fine-scale tuning of scores amongst the few applications that have any chance of getting funded.

    pinko — In my experience, relatively few grants are just bad, bad. I would be interested in hearing others’ experience. It would be nice if there were a lot more totally lame grants to fill out the bottom rungs. In reality, even many triaged grants would probably produce some good science.


  24. drugmonkey Says:

    Vanishingly few are bad, bad, IME as well. This is why closing just one of our several carrier groups to increase the NIH budget by 50% would be a job well done.



  25. My experience is that about one third are “bad, bad”.


  26. drugmonkey Says:

    A third have essentially zero chance of anything useful or productive being published and a good chance of resulting in stuff that is actually detrimental and/or wrong?


  27. qaz Says:

    I would estimate that about a third of the grants have flaws serious enough for me to honestly question whether they would work or not. But even in that bottom third, my guess is that more of them would end up producing good science than not. (Because the best science is the stuff that we’re surprised at – not the stuff that we’re predicting – and the people proposing the work are careful scientists.)

    In my experience, only a very few grants are so absolutely bad as to be truly horrible. (I remember one that actually got a 5.0 from all three reviewers independently on the old system.) But this is the problem with the new numbering system. They want to compress all of the basically flawed grants together with all of the truly execrable grants. I needed that room on the old system so I could differentiate a 3.5 (which was unfundable, and in fact likely triaged, in the old system) from a true 5.0 (which was a waste of the photons carrying the light from the screens of our laptops to our eyes). I felt that the distributions seen in the old system were actually very correct – out of a 1-5 range, a few were in the 1.3-1.5 range, most were in the 1.8-2.8 range, ones that needed work were in the 2.8-3.5 range, and flawed grants were in the 3.5-4.0 range. The really disastrous ones were in the 4.0-5.0 range. Grants were clustered in the 1.8-2.8 range because that’s really where 75% of the grants belonged. I never understood the “spreading scores” problem in the old system – but then, it had the resolution needed to differentiate the 1.3-2.0 range (which was where funding tended to live in the study sections I saw). In the new system, spreading scores is critical because it is quantized at such low resolution. But spreading scores is hard because you’ve got to leave that 9 to send a message to the person who doesn’t even read what he/she wrote.
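
    A minimal sketch of that resolution complaint, using a purely hypothetical linear rescaling of the old 1.0-5.0 scale onto the new 1-9 integers (NIH published no such conversion; this only illustrates what integer quantization does to the old funding range):

```python
def old_to_new(old_score):
    # Hypothetical linear map of the old 1.0-5.0 scale onto the new 1-9
    # integers; NOT an official conversion, just an illustration.
    return round(1 + (old_score - 1.0) * 2)

for old in (1.3, 1.5, 1.8, 2.0, 2.8, 3.5, 4.0, 5.0):
    print(f"old {old:.1f} -> new {old_to_new(old)}")
# The old 1.3-2.0 range, where funding lived, collapses onto just 2s and 3s.
```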


  28. Altajax Says:

    I just started to respond to reviewers’ comments and noticed something odd. Like many others, I had one reviewer who trashed my proposal, but this time it was Reviewer #1. Weird! Anyway, I have a question on how best to address conflicting reviewer comments (the criterion scores were similarly conflicting, so I thought it might be appropriate to bring up here).
    In my example, reviewer #1 wrote under Weaknesses, “Unfocused,” while reviewer #2 wrote, “Focused application with significant clinical relevance.” Huh? How do I address that? The reviewers literally wrote the EXACT opposite statements.
    To provide a little more background, this is an F32 proposal that scored a 30 (I think I can address the rest of the comments to bump the score up enough, but still). Both comments were in the Research Training Plan section.

    Thanks



  29. What, you are the one thatte gets to unilaterally define whatte is “bad, bad”? Fucke thatte noise, holmes.



  30. Those are not “conflicting.” Reviewer #1 felt it was not focused enough, while reviewer #2 felt it was sufficiently focused. Conflicting would be if reviewer #1 felt it was not focused enough while reviewer #2 felt it was *too* focused.

    There is a huge difference in how to deal with those scenarios in a resubmission. In the first case, you just do something with the goal of convincing reviewer #1 that you have increased the focus. Reviewer #2 should be presumed to remain satisfied with the level of focus, even if you increase it (within reason).


  31. Anton Says:

    Interesting arguments, but ultimately futile. In grant funding, ‘close’ means: nothing, zero, nada.

    In my experience, most NIH grants now exhibit far superior grantsmanship to even the top 20% in the past. Hence, grantsmanship is necessary but not sufficient.

    To get funded (NOT just get a ‘good score’), you need two (or, better, three) extremely good ‘friends’ (or people who fear you) on the study section who are, in essence, willing to take a bullet for you. Without that, don’t bother submitting.


  32. Len Says:

    Everyone is right – the review system does not work, and the focus has become how to deal with it rather than the science. That is why the limitation to a single resubmission is a disaster. It is also the antithesis of science – “we know; therefore, we do not want to hear your arguments…”. It used to be (long ago, in a galaxy far, far away) that the mandate to the study section (in writing!) was to promote research in the area(s) for which it was responsible. Now, it’s how do we discourage what we think is not good enough.


