What the NHLBI paper metrics data mean for NIH grant review

February 21, 2014

In reflecting on the profound lack of association of grant percentile rank with the citations and quantity of the resulting papers, I am struck that it reinforces a point made by YHN about grant review.

I have never been a huge fan of the Approach criterion. Or, more accurately, how it is reviewed in practice. Review of the specific research plan can bog down in many areas. A review is often derailed into critique of the applicant’s failure to appropriately consider all the alternatives, to engage in disagreement over the prediction of what can only be resolved empirically, to endless ticky-tack kvetching over buffer concentrations, to a desire for exacting specification of each and every control….. I am skeptical. I am skeptical that identifying these things plays any real role in the resulting science. First, because much of the criticism over the specifics of the approach vanishes when you consider that the PI is a highly trained scientist who will work out the real science during the conduct of same. Like we all do. For anticipated and unanticipated problems that arise. Second, because there is much of this Approach review that is rightfully the domain of the peer review of scientific manuscripts.

I am particularly unimpressed by the shared delusion that the grant revision process by which the PI “responds appropriately” to the concerns of three reviewers alters the resulting science in a specific way either. Because of the above factors and because the grant is not a contract. The PI can feel free to change her application to meet reviewer comments and then, if funded, go on to do the science exactly how she proposed in the first place. Or, more likely, do the science as dictated by everything that occurs in the field in the years after the original study section critique was offered.

The Approach criterion score is the one that is most correlated with the eventual voted priority score, as we’ve seen in data offered up by the NIH in the past.

I would argue that a lot of the Approach criticism that I don’t like is an attempt to predict the future of the papers. To predict the impact and to predict the relative productivity. Criticism of the Approach often sounds to me like “This won’t be publishable unless they do X…..” or “this won’t be interpretable, unless they do Y instead….” or “nobody will cite this crap result unless they do this instead of that”.

It is a version of the deep motivator of review behavior. An unstated (or sometimes explicit) fear that the project described in the grant will fail, if the PI does not write different things in the application. The presumption is that if the PI does (or did) write the application a little bit differently in terms of the specific experiments and conditions, that all would be well.

So this also says that when Approach is given a congratulatory review, the panel members are predicting that the resulting papers will be of high impact…and plentiful.

The NHLBI data say this is utter nonsense.

Peer review of NIH grants is not good at predicting, within the historical fundable zone of about the top 35% of applications, the productivity and citation impact of the resulting science.

What the NHLBI data cannot address is a more subtle question. The peer review process decides which specific proposals get funded. Which subtopic domains, in what quantity, with which models and approaches… and there is no good way to assess the relative wisdom of this. For example, a grant on heroin may produce the same number of papers and citations as a grant on cocaine. A given program on cocaine using mouse models may produce approximately the same bibliometric outcome as one using humans. Yet the real world functional impact may be very different.

I don’t know how we could determine the “correct” balance but I think we can introspect that peer review can predict topic domain and the research models a lot better than it can predict citations and paper count. In my experience when a grant is on cocaine, the PI tends to spend most of her effort on cocaine, not heroin. When the grant is for human fMRI imaging, it is rare the PI pulls a switcheroo and works on fruit flies. These general research domain issues are a lot more predictable as outcomes than the impact of the resulting papers, in my estimation.

This leads to the inevitable conclusion that grant peer review should focus on the things that it can affect and not on the things that it cannot. Significance. Aka, “The Big Picture”. Peer review should wrestle over the relative merits of the overall topic domain, the research models and the general space of the experiments. It should de-emphasize the nitpicking of the experimental plan.

16 Responses to “What the NHLBI paper metrics data mean for NIH grant review”

  1. BugDoc Says:

    If grant review should focus on general topic domain or model system, rather than some notion of reviewing the merit of approach (per the NHLBI data), then why have “peer” review at all by active researchers? Why not just have program review? The program officers, who are scientists AND have some directive for what their portfolio should be, can be the agents of review. Based on the data, it’s not clear to me that would be any worse than the current system. If we are not going to nitpick the little details of approach (which I agree is silly in most cases), then there doesn’t seem to be a need for active researchers. An expanded staff of program officers/professional reviewers can perform grant review. They read papers and attend conferences just like the rest of us, and can assess number of papers and citations just as easily.


  2. AcademicLurker Says:

    So to review:

    A) As the “Do you always get similar scores on your RO1s?” thread demonstrated, scores are all over the place to such an extent that they’re nearly random (assuming a credible proposal to begin with).

    B) Above a certain, very generous, cutoff priority scores don’t correlate with outcomes at all.

    C) PIs spend an increasingly large fraction of their time on grantsmithing.

    The situation seems to be…suboptimal.


  3. sop scientist Says:

    Timely post as I am currently writing reviews for study section……. I’ve always hated nit-picky comments that seem to be just searching for tiny errors to try to differentiate proposals that are all extremely excellent in the top percentiles.


  4. drugmonkey Says:

    BugDoc- this is a bad month to ask me to put more decision power in the hands of POs…. 😦


  5. professa Says:

    One value of the Approach section and the discussion of the alternatives is to hear from PIs about conceptual alternatives. I think everyone agrees that using a different epitope tag or cell line or sequencing platform is pretty irrelevant. *However* hearing what they have considered as alternative hypotheses, and if those are testable if their original hypothesis crashes and burns, can be very helpful in ranking wheat vs. chaff.
    Assessment of unneeded details has gotten better in the era of shorter grants, but I would guess the same could be achieved in a 6-8 page proposal.
    Another issue to deal with is whether bibliometrics are really the most important outcome to consider. I realize it is easy to measure, but we don’t want to be searching under the streetlight because that is where the light is. (OTOH, I don’t have a quick and ready alternative outcome.)
    I would bet that the bibliometrics of applicant PIs just *prior* to funding is strongly correlated with percentile ranking. PIs that are considered “hot” and just published in CNS are likely to earn funding (especially at the trainee/K level)–but maybe our ability to pick long-term winners is not so impressive. More proof of randomness of science??


  6. Ola Says:

    I disagree completely. In my field there are lots of different ways to do things, and there is widespread disagreement about which way is “best”. Nevertheless, reviewers are generally in agreement about certain methods just not being good enough.

    Things such as using a cell line instead of primaries, using a whole body knockout instead of inducible or tissue specific. Using an antibody to detect a post-translational modification when mass spec is so much better. Having aims that claim to test the “role” of X in Y, but then experimental designs that merely correlate X and Y without proving cause/effect.

    So, what you refer to as “nit-picking” often boils down (at my study section) to the approach completely tanking a proposal because the choice of methods is just so fundamentally flawed. Most of the approach flaws that get flagged in our reviews are often heavily overlapped with significance – if you’re using the wrong method then the result won’t be informative about the human disease. Does the point belong in the significance or the approach column? Probably more in the former, but people tend to load-up the approach anyways.


  7. BugDoc Says:

    @Drugmnky: I hear you! But based on the data, I’m not sure it would be any better/worse if POs vs our peers were reviewing. I know for sure it would save us a lot of time in grant review though. I’d rather review papers. I know I don’t have to do grant review service, but I do it since I appreciate that others have to review my grants.

    @Ola: The fact that there is widespread disagreement about which way is “best” would seem to indicate that peer review is going to be problematic in the first place. If reviewers are generally in agreement about certain methods just not being good enough, the program officers (who have done their Ph.D., postdoc and in some cases have run their own labs for a while) likely also can identify the bottom 2/3 of the pile that are proposing sub-optimal approaches. I’m not saying they will be perfect either, I’m just pointing out that if peer review of grants isn’t doing what we think it does, then let’s not let our wounded egos get in the way of getting science done. Which I think would be more likely if we spent less time bickering over grants in study section and more time doing science.


  8. Ass(isstant) Prof Says:


    I’ll post some gems of comments I’ve received that might be construed as ‘nit picking’ to spread the scores. I can certainly see that substantive insight can come out of Approach comments, though it also looks like an easy way to tank a given proposal when you favor another. Some comments under Weaknesses have been useful and helped me think through alternatives, so I’ll give you that.

    Approach is where I see ‘stock critiques’ used in the reviews I get, so I understand what drug monkey is after on this post. It could explain why percentiles don’t predict impact if Approach has the greatest impact on percentile.

    A few examples: “There are other members of this gene family, so knockout/knockdown probably won’t indicate anything because they are likely redundant.” –the knockout is, of course embryonic lethal and a response to knockdown in cultured cells was shown.

    ‘You can’t pick up transcription factors by mass spec.’ –of course we can, as shown in our previous pubs and preliminary data.

    ‘You’ve never published anything with an immunoprecipitation to detect ubiquitin mods, I’m not sure you can do it.’ –are you f-ing kidding me? See novel methods developed, other immunodetection techniques, in vivo translation. It’s a standard method that undergrads are successful at with dubious amounts of instruction. This also says nothing about whether or not the approach is sound.

    I can’t remember the specifics, but I had one question of where the ‘healthy control’ group would be for a pilot analysis of liver biopsy specimens. I wanted to ask the reviewer if he/she would like to volunteer for that biopsy.
    A comparison group was identified, but I indicated that they weren’t likely perfectly healthy.

  9. Joe Says:

    I don’t think the PO’s can do the review job. My PO is great, but his portfolio contains some very diverse projects. There is no way he could keep up with all the fields involved. Also, people in the field know who is really making the advancements and who is just spinning BS. So I think we need the expert reviewers to torpedo the applications that are using a clearly wrong approach.


  10. Grumble Says:

    DM – you are absolutely right. “Approach” is, hands down, the most irrelevant part of the application because almost invariably, the PI doesn’t do the experiments exactly as proposed. So, any criticism of that section is, by definition, misguided before it is even uttered. Therefore, it makes much more sense to judge the grant based on the other criteria – significance, innovation, environment, and PI.

    In light of the NHLBI metrics, what the NIH should do is limit the Approach section to 2 pages, but allow 4 pages each for Significance and Innovation.

    To answer Ola’s criticism that sometimes PIs just choose the wrong methods – that might be true, but you would be able to tell if the methods are grossly wrong from a 2 page approach section. For instance, reviewers could still complain that a grant proposes to use antibodies when mass spec would be better, but would have no basis on which to complain that the exact mass spec methods aren’t good enough. (Unless, of course, the PI has a history of publishing crappy mass spec papers, in which case that would be a valid basis for giving a low score.)


  11. BugDoc Says:

    “My PO is great, but his portfolio contains some very diverse projects. There is no way he could keep up with all the fields involved.”

    Some of the “experts” I have reviewed with at study section clearly haven’t kept up with all the fields involved in the grants they are reviewing either. In any case, as I think someone commented in a previous DM post, the people who are the real experts in your field are the ones that are most likely to nitpick your Approach to death.


  12. The Other Dave Says:

    This is a great post. But it sort of swings us back to the question of whether we should fund ‘people, not projects’. Which of course would be done based on PI track record/fame. Which might unfairly favor established investigators. We need a balance that ignores nit picky crap, but also leaves room for people with a great new idea & approach.

    Maybe we should get away from the separate review criteria again, go back to a single review score, which would be based on whatever the reviewer values. I forget: What was the evidence that a single score was bad?


  13. Pinko Punko Says:

    I sort of understand what you are saying here, but I don’t really get what it could possibly mean. The funding levels mean the system is broken. It means there is a surplus of good science. If my grant goes down to stock critiques and lazy review or if it goes down to a nit pick critique, the latter is more beneficial. At some point part of showing that I am smart and should be a successful scientist will include responding appropriately to a nitpick critique and demonstrating sophistication about how I think about a project. Meaning I have something to play off of in a response. Eventually, someone will have to champion the grant anyway, and that is just more ammunition for championing it. If reviewers act like they need to fight for one grant in their pile instead of fairly engaging with all proposals, the result is likely a lot of lazy reviews, or at least reviews that don’t have anything to go off of. Somebody engaging with the grant, even if they are micromanaging it, is so much better.

    If Approach is to have no meaning, and as Grumble tends to propose, that the entire process is a complete pantomime, what takes its place? It just seems silly to go down that road- what are the special snowflakes going to do to obtain their funding, just be special? Is it all Significance?

    How were the NHLBI data normalized to field?

    The Other Dave, the grants just get a single score in voting. The breakdown just gives you an idea of what the driver is for the overall score, as I suspect you know. Approach is the main driver, Significance next, and then Investigator or maybe Innovation. They all probably have some aspect of figure skating scoring in that they flock together. Someone is not productive, they can get hammered on Investigator, and that drives the overall score. Grant isn’t great, then it will be hammered on Approach, yet the Investigator can get Pyrrhic victory of 1 for Investigator. Those scores just help you read the tea leaves. If pay lines were 25th percentile, I think study sections would be doing a better job. They are forced into worse and worse decisions based on thinner and thinner margins. 60% of grants are not discussed. If the pay line is 25%, maybe 30% would not be discussed. Meeting in person is much more fair to the grants that actually get discussed, assuming the panel is a good panel with strong chair/SRO. I think there are probably some shit panels out there.


  14. drugmonkey Says:

    I think it would be an error to conclude an entire panel is shitty unless you have done a round or two of review on it. All the panels I have been on are similar and not “shitty”. Of course this may simply reflect the super-cluster of highly related subfields and the interlocking population of reviewers/applicants. Maybe we are lucky.


  15. Grumble Says:

    “If Approach is to have no meaning, and as Grumble tends to propose, that the entire process is a complete pantomime, what takes its place? It just seems silly to go down that road- what are the special snowflakes going to do to obtain their funding, just be special? Is it all Significance?”

    I didn’t say Approach should have “no meaning”. It should have less meaning as a basis for criticizing a grant.

    What should take its place? Past performance predicts future results. If reviewers are happy with the impact a PI has made on the field, then they should award the grant, unless the Approach is clearly wrong or the Environment consists of a barren field. This also means that someone (reviewers and/or program staff) needs to limit the amount of funding an individual PI gets: you can’t submit a grant every cycle for a year saying, “look at my Nature paper last year!” and end up with 3 R01s just for asking. Let’s call this Funding System A.

    Of course, DM will jump in to complain “what about the newbies?”, and that’s a legitimate question. Newbies should be funded through a separate system in which Approach is more important. That’s Funding System B. And established PIs who need more than the baseline amount provided through System A should have access to it through yet another system (C), in which they must provide full Approach details. The bar for this would be very high; Approach would have to be impeccable and the Significance and/or Innovation would have to be extremely good.


  16. Pinko Punko Says:

    Part of asking for Approach to be thought out and reasonable is to attempt some level of efficiency. Spending a ton of money can lead to significant results in absence of any checks and balances. I think that in the current climate there is no workable system. I think that with increased funding the system is really a reasonable mix of A, B, and C. Right now it is moving towards C, just as a competitive University goes to the 15th tie-breaker for admissions. Certainly the dark side of A is whether everybody just decides that their buddies are more equal than nobodies. Just A makes me very wary.

