NIH Peer Review Advisory Committee Meeting Dec 2007

December 5, 2007

The PRAC met 12/3 and their site has the slide files up already. I have some brief observations.

The PowerPoint from L. Tabak gives ~2,600 as the number of responses to the RFI. It specifies that the responses are still being evaluated but does list a bunch of emerging themes. Quickly scanning, I see many issues that I’ve been hearing raised and nothing shockingly new.

Scarpa’s CSR update holds no surprises. The usual bogus analysis of New/Experienced Investigator success, gated on the 20th percentile instead of the more important issue of scoring within the fundable range. More commitment to speeding review so as to permit three rounds of review in one year. Hard to tell from the one figure, but it looks like the pilot suggests that people are perfectly happy to take advantage of revise-next-round if permitted. About 3,000 standing reviewers in each of 2006 and 2007; ad hocs about 14,500 in 2006 and 12,500 in 2007. (All this and they got about 2,600 comments?) Academic rank data! Yay. Except he collapses standing, ad hoc and SEP reviewers, so this doesn’t really give you the relevant number of applications reviewed by each tenure rank. Ten percent Asst Prof by headcount doubtless overestimates the number of grants reviewed by Asst Profs. More on the electronic review and shortened-application strategies, which I think are a BadIdea.

The “CSR Best Practices” from Kitt is pretty interesting, I’ll let this one percolate for awhile.

The “Clustering” one by Schneider may have me resorting to the video-cast to understand it. It seems like this is, in part, trying to address the bunny-hopping issue raised by whimple. [Update 12/06/07: Agent whimple is all over it, see comment for analysis]

Update2 12/06/07: So I listened to the Schneider/Clustering presentation and discussion as well.

Listening to the advising process during the discussion is…frustrating. A lot of bureaucratic behavior, “need additional study”, “are these trends statistically significant”, “I can’t understand without the specific grant types”, etc. They spend a lot of time on the Clinical versus Basic issue which isn’t on my radar as much but some of you might be interested. One discussant fingers the GoodOldBoys/Girls issue (around 2:57), Schneider says “I see the same trends” followed by general laughter and….they. just. move. on. Later someone else sort of makes an aside that he’s in favor of de-OldBoy-ing study sections (3:04) and someone else quips (3:09) that “some study sections are not only eating their young but clubbing them to death”. Doesn’t quite work logically as a metaphor but the gist comes across. There was also a mention from Schneider of proposals to review NI grants in focused study sections and a mention from someone else about how NIH pickups were saving the NIs from the study sections. What I didn’t hear was a connection of the freakin’ dots.

Other tidbits of interest include Schneider shutting down one of the skeptics by pointing out that it was not merely an issue of the popularity of a research domain as some very popular domains were unclustered (3:01). Also Christine Melchior of IFCN arguing that “captive study sections” may account for some of the apparent IFCN “issue”. “Captive” meaning, I take it, that most if not all of a given study section’s apps are under consideration by a single Institute or Center. Oh, and the fact that Schneider found it a “surprise” that NI cluster trends were reversed (3:03-04) with less-clustered = better score is, well, surprising. Rhetorical trick? Or another example of just how frickin’ clueless these people are?

16 Responses to “NIH Peer Review Advisory Committee Meeting Dec 2007”

  1. physioprof Says:

    “The ‘CSR Best Practices’ from Kitt is pretty interesting, I’ll let this one percolate for awhile.”

    WTF are Parking Lot Issues? And what’s up with that lavish office? That’s a couple friggin’ R01s right there.

  2. drugmonkey Says:

    i thought it was “we’re not dealing with this stuff yet, move it to the back lot”. isn’t that scarpa at his desk? some kinda open office or doorless office mumblety? you know these management types…

  3. physioprof Says:

    “The ‘Clustering’ one by Schneider may have me resorting to the video-cast to understand it. It seems like this is, in part, trying to address the bunny-hopping issue raised by whimple.”

    Yeah. I got two things from the PPT:

    (1) “Extreme clustering essentially establishes an entitlement and is counter to broad study sections”

    Yeah, that’s what I was saying in the bunny discussion.

    (2) If I am understanding these graphs correctly, New Investigators totally suck ass. Like in CSR, 90% of new investigator applications get scored worse than the 60th percentile (Slide #20). Can this be right? Or am I not understanding what these graphs are about?

  4. whimple Says:

    Hard to know what the clustering thing is about. It’s also hard to know how to have a “broad study section” if only 2.5 people actually read any given grant. I couldn’t figure out the graphs satisfactorily either. It wouldn’t surprise me though if the New Investigators did in fact suck ass. I wonder how they averaged in the triaged grants. If this is just the scored grants, it’s even worse.


  5. […] the previously posted graying of NIH PIs PPT plus some new charts & data points. And as DM discusses separately, the PRAC (peer review advisory committee) Dec 3 presentations are now […]

  6. whimple Says:

    Ok. I went to the videotape on this one. The discussion of “clustering” starts at 2:35:56.

    “Clustering” is the bunny-hopping we’ve been having fun with, on a bigger scale. Clustering means having applications in roughly the same field of study reviewed together in the same IRG, possibly even the same study section, vs. having those applications spread over several IRGs (and correspondingly even more study sections).

    The example given of something that is necessarily clustered poorly for review is the genetics of human behavior: since it isn’t just genetics and it isn’t just human behavior, these applications are going to get divided up for review.

    The graphs in the PowerPoint presentation show actual percentile scores vs. the percentile of applications. In a fair universe this should be a straight line with a slope of 1. In other words, 15% of all applications should score at or better than the 15th percentile, 40% of applications should score at or better than the 40th percentile, and so on.

    In general, applications in well-clustered groups do slightly better than they should, for example, 50% of applications do better than the 40th percentile (on average). Applications in poorly clustered review areas have much more scatter. In some cases they do very much better. There’s one example of a group where 60% of their applications do better than the 30th percentile. These guys are looking out for each other pretty good! But there are also examples in poorly clustered areas where these grants are doing badly, for example, only 20% of applications score better than the 50th percentile.
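
    (For concreteness, here’s a minimal sketch in Python of how one of these curves gets built; the subgroup scores below are invented for illustration, not CSR’s actual data.)

    ```python
    import numpy as np

    # Hypothetical percentile scores for one subgroup of applications
    # (say, everything reviewed in a particular clustered area). Invented numbers.
    subgroup_scores = np.array([5, 12, 18, 22, 25, 31, 38, 44, 52, 67])

    # For each percentile threshold, what fraction of the subgroup scored
    # at or better than (i.e., numerically at or below) that threshold?
    for threshold in range(10, 101, 10):
        fraction = (subgroup_scores <= threshold).mean()
        print(f"{fraction:4.0%} of the subgroup scored at or better than the {threshold}th percentile")

    # In a fair universe the subgroup looks like everyone else, so this fraction
    # tracks the threshold itself: about 10% at the 10th percentile, 40% at the
    # 40th, and so on. A curve above that line means the subgroup is doing better
    # than average; below it, worse.
    ```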

    The New Investigator data is interesting. Overall, NI’s do slightly worse. 30% of NI’s score better than the 40th percentile. However, in some areas of review, NI’s get hammered very very badly. The presentation listed 3 anonymized “subsets” of intermediate size. (Intermediate size means 200 applications per review cycle whereas something like cancer has 2000 applications per cycle.)

    In all three of the subsets examined “A”, “B” and “C”, the NI’s do better under conditions of low clustering. That is, a lonely NI grant does better than a NI grant up against established investigators in the same field. I guess this is not a big surprise.

    Let’s set the imaginary payline at 20% for NI’s.
    In subset “A”, less than 5% of NI’s meet the payline, regardless of clustering.
    In subset “B”, same thing, less than 5% of NI’s meet a 20% payline.
    In subset “C”, if the NI is in a low clustered review group, 10% of them meet the 20% payline. Still twice as bad as it should be, but better. In this subgroup, NI’s in high cluster review do egregiously badly: about 3% of these NI’s make the 20% payline.
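
    (The payline bookkeeping above is nothing fancier than a tally like the following sketch; the per-subset scores are invented, since the real numbers are CSR’s, not mine.)

    ```python
    # Fraction of New Investigator apps meeting an imaginary 20% payline,
    # broken out by (subset, clustering level). All scores invented for illustration.
    PAYLINE = 20

    ni_scores = {
        ("A", "high"): [28, 35, 41, 19, 52, 60, 33],
        ("A", "low"):  [24, 31, 45, 22, 38],
        ("C", "high"): [27, 44, 18, 39, 55, 61, 30, 48],
        ("C", "low"):  [15, 19, 26, 33, 21, 40],
    }

    for (subset, clustering), scores in sorted(ni_scores.items()):
        frac = sum(s <= PAYLINE for s in scores) / len(scores)
        print(f"Subset {subset}, {clustering} clustering: "
              f"{frac:.0%} of NI apps make the {PAYLINE}% payline")
    ```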

    Coincidentally, all three of these sample subsets “A”, “B” and “C”, have high clustering in IFCN (Integrative, Functional and Cognitive Neuroscience). Boy are the NI’s in this IRG ever getting screwed to the wall. 🙂

  7. drugmonkey Says:

    “In this subgroup, NI’s in high cluster review do egregiously badly: about 3% of these NI’s make the 20% payline.”

    Ummm….damn! PP does this answer your prior musings as to why NI apps should get picked up “way” down to 25%ile in some ICs?

  8. lvnWiFi Says:

    “cluster f*ck” more like…

  9. physioprof Says:

    “PP does this answer your prior musings as to why NI apps should get picked up ‘way’ down to 25%ile in some ICs?”

    Depends what you mean by “why”. If the goal is to fund both the top 10% of NIs and the top 10% of experienced investigators, then the pick-ups make sense.

    If the goal is to fund the top 10% of applications, then it may or may not make sense, depending on what you think the reason is for the poor relative performance of NIs.

    If the reason is that NIs are subject to intrinsic negative bias in peer review, then it also makes sense. If the reason is that NIs submit shitty applications, then not so much.

    That is interesting that NIs do so poorly in IFCN. I submitted my first two R01 applications ever for review in an IFCN study section. The first one got triaged and I never resubmitted; the second finally got funded as an A2. The first R01 I ever got funded was as an A0, which was reviewed by a CSR standing special emphasis panel.

    So, I submitted three different R01s as a NI, the ones that went to IFCN got savaged. The one that was reviewed by the standing SEP was embraced.

  10. drugmonkey Says:

    “If the reason is that NIs are subject to intrinsic negative bias in peer review, then it also makes sense. If the reason is that NIs submit shitty applications, then not so much.”

    one of the scariest yet confirmatory parts of the videocast came around 1:21 when Scarpa is fielding a question about the asst-profs-are-just-bad-reviewers theme. basically throwing up his hands and saying “well we could never determine this”. WTF? or maybe JFCoaPS! just shows they have no real intention of drilling down to the truth on any of these issues. So can you imagine trying to ask this question (which is for damn sure in the back of everyone’s mind)?

    for the record, my position on this is that IME yes, NI apps are on average worse than experienced-Investigator apps. However, I rant on about that population of apps that I see from NIs that are excellent and are still scoring worse than lesser proposals from more-established folks. This is when I start on about “bias”; I am not trying to suggest that all NI apps deserve funding.

    “the ones that went to IFCN got savaged. The one that was reviewed by the standing SEP was embraced.”

    wanna hazard any estimate of how “clustered” either you or your applications were?

  11. physioprof Says:

    The ones that went to IFCN (both to the same study section) are highly clustered, based on my understanding of clustering to be apps in the same subfield all being reviewed in the same study section. In fact, I would say that this particular study section may be one of the most highly clustered in IFCN (just based on my intuitive perception of the narrowness of the mandate of that study section).

    The application that was reviewed by the standing SEP was purposefully designed to be declustered, as I had by that point realized the “holding pattern” and clique shit that goes on in clustered study sections. And the standing SEP that it was reviewed by is very declustered: its mandate is extremely broad, and even though it is a standing SEP, it does not have any official standing membership. It is reconstituted de novo each cycle, with appropriate expertise for the particular gemisch of applications then under review.

    Incidentally, this standing SEP is slated to become a standing study section as part of the new neuroscience IRG.

  12. whimple Says:

    It doesn’t matter if the NI grants all suck. Unless the NIH wants to contract its researcher base, the NI grants have to get funded regardless.

  13. physioprof Says:

    If this is true, then instead of doing “pick-ups”, they should review and percentile NI applications separately. Actually, they probably don’t even have to review them separately; just percentile them separately.
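
    Mechanically it’s trivial; a percentile is just your rank within whatever reference pool you pick. A quick sketch with made-up priority scores (lower = better), just to show what percentiling NIs separately would buy:

    ```python
    def percentile_rank(score, pool):
        """Percent of the pool scoring at least as well (lower or equal raw score)."""
        return 100.0 * sum(s <= score for s in pool) / len(pool)

    # Invented raw priority scores, purely for illustration.
    experienced = [140, 150, 155, 160, 168, 170, 180, 190, 200, 220]
    new_invest  = [165, 175, 185, 195, 205, 215]

    best_ni = 165
    print("Against the mixed pool:", percentile_rank(best_ni, experienced + new_invest))  # ~31st percentile
    print("Against NIs only:      ", percentile_rank(best_ni, new_invest))                # ~17th percentile
    # The same application clears a 20% payline when NIs are percentiled as their
    # own pool, and misses it when thrown in with the experienced investigators.
    ```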

  14. whimple Says:

    In the face of tremendous variability in the degree of study section hostility to new investigators, simply percentiling these applications separately isn’t going to be very helpful. One of the suggestions at the front of the PRAC session was to bring back the R29, but with some real meat on the bones this time. That might work, especially if they rename it “R01”.

    What baffles me is why, with this kind of data available, CSR allows the SROs to let their study sections get away with this kind of chicanery.

    Another example: Using CRISP it’s easy to analyze the entire NIH, study section by study section, to determine what fraction of grants they fund per year are new apps (1R01) vs renewed apps (2R01). The NIH aggregate is 2 new grants funded to every renewed grant funded. There are, however, some study sections (like the one my last R01 went to) that are completely upside-down on this ratio and fund 2 renewed grants for every new grant. I’d like to see the aberrant study sections justify this behavior, if in fact it can be justified, since this looks suspiciously like the more-of-the-same fund-my-friends behavior that gets in the way of objective review.
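
    (For anyone who wants to repeat the exercise, it’s just a tally of application type codes per study section. A sketch, assuming you’ve dumped your CRISP hits to a CSV; the file name and column names here are hypothetical, not anything CRISP actually hands you.)

    ```python
    import csv
    from collections import defaultdict

    # Tally new (type 1) vs. renewed (type 2) R01 awards per study section.
    # "crisp_export.csv", "study_section" and "project_number" are assumed names.
    counts = defaultdict(lambda: {"new": 0, "renewal": 0})

    with open("crisp_export.csv", newline="") as f:
        for row in csv.DictReader(f):
            proj = row["project_number"].strip()  # e.g. "1R01XX012345-01" or "2R01XX012345-06"
            if "R01" not in proj:
                continue
            if proj.startswith("1"):
                counts[row["study_section"]]["new"] += 1
            elif proj.startswith("2"):
                counts[row["study_section"]]["renewal"] += 1

    for section, c in sorted(counts.items()):
        ratio = c["new"] / c["renewal"] if c["renewal"] else float("inf")
        print(f"{section}: {c['new']} new / {c['renewal']} renewed (ratio {ratio:.1f})")
    # NIH-wide this ratio runs around 2; the upside-down sections come out well below 1.
    ```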

    If I had to guess, I’d say biased review is worse than no review at all, so having program ignore the priority scores and pick up apps that look good to them might not be such a bad interim policy.

  15. drugmonkey Says:

    whimple, I completely agree with the total bafflement thing. I see so many cases where even the limited data available make a strong statement and we know for sure that NIH either does or could easily whip up the additional necessary data.

    The R29/checkbox thing is one of my hobby horses. they’ve been after the same thing “help new investigators” for decades and none of these things has helped. why can’t they see that the problem is with normal human behavior at the study section level?

    “why, with this kind of data available, does CSR allow the SRO’s to let their study sections get away with this kind of chicanery”

    I see this weird thing where they try to keep things on the up and up but then back off and say “but we can’t interfere with independent review”. So during a meeting SROs may be free to remind the panel of the R21 rules, for example. they will make sure panels discuss a certain fraction of NI or particular-mechanism grants so that the triage burden isn’t disproportionate.

    trouble is, this is all window dressing. if the prelim scores come in at a triage level and then the panel is forced to discuss it, well, that app ain’t going to move to fundable, is it? tell someone not to focus on the lack of prelim data and they’ll just vote how they were going to vote anyway, or start talking about how the hypotheses weren’t “supported”, as if it were a lack of background.

    With respect to Type 2 continuation apps, the one legitimate reason for the skew is that subfields can have different views on the merits of “renewing a grant”. For some fields, departments and individual PIs, it is a ReallyBigDeal whether you have renewed a grant or not. this can be an explicit criterion for promotion, btw. in other situations, it may be totally irrelevant whether you’ve renewed a particular grant or merely been continuously funded.

    with that said, we can discuss whether it should be important to continue a project vs. just keep getting new awards.

    Me, I have come to the conclusion that perhaps one very important NIH-wide “fix” is to ditch competing continuations entirely. This would help with the investigator-based vs. project-based funding duality/tension/bullshit.


  16. […] We’re using “bunny hopping” thanks to whimple and the NIH CSR calls this “clustering“. Note upfront that this analysis and discussion does not necessarily require overt malicious […]

