CV alt metrics and Glamour

November 19, 2015

Putting citation counts for each paper on the academic CV would go a long way towards dismantling the Glamour Mag delusion and reorienting scientists toward doing great science rather than the “get” of Glam acceptance.

Michael S. Lauer, M.D., and Richard Nakamura, Ph.D., have a Perspective piece in the NEJM titled “Reviewing Peer Review at the NIH”. The motivation is captured at the end of the first paragraph:

Since review scores are seen as the proximate cause of a research project’s failure to obtain support, peer review has come under increasing criticism for its purported weakness in prioritizing the research that will have the most impact.

The first half or more of the Perspective details how difficult it is even to define impact and how nearly impossible it is to predict in advance, ending with a very true observation: “There is a robust literature showing that expert opinion often fails to predict the future.” So why proceed? Well, because:

On the other hand, expert opinion of past and current performance has been shown to be a robust measure; thus, peer review may be more helpful when used to assess investigators’ track records and renewal grants, as is typically done for research funded by the Howard Hughes Medical Institute and the NIH intramural program.

This is laughably illogical when it comes to NIH grant awards. What really predicts future performance and scientific productivity is who manages to land the grant award. The money itself facilitates the productivity. And no, I guarantee you they have never, ever done this test. When have they ever handed a whole pile of grant cash to a sufficient sample of the dubiously-accomplished (but otherwise reasonably qualified) and removed most funding from a fabulously productive (and previously generously-funded) sample and looked at the outcome?

But I digress. The main point comes later, when the pair of NIH honchos are pondering how to, well, review the peer review at the NIH. They propose reporting broader score statistics, blinding review*, scoring renewals and new applications in separate panels, and correlating scores with later outcome measures.

Notice what is missing? The very basic stuff of experimental design in many areas of research that deal with human judgment and decision making.
Here is my proposal for Drs. Lauer and Nakamura. Find out first if there is any problem with the reliability of review for proposals. Take an allocation of grants for a given study section and convene a parallel section with approximately the same sorts of folks. Or get really creative and split the original panels in half and fill in the rest with ad hocs. Whenever there is a SEP convened, put two or more of them together. Find out the degree to which the same grants get fundable scores.
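The analysis itself is not complicated. Here is a minimal sketch, with invented fundable/not-fundable decisions standing in for real panel data, of the chance-corrected agreement (Cohen’s kappa) one would compute between two parallel panels reviewing the same pile of grants:

```python
# Hypothetical example: two parallel study sections each score the same
# ten grants. 1 = fundable score, 0 = not fundable. Data are invented.
panel_a = [1, 1, 0, 1, 0, 0, 1, 0, 1, 0]
panel_b = [1, 0, 0, 1, 0, 1, 1, 0, 0, 0]

def cohens_kappa(x, y):
    """Chance-corrected agreement between two raters on binary decisions."""
    n = len(x)
    observed = sum(a == b for a, b in zip(x, y)) / n
    # Expected agreement if each panel decided independently at its own base rate
    p_x, p_y = sum(x) / n, sum(y) / n
    expected = p_x * p_y + (1 - p_x) * (1 - p_y)
    return (observed - expected) / (1 - expected)

kappa = cohens_kappa(panel_a, panel_b)
print(f"Cohen's kappa between panels: {kappa:.2f}")
```

With these invented numbers the panels agree on 7 of 10 grants, but kappa is only 0.4 once chance agreement is subtracted out — which is exactly the distinction that matters when asking whether peer review can “predict itself.”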

That’s just the start. After that, start convening parallel study sections to, again, review the exact same pile of grants, except this time change the composition to see how reviewer characteristics may affect outcome. Make women-heavy panels, URM-heavy panels, panels dominated by the smaller university affiliations and/or less-active research programs, etc.

This would be a great chance to pit the review methods against each other too. They should review an identical pile of proposals in traditional face-to-face meetings versus phone-conference versus that horrible web-forum thing.

Use this strategy to see how each and every aspect of the way NIH reviews grants now might contribute to similar or disparate scores.

This is how you “review peer review”, gentlemen. There is no point in asking whether peer review predicts X, Y or Z outcome for a given grant when funded if it cannot even predict itself in terms of what will get funded.

*And by the way, when testing out peer review, make sure to evaluate the blinding. You have to ask the reviewers to say who they think the PIs are, their level of confidence, etc. And you have to actually analyze the results intelligently. It is not enough to say “they missed most of the time” if either the erroneous or correct guesses are not randomly distributed.
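For concreteness, a minimal sketch of the kind of 2x2 analysis that would catch this (the counts are invented — say, comparing guess accuracy for well-known versus less-known PIs):

```python
# Hypothetical blinding check: did reviewers correctly guess PI identity
# more often for famous PIs than for less-known ones? Counts are invented.
#                 correct  incorrect
# famous PI          18        7
# less-known PI       6       19

def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for a 2x2 contingency table."""
    n = a + b + c + d
    # Expected counts under independence of row and column factors
    expected = [
        (a + b) * (a + c) / n, (a + b) * (b + d) / n,
        (c + d) * (a + c) / n, (c + d) * (b + d) / n,
    ]
    observed = [a, b, c, d]
    return sum((o - e) ** 2 / e for o, e in zip(observed, expected))

stat = chi_square_2x2(18, 7, 6, 19)
# Compare against 3.84, the 1-df critical value at alpha = 0.05
print(f"chi-square = {stat:.2f}; guesses non-random if > 3.84")
```

In this invented table reviewers are wrong slightly more often than right overall — yet the guesses are far from randomly distributed, since accuracy concentrates on the famous PIs. That is the failure mode a naive “they missed most of the time” summary would hide.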

Additional Reading: Predicting the future

In case you missed it, the Lauer version of Rock Talk is called Open Mike.

Reviewing Peer Review at the NIH
Michael S. Lauer, M.D., and Richard Nakamura, Ph.D.
N Engl J Med 2015; 373:1893-1895. November 12, 2015.
DOI: 10.1056/NEJMp1507427