Expertise versus consistency
November 24, 2014
In NIH grant review, the standing study section approach to peer review sacrifices specific expertise for the sake of consistency of review.
When each person has 10 R01s to review, the odds are high that he or she is not the most specifically qualified person for all 10.
The process often brings in additional panel members to help cover scientific domains on a per-meeting basis but this is only partially effective.
The Special Emphasis Panel can improve on this but mostly it does so because the scope of the applications under review is narrower. Typically the members of an SEP still have to stretch a bit to review some of the assignments.
Specific expertise sounds good but can come at the cost of consistency. Score calibration is a big deal. You should have seen the look of horror on my face at dinner following my first study section when some guy said “I thought I was giving it a really good score…you guys are telling me that wasn’t fundable?”
Imagine a study section with a normal-sized load of apps in which each reviewer completes only one or two reviews. The expertise would be highly customized for each proposal, but there might be less consistency and calibration across applications.
What say you, Dear Reader? How would you prefer to have your grants reviewed?
November 24, 2014 at 11:52 am
I’d prefer to write a 2 page grant, and have the whole thing reviewed by 4 people.
November 24, 2014 at 12:07 pm
This is what we’re moving towards in Canada’s biomedical research grants. In the current inaugural Foundation grant competition, most face-to-face committee meetings are done away with. Instead, applications are initially reviewed by five reviewers, each of whom may have an entirely different set of applications to review. Each reviewer ranks all the applications they have, and an application’s final score is the average of the percentile ranks it received from each of its reviewers. Forcing reviewers to rank their grants instead of using absolute scores will allegedly alleviate score compression and calibration issues.
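A minimal sketch of that averaging scheme, in Python, assuming a simple 1-is-best within-reviewer rank converted to a 0–100 percentile (the reviewer names, application IDs, and conversion rule are made up for illustration, not CIHR’s actual implementation):

```python
# Hypothetical sketch of "average of percentile ranks" scoring.
from collections import defaultdict

def percentile_rank(rank, n):
    """Convert a within-reviewer rank (1 = best) to a percentile (100 = best)."""
    return 100.0 * (n - rank) / (n - 1) if n > 1 else 100.0

# Each reviewer sees a (possibly different) set of applications, listed best first.
reviewer_rankings = {
    "reviewer_1": ["A", "C", "B"],
    "reviewer_2": ["C", "A", "D"],
    "reviewer_3": ["B", "D", "A"],
}

percentiles = defaultdict(list)
for ranking in reviewer_rankings.values():
    n = len(ranking)
    for rank, app in enumerate(ranking, start=1):
        percentiles[app].append(percentile_rank(rank, n))

# Final score for each application: the average of its percentile ranks.
final_scores = {app: sum(p) / len(p) for app, p in percentiles.items()}
print(final_scores)  # {'A': 50.0, 'C': 75.0, 'B': 50.0, 'D': 25.0}
```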
November 24, 2014 at 12:15 pm
This whole issue could be solved by adopting the long-standing NSF practice of ranking the grants after all of them have been reviewed. It is a fiction that even standing study section members can precisely calibrate their scores across a single study section meeting or over multiple study sections. Ranking grants after all of them have been reviewed and discussed places every score in the context of all the science that has been discussed, and it helps to correct both for inexperience in scoring and for scoring bias related to time of day, order, and how hungry or tired the SS members were when they originally scored the grant. The ranking process should ideally involve substantial discussion, which can be highly productive. It is also quite useful in situations where a small cadre from a particular field artificially inflates the scores of a favored application in order to get it through. By being forced to defend those scores against other worthy applications from other fields, the cadre has to engage in an exchange that may prompt other study section members to revise their scores.
The ranking process does not take into account possible fluctuations in overall science quality from one study section to another. However, this is the point at which previously scored applications could be revisited and used by the study section members to “normalize” their scores.
While the above is not a perfect solution, it addresses some of the problems in the current system and gets away from the fiction that humans can adhere to a consistent internal scale and apply it fairly to every application they review.
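To make the calibration point concrete, here is a toy illustration in Python (the scores are hypothetical): two reviewers using very different internal scales produce the same ordering once their raw scores are converted to ranks, which is effectively what a post-discussion ranking step buys you.

```python
# Toy illustration: ranking washes out differences in reviewers' internal scales.
raw_scores = {
    "calibrated_reviewer": {"app1": 2.0, "app2": 3.0, "app3": 5.0},
    "compressed_reviewer": {"app1": 1.1, "app2": 1.2, "app3": 1.4},
}

def to_ranks(scores):
    """Rank applications from best (1) to worst; a lower raw score is better."""
    ordered = sorted(scores, key=scores.get)
    return {app: rank for rank, app in enumerate(ordered, start=1)}

for reviewer, scores in raw_scores.items():
    print(reviewer, to_ranks(scores))
# Both reviewers yield {'app1': 1, 'app2': 2, 'app3': 3}: the ordering survives
# even though the raw scores sit on very different scales.
```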
November 24, 2014 at 1:15 pm
You should have seen the look of horror on my face at dinner following my first study section when some guy said “I thought I was giving it a really good score…you guys are telling me that wasn’t fundable?”
Ahh, back in the days of 1.0-5.0, using only the 1.0-1.3 range to score fundable apps. Now it’s the opposite n00b problem (on study sections that are properly spreading their scores): “But I thought I was giving it a really bad score…you guys are telling me that was fundable?”
November 24, 2014 at 2:03 pm
ERRoRZ of calibration suck.
November 24, 2014 at 4:27 pm
I wouldn’t like a ranking system. To me it would introduce a lot of bias: it is certainly the easiest place to put in all the preconceptions, it doesn’t really control for a good crop of grants vs. a bad crop (which percentiling should), and panels don’t read every single grant anyway. I think NIH review is vastly superior to NSF review. I would keep the current system but empower SROs to keep review quality up and promote a culture where solely-StockCritique™ reviews are considered unacceptable.
November 24, 2014 at 5:25 pm
I think if a grant is written clearly enough, then an uber expert in a micro sub-sub-specialty should not be required to evaluate it. If such an expert is required, the grant probably used too much jargon, right? If you write about super-specific points and little-known facts, you run the risk of scientific disagreement between yourself and your reviewer. If a fact is little known, it may as well be little believed, and you’re not putting yourself on a solid foundation there. Just my two cents.
November 24, 2014 at 7:52 pm
What about appreciation for the sub sub specialty, E rook? Is that important? Otherwise we’d all be working on cancer, no?
November 24, 2014 at 9:08 pm
Reviewers who are experts in your particular field would likely be either your collaborators or your competitors, neither of whom should be reviewing your grants. IMO it’s better to have reviewers who are somewhat familiar with your field in a general sense (i.e., those interested in bunny hopping) and who are willing to consider topics outside their domain, ones they haven’t considered before, as worthy of funding (i.e., bunny lovin’). The grant should be a tool to educate the semi-expert reviewer on why the topic is important (without bunny lovin’ there wouldn’t be any more bunnies to hop), and it should include enough detail to convince them to support the project without overburdening them with minutiae.
November 24, 2014 at 9:22 pm
The SEPs are a pretty big disincentive to serving as an appointed member of a study section…reviews are usually done as a blogroll, which makes it hard to call out goofy scoring, and the percentiles are whacked…nobody knows the payline.
November 24, 2014 at 9:52 pm
I think one is capable of appreciating a subspecialty without being a player in it. I suppose what you’re possibly getting at is a Bunny Hopping expert distinguishing between two different approaches to solving a problem about Bunny Lovin’. In that case, it probably comes back to the track record of the applicants, which disadvantages the noobs or smaller univs; in which case, the applicant should nichify more. Or maybe the reviewer can engage critical thinking skills to identify potential problems, or talk about the two approaches with a colleague in a general sense. Throw the ResPlans at a grad student and discuss them later (my advisor did that a lot). Probably asking too much.
November 24, 2014 at 11:44 pm
Uh, just as a PSA, E rook, study section members are not allowed to share applications with others, even for “benign” reasons such as obtaining feedback from someone familiar with a technique or method.
Please do not try this at home.
November 24, 2014 at 11:50 pm
Yeah, but out of those 10 only 3 are assigned to you as primary. I always try to make sure I’m good on those, to the point of refusing to be primary if I’m not 100% comfortable with the material. Our SRO is a lot more forgiving if you say “can you move me to secondary on this one?” versus “I can’t review this at all.” I’m guessing if other folks think similarly (i.e., it’s more important to be an expert on your primary proposals than on the others), then this explains the classic “reviewer 3 doesn’t know shit” phenomenon!
November 25, 2014 at 10:55 am
Gtk
November 25, 2014 at 11:21 am
The SEPs are a pretty big disincentive to serving as an appointed member of a study section…reviews are usually done as a blogroll, which makes it hard to call out goofy scoring, and the percentiles are whacked…nobody knows the payline.
This is completely backwards. SEPs are fantastic, because they are percentiled against the all-CSR base, and thus are under zero pressure to spread scores.
November 25, 2014 at 2:18 pm
1. When you’re on an SS, is there a timetable for when you can ask to be moved to secondary? I know some PIs who don’t get around to reading until the last minute… by then, one would think, it’s too late to ask for a reshuffle.
2. What are the ethics of showing research plans to trainees for discussion / as a learning exercise for them (and a more interesting way of engaging the material for yourself)? My advisor did this so frequently that I thought it was common (maybe even expected).
November 25, 2014 at 3:30 pm
You are absolutely and explicitly forbidden from disclosing the contents of grant applications to anybody. The SRO assigns, not the reviewer.
November 26, 2014 at 9:02 am
I am appalled that a reviewer is showing grants to anybody.
December 1, 2014 at 7:58 pm
My pretty much unqualified opinion on this is that the ad hoc reviewers are likely to be more useful as a ward against flagrant BS than for substantive evaluation of specialist grants. In other words, the threat of maybe having someone who knows the references you cite as well as you do is probably highly useful in dissuading me from submitting over-hyped and under-referenced applications. This is especially true for the standing review committees, since the applicant knows who will be on the committee and might otherwise try to exploit their relative ignorance. That being said, if you don’t review a lot of grants, your ranking system isn’t going to be well worked out, and you’ll probably be so internally inconsistent that the committee should really take your score with a huge grain of salt. However, if something really good or really bad comes along, it’s easy enough for the reviewer to convey that info outside of a plain score, correct?
Maybe give the ad hoc people a different scale, so it can be weighted into the committee’s decision without being mathematically integrated.
December 1, 2014 at 8:10 pm
Calibration does happen in the course of discussion. Just… unevenly.
December 1, 2014 at 8:16 pm
Off topic, but I’m really curious to get a firsthand look at how these meetings work (or don’t). They seem kind of insane.
December 2, 2014 at 11:07 am
You have to be invited to serve on study section. Do you know about the early career reviewers program?
http://public.csr.nih.gov/ReviewerResources/BecomeAReviewer/ECR/Pages/default.aspx
December 2, 2014 at 11:07 am
Why “insane”?
December 3, 2014 at 7:57 pm
I have seen that program. I’m still a postdoc, though.
Insane in terms of the volume of grants covered in a short period of time. Also, the process seems to generate a lot of potential for miscommunication, egoism, and playing favorites. I’m curious to see how well people restrain themselves from indulging in those.
December 3, 2014 at 9:39 pm
Insane in terms of the volume of grants covered in a short period of time.
Do remember that the three(ish) people who have primary responsibility for reviewing each grant have weeks to do their reading, considering, and writing of the commentary. The other members of the panel have access and can read the grants if they want to (so long as they are not in conflict with a particular app). A week before the meeting, everyone is supposed to upload their preliminary critiques and initial scores, and then everyone on the panel can read those as well. So while the discussion takes place in a short period of time, ideally there has been a more than sufficient amount of time for* each person to consider each and every application in full.
*not saying they do so, just that it is possible.