Rockey explains the percentiling of tied NIH scores
June 24, 2013
Admittedly I hadn’t looked all that hard but I was previously uncertain as to how NIH grants with tied scores were percentiled. Since the percentiles are incredibly important* for funding decisions, this was a serious question after the new scoring approach (reviewers vote 1-9 integer values, the average is multiplied by 10 for final score. lower is better.) which was designed to generate more tied scores.
The new system poses the chance that a lot of “ranges” for the application are going to be 1-2 or 2-3 and, in some emerging experiences, a whole lot more applications where the three assigned reviewers agree on a single number. Now, if that is the case and nobody from the panel votes outside the range (which they do not frequently do), you are going to end up with a lot of tied 20 and 30 priority scores. That was the prediction anyway.
NIAID has data from one study section that verifies the prediction.
Percentiles range from 1 to 99 in whole numbers. Rounding is always up, e.g., 10.1 percentile becomes 11.
So you should be starting to see that the number of applications assigned to your percentile base** and the number that receive tying scores is going to occasionally throw some whopping discontinuities into the score-percentile relationship.
Rock Talk explains:
However, as you can see, this formula doesn’t work as is for applications with tied scores (see the highlighted cells above) so the tied application are all assigned their respective average percentile
In her example, the top applications in a 15 application pool scored impact scores of 10, 11, 19, 20, 20, 20…. This is a highly pertinent example, btw. Since reviewers concurring on a 2 overall impact is very common and represents a score that is potentially in the hunt for funding***.
In Rockey’s example, these tied applications block out the 23, 30 and 37 percentile ranks in this distribution of 15 possible scores. (The top score gets a 3%ile rank, btw. Although this is an absurdly small example of a base for calculation, you can see the effect of base size…10 is the best possible score and in an era of 6-9%ile paylines the rounding-up takes a bite.) The average is assigned so all three get 30%ile. Waaaaay out of the money for an application that has the reviewers concurring on the next-to-best score? Sheesh. In this example, the next-best-scoring application averaged a 19, only just barely below the three tied 20s and yet it got a 17%ile for comparison with their 30%ile.
You can just hear the inchoate screaming in the halls as people compare their scores and percentiles, can’t you?
Rockey lists the next score above the ties as a 28 but it could have just as easily been a 21. And it garners a 43%ile.
Again, cue screaming.
Heck, I’m getting a little screamy myself, just thinking about sections which are averse to throwing 1s for Overall Impact and yet send up a lot of 20 ties. Instead of putting all those tied apps in contention for consideration they are basically guaranteeing none of them get funded because they are all kicked up to their average percentile rank. I don’t assert that people are intentionally putting up a bunch of tied scores so that they will all be considered. But I do assert that there is a sort of mental or cultural block at going below (better than) a 2 and for many reviewers, when they vote a 2 they think this application should be funded.
In closing, I am currently breaking my will to live by trying to figure out the possible percentile base sizes that let X number of perfect scores (10s) receive 1%iles versus being rounded up to 2%iles and then what would be associated with the next-best few scores. NIAID has posted an 8%ile payline and rumours of NCI working at 5%ile or 6%ile for next year are rumbling. The percentile increments that are permitted, based on the size of the percentile base and their round-up policy, become acute.
*Rumor of a certain IC director who “goes by score” rather than percentile becomes a little more understandable with this example from Rock Talk. The swing of a 20 Overall Impact score from 10%ile to 30%ile is not necessarily reflective of a tough versus a softball study section. It may have been due to the accident of ties and the size of the percentile base.
**typically the grants in that study section round and the two prior rounds for that study section.
***IME, review panels have a reluctance to throw out impact scores of 1. The 2 represents a hesitation point for sure.