Nakamura is quoted in a recent bit in Science by Jeffrey Brainard.

I’ll get back to this later but for now consider it an open thread on your experiences. (Please leave off the specific naming unless the event got published somewhere.)

I have twice had other PIs tell me they reviewed my grant. I did not take it as any sort of quid pro quo beyond *maybe* an “I wasn’t the dick reviewer” sort of thing. In both cases I barely acknowledged it and tried to move along. These were both scientists that I like both professionally and personally, so I assume I already have some pro-them bias. Obviously the fact that these people appeared on the review roster, and that they have certain expertise, made them top suspects in my mind anyway.

Updated:

“We hope that in the next few months we will have several cases” of violations that can be shared publicly, Nakamura told ScienceInsider. He said these cases are “rare, but it is very important that we make it even more rare.”

Naturally we wish to know how “rare” and what severity of violation he means.

“There was an attempt to influence the outcome of the review,” he said. The effect on the outcome “was sufficiently ambiguous that we felt it was necessary to redo the reviews.”

Hmmm. “Ambiguous”. I mean, if there is ever *any* contact from an applicant PI to a reviewer on the relevant panel it could be viewed as an attempt to influence outcome. Even an invitation to give a seminar or invitation to join a symposium panel proposal could be viewed as currying favor. Since one never knows how an implicit or explicit bias is formed, how would it ever be anything other than ambiguous? But if this is something clearly actionable by the NIH doesn’t it imply some harder evidence? A clearer quid pro quo?

Nakamura also described the types of violations of confidentiality NIH has detected. They included “reciprocal favors,” he said, using a term that is generally understood to mean a favor offered by a grant applicant to a reviewer in exchange for a favorable evaluation of their proposal.

I have definitely heard a few third-hand reports of this in the past, backed up by a forwarded email* in at least one case. Wonder if it was one of these types of cases?

Applicants also learned the “initial scores” they received on a proposal, Nakamura said, and the names of the reviewers who had been assigned to their proposal before a review meeting took place.

I can imagine this happening** and it is so obviously wrong, even if it doesn’t directly influence the outcome for that given grant. I can, however, see the latter rationale being used as self-excuse. Don’t.

Nakamura said, “In the past year there has been an internal decision to pursue more cases and publicize them more.” He would not say what triggered the increased oversight, nor when NIH might release more details.

This is almost, but not quite, an admission that NIH is vaguely aware of a ground current of violations of the confidentiality of review. And that they also are aware that they have not pursued such cases as deeply as they should. So if any of you have ever notified an SRO of a violation and seen no apparent result, perhaps you should be heartened.

Oh, and one last thing:

In one case, Nakamura said, a scientific review officer—an NIH staff member who helps run a review panel—inappropriately changed the score that peer reviewers had given a proposal.

SROs and Program Officers may also have dirt on their hands. Terrifying prospect for any applicant. And I rush to say that I have always found the SROs and POs that I have dealt with directly to be upstanding people trying to do their best to ensure fair treatment of grant applications. I may disagree with their approaches and priorities now and again but I’ve never had reason to suspect real venality. However. Let us not be too naive, eh?

_
*Anyone bold enough to put this in email… well, I would suspect this is chronic behavior from this person?

**we all want to bench race the process and demystify it for our friends. I can see many entirely well-intentioned reasons someone would want to tell their friend about the score ranges. Maybe even a sentiment that someone should be warned to request certain reviewers be excluded from reviewing their proposals in the future. But….. no. No, no, no. Do not do this.

PI seeks postdoc

March 23, 2018

Every PI wants only the most brilliant, creative and motivated trainees that will put in insane levels of effort to advance the lab agenda.

We know this because it is how they write their postdoc solicitation blurbs.

This is not what is consistently available.

I know this because a consistent backchannel theme of my dubious life online as a science careers nerd features PIs complaining about their trainees.

My usual response is to point out that they became PI due to being much better than average. So of course most of their trainees aren’t going to be as good as they are*.

__

*were

Delay, delay, delay

March 20, 2018

I’m not in favor of policies that extend the training intervals. Pub requirements for grad students are a prime example. The “need” to do two 3-5 year postdocs to be competitive is another. These are mostly problems made by the Professortariat directly.

But NIH has slipped into this game. Postdocs “have” to get evidence of funding, with F32 NRSAs and above all else the K99 featuring as top plums.

Unsurprisingly the competition has become fierce for these awards. And as with R-mechs this turns into the traffic pattern queue of revision rounds. Eighteen months from first submission to award if you are lucky.

Then we have the occasional NIH Institute which adds additional delaying tactics. “Well, we might fund your training award next round, kid. Give it another six months of fingernail biting.”

We had a recent case on the twttrs where a hugely promising young researcher gave up on this waiting game and took a job in their home country, only to get notice that the K99 would fund. Too late! We (MAGA) lost them.

I want NIH to adopt a “one and done” policy for all training mechanisms. If you get out-competed for one, move along to the next stage.

This will decrease the inhumane waiting game. It will hopefully open up other opportunities (transition to quasi-faculty positions that allow R-mech or foundation applications) faster. And it will overall speed progress through the stages, yes, even to the realization that an alternate path is the right path.

Survey says:

“Yes”.

I am heartened by this although the sizeable minority answering no is a curiosity. I wonder if it is mostly an effect of career stage? Maybe some undergrads answering?

You may see more dead horse flogging than usual, folks. The Commentariat is not as vigorous as I might like yet.

This emphasizes something I had to say about the Pier monstrosity purporting to study the reliability of NIH grant review.
Terry McGlynn says:
https://twitter.com/hormiga/status/973645583796744192

Absolutely. We do not want 100% fidelity in the evaluation of grant “merit”. If we did that, and review was approximately statistically representative of the funded population, we would all end up working on cancer in the end.

Instead, we have 28 I or Cs. These are broken into Divisions that have fairly distinct missions. There are Branches within the Divisions and multiple POs who may have differing viewpoints. CSR fields a plethora of study sections, many of which have partially overlapping missions, meaning a grant could be reviewed in one of several different sections. A standing section might easily have 20-30 reviewers per meeting and your grant might reasonably be assigned to any of several different permutations of three for primary assessment. Add to this the fact that reviewers change over time within a study section, even across rounds to which you are submitting approximately the same proposal. There should be no wonder whatsoever that the review outcome for a given grant might vary a bit under differing review panels.
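For a feel of the combinatorics alone, here is a minimal sketch in Python, using made-up but plausible panel numbers (real rosters vary from round to round), of how many distinct three-reviewer assignments a single meeting could generate:

```python
from math import comb

# Hypothetical numbers for illustration only; real rosters differ.
panel_size = 28          # empaneled plus ad hoc reviewers at one meeting
assigned_per_grant = 3   # primary, secondary, tertiary assignments

triads = comb(panel_size, assigned_per_grant)
print(f"Distinct 3-reviewer assignments from a {panel_size}-person panel: {triads}")
# -> 3276 possible triads, before even considering which of several
#    overlapping study sections the grant could be routed to, or how
#    the roster turns over between rounds.
```

And that is just one panel at one meeting.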

Do you really want perfect fidelity?

Do you really want that 50% triage and another 30-40% scored-outside-the-payline to be your unchangeable fate?

Of course not.

You want the variability in NIH Grant review to work in your favor.

If a set of reviewers finds your proposal unmeritorious, do you give up* and start a whole ’nother research program? Eventually to quit your job and do something else when you don’t get funded after the first 5 or 10 tries?

Of course not. You conclude that the variability in the system went against you this time, and come back for another try. Hoping that the variability in the system swings your way.
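A toy calculation shows why the comeback is rational. Assume, purely for illustration, that each submission is an independent draw at a hypothetical ~12% chance of funding per round (real rounds are not independent, since scores, reviewers and revisions carry over, so this is only the crudest of sketches):

```python
# Toy model: each resubmission treated as an independent draw.
p_per_round = 0.12  # hypothetical per-round funding probability

for n_tries in (1, 3, 5, 10):
    p_funded = 1 - (1 - p_per_round) ** n_tries
    print(f"{n_tries:2d} tries: P(funded at least once) = {p_funded:.0%}")
# -> 12%, 32%, 47%, 72%
```

Under these assumptions, the odds of eventually hitting climb steadily with each try.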

Anyway, I’d like to see more chit chat on the implicit question from the last post.

No “agreement”. “Subjectivity”. Well of course not. We expect there to be variation in the subjective evaluation of grants. Oh yes, “subjective”. Anyone who pretends this process is “objective” is an idiot. Underinformed. Willfully in denial. Review by humans is a “subjective” process by its very definition. That is what it means.

The only debate here is how much variability we expect there to be. How much precision do we expect in the process.

Well? How much reliability in the system do you want, Dear Reader?

__
*ok, maybe sometimes. but always?

I was critical of a recent study purporting to show that NIH grant review is totally random, because the study’s structural flaws could not have been designed more precisely to reach a foregone conclusion.

I am also critical of CSR/NIH self-studies. These are harder to track because they are not always published or well promoted. We often only get wind of them when people we know are invited to participate as reviewers. Often the results are not returned to the participants or are returned with an explicit swearing to secrecy.

I’ve done a couple of these self-study reviews for CSR.

I am not impressed by their designs either. Believe me.

As far as I’ve heard or experienced, most (all) of these CSR studies have the same honking flaw of restricted range. Funded applications only.

Along with other obscure design choices that seem to miss the main point*. One review pitted apps funded from closely-related sections against each other. ….except “closely-related” did not appear that close to me. It was more a test of whatever historical accident made CSR cluster those study sections or perhaps a test of mission drift. A better way would have been to cluster study sections to which the same PI submits. Or by assigned PO maybe? By a better key word cluster analysis?

Anyway, the CSR designs are usually weird when I hear about them. They never want to convene multiple panels of very similar reviewers to review the exact same pile of apps in real time. Reporting on their self-studies is spotty at best.

This appears to my eye to be an attempt to service a single political goal. I.e. “Maintain the ability to pretend to Congress that grant review selects only the most meritorious applications for funding with perfect fidelity”.

The critics, as we’ve seen, do the opposite. Their designs are manipulated to provide a high probability of showing NIH grant review is utterly unreliable and needs to be dismantled and replaced.

Maybe the truth lies somewhere in the middle? And if these forces would combine to perform some better research we could perhaps better trust jointly proposed solutions.

__

*I include the “productivity” data mining. NIH also pulls some sketchy stuff with these studies. Juking it carefully to support their a priori plans, rather than doing the study first and changing policy after.

Pier and colleagues published a study purporting to evaluate the reliability of NIH-style peer review of grant applications. Related work that appears to be from the same study was published by this group in 2017.

From the supplement to the 2018 paper, we note that the reviewer demographics were 62% Asian and 38% white, with zero black or hispanic reviewers. I don’t know how that matches the panels that handle NCI applications but I would expect some minimal black/hispanic representation and a lot lower Asian representation to match my review panel experiences. The panels were also 24% female, which seems to match my memory of NIH stats for review running at under 1/3 women.

Seventeen percent of the reviewers were at assistant professor rank. This is definitely a divergence from CSR practice. The only data I saw right around the time of Scarpa’s great Purge of Assistant Professors suggested a peak of 10% of reviewers. Given the way ad hoc / empaneled reviewer loads work, I think we can conclude that way fewer than 10% of reviews were coming from Assistant Professors. As you know, we are now a decade past the start of the purge and these numbers have to be lower. So the panel demographics are not similar.

N.b., the 2017 paper says they surveyed the reviewers on similarity to genuine NIH review experience, but I can’t find anywhere that it states the amount of review experience for the subjects. Similarly, while they all had to have been awarded at least one R01, we don’t know anything about their experiences as applicants. Might be relevant. A missed opportunity would seem to be testing the effect of reviewer demographics in the 2017 paper, which covers more about the process of review, calibration of scoring, agreement after discussion, etc.

The paper(s) also say that they tried to de-identify the applicants.

All applications were deidentified, meaning the names of the PIs, any co-investigators, and any other research personnel were replaced with pseudonyms. We selected pseudonyms using public databases of names that preserved the original gender, nationality, and relative frequency across national populations of the original names. All identifying information, including institutional addresses, email addresses, phone numbers, and hand-written signatures were similarly anonymized and re-identified as well.

I am still looking but I cannot find any reference to any attempt by the authors to validate whether the blinding worked. Which is in and of itself a fascinating question. But for the purposes of the “replication” of NIH peer review we must recognize that Investigator and Environment are two of five formally co-equal scoring criteria. We know that the NIH data show poor correlation of Investigator and Environment criterion scores with the overall voted impact score (Approach and Significance are the better predictors), but these are still scoring criteria. How can this study attempt to delete two of these and then purport to be replicating the process? It is like they intentionally set out to throw noise into the system.

I don’t think the review panels triaged any of the 25 proposals. The vast majority of NIH review involves triage of the bottom ~half of the assigned proposals. Reviewers know this when they are doing their preliminary reading and scoring.

Pier and colleagues recently published a study purporting to address the reliability of the NIH peer review process. From the summary:

We replicated the NIH peer-review process to examine the qualitative and quantitative judgments of different reviewers examining the same grant application. We found no agreement among reviewers in evaluating the same application. These findings highlight the subjectivity in reviewers’ evaluations of grant applications and underscore the difficulty in comparing the evaluations of different applications from different reviewers—which is how peer review actually unfolds.

emphasis added.

This thing is a crock and yet it has been bandied about on the Twitts as if it is the most awesome thing ever. “Aha!” cry the disgruntled applicants, “This proves that NIH peer review is horrible, terrible, no good, very bad and needs to be torn down entirely. Oh, and it also proves that it is a super criminal crime that some of my applications have gone unfunded, wah.”

A smaller set of voices expressed perplexed confusion. “Weird,” we say, “but probably our greatest impression from serving on panels is that there is great agreement of review, when you consider the process as a whole.”

So, why is the study irretrievably flawed? In broad strokes it is quite simple.
Restriction of the range. Take a look at the first figure. Does it show any correlation of scores? Any fair view would say no. Aha! Whatever is being represented on the x-axis about these points does not predict anything about what is being represented on the y-axis.

This is the mistake being made by Pier and colleagues. They have constructed four peer-review panels and had them review the same population of 25 grants. The trick is that 16 of these were already funded by the NCI and the remaining 9 were prior, unfunded versions of grants that were later funded by the NCI.

In short, the study selects proposals from a very limited range of the applications being reviewed by the NIH. This figure shows the rest of the data from the above example. When you look at it like this, any fair eye concludes that whatever is being represented by the x value about these points predicts something about the y value. Anyone with the barest of understanding of distributions and correlations gets this. Anyone with the most basic understanding grasps that a distribution does not have to have perfect correspondence for there to be a predictive relationship between two variables.
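Here is a quick simulation of the restriction-of-range effect, using entirely made-up numbers (not the Pier data): two noisy “review scores” of the same underlying merit correlate well across the full applicant pool, but much of that correlation disappears once you keep only a top slice, mimicking a sample of already-funded awards.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 5000

# Simulated underlying merit plus two independent noisy review scores.
merit = rng.normal(size=n)
score_a = merit + rng.normal(scale=0.7, size=n)
score_b = merit + rng.normal(scale=0.7, size=n)

full_r = np.corrcoef(score_a, score_b)[0, 1]

# Keep only the top ~12% on one score, roughly mimicking a sample
# restricted to applications good enough to have been funded.
cutoff = np.quantile(score_a, 0.88)
top = score_a >= cutoff
restricted_r = np.corrcoef(score_a[top], score_b[top])[0, 1]

print(f"Correlation, full range:     {full_r:.2f}")
print(f"Correlation, top slice only: {restricted_r:.2f}")
# The full-range correlation is substantial; the top-slice correlation
# is far weaker, even though the identical process generated every score.
```

Same generating process, same reviewers-with-noise, and yet the restricted sample looks far more “unreliable”.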

So. The authors’ claims are bogus. Ridiculously so. They did not “replicate” the peer review because they did not include a full range of scores/outcomes but instead picked the narrowest slice of the funded awards. I don’t have time to dig up historical data, but the current funding plan for NCI calls for a 10%ile payline. You can amuse yourself with the NIH success rate data here; the very first spreadsheet I clicked on gave a success rate of 12.5% for NCI R01s.

No “agreement”. “Subjectivity”. Well of course not. We expect there to be variation in the subjective evaluation of grants. Oh yes, “subjective”. Anyone who pretends this process is “objective” is an idiot. Underinformed. Willfully in denial. Review by humans is a “subjective” process by its very definition. That is what it means.

The only debate here is how much variability we expect there to be. How much precision do we expect in the process.

The most fervent defenders of the general reliability of the NIH grant peer review process almost invariably will acknowledge that the precision of the system is not high. That the “top-[insert favored value of 2-3 times the current paylines]” scoring grants are all worthy of funding and have very little objective space between them.

Yet we still seem to see this disgruntled applicant phenotype, responding with raucous applause to a crock-of-crap conclusion like that of Pier and colleagues, who seem to feel that somehow it is possible to have a grant evaluation system that is perfect. One that returns the exact same score for a given proposal each and every time*. I just don’t understand these people.
__
Elizabeth L. Pier, Markus Brauer, Amarette Filut, Anna Kaatz, Joshua Raclaw, Mitchell J. Nathan, Cecilia E. Ford and Molly Carnes. Low agreement among reviewers evaluating the same NIH grant applications. PNAS, published ahead of print March 5, 2018. https://doi.org/10.1073/pnas.1714379115

*And we’re not even getting into the fact that science moves forward and that what is cool today is not necessarily anywhere near as cool tomorrow.

Our good blog friend @McLNeuro is running a Donor’s Choose fundraising drive for March Madness. The idea is that you donate $10 or more to a Donor’s Choose project and then register a bracket under Darwin’s Balls (use that link) at CBS Sports.

As many of you know, I’m a big fan of the Donor’s Choose charity. My Readers have generously funded many, many projects in classrooms all across the nation in response to my occasional pleas.

Online drives can work collaboratively, with a blog host picking a few interesting projects to fund; that way we online science folks can take credit for pushing them to funding. So keep checking the EdgeForScholars page if you want to go that direction.

But if the ones on that page don’t grab your attention, just click on the main Donor’s Choose page and find something that tickles your fancy. There are ways to screen projects by location (if you want to help your town, or your home town) or topic domain (sports? science? literacy? … food*).

__
*Oh glory. Teachers have to beg for food to keep hungry kids on task now. I just can’t even, America. We are supposed to be better than this.

almost tenured PI raised some interesting questions in a comment:

So you want me to submit a preprint so I can get comments that I have to spend time responding to? No thanks. I spend enough time responding to the comments of reviewers from real journals. I can’t imagine how much time I’d have to spend responding to comments that are public and immortalized online. No paper is perfect. How many comments does a paper need before it’s acceptable for publication? Where does it end? I do not need more work when it already feels like the bar to publishing a paper keeps rising and rising.

Ok, so first off we need to recognize that the null hypothesis right now has to be that there will not be extensive substantial commentary on the preprints. PubMed Central shut down its commenting scheme for lack of use. Journals have been trying out various systems for years (a decade or more?) without general success. The fear that each posted pre-print will attract a host of journal-club style comments is probably not well supported.

But let’s suppose your preprint does get some critical comment*. Are you obliged to respond?

This ties into the uncertainty and disagreement over when to submit the preprint. At what stage of the process do you post it? Looking at the offerings on bioRxiv, I think many folks are very close to my position. Namely, we are waiting to submit a preprint until it is ready to go out for peer review at a journal.

So any comments it gets are being made in parallel with review and I can choose to address or not when I get the original decision back and need to revise the manuscript. Would the comments then somehow contaminate the primary review? Would the reviewers of a revision see the comments on the pre-print and demand you address those as well as the regular peer comments? Would an Editor? For now I seriously doubt this is a problem.

So, while I think there may be many reasons for people not to want to post their manuscripts as pre-prints, I don’t think the fear that this will be an extra dose of Reviewer #3 is well supported.

__
*I had one get a comment and we ended up including something in a revision to address it, so: win-win.

When you start signing your reviews, you are confessing that you think you have reached the point where your reputation is more persuasive than the quality of your ideas.