From a self-described newProf at Doc Becca's digs:

Last week, the first NIH proposal I wrote with PI status was rejected… I knew things were bad, but it still hurts… Problem is, I don’t know how to allocate my time between generating more preliminary data/pubs and applying for more grants. How many grants does the typical NIH- and/or NSF-funded (or wannabe-funded) TT prof write per year before getting funded?

It is not about what anyone else or the “typical” person has done.

It is about doing whatever you possibly can do until that Notice of Grant Award arrives.

My stock advice right now is that you need to have at least one proposal going in to the NIH for each standard receipt date. If you aren’t hitting it at least that hard before you have a major award, you aren’t trying. If you think you can’t get out one per round… you don’t really understand your job yet. Your job is to propose studies until someone decides to give your lab some support.

My other stock advice is to take a look at the payline and assume those odds apply to you. Yes, special snowflake, you.

If the payline is 10%, then you need to expect that you will have to submit at least 10 apps to have a fighting chance. Apply the noob-discount and you are probably better off hitting twice that number. It is no guarantee and sure, the PI just down the hall struck it lucky with her first Asst Prof submission to the NIH. But these are the kinds of numbers you need to start with.
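
A bit of back-of-the-envelope arithmetic makes the point (my numbers, treating each submission as an independent draw at a 10% payline, which is obviously a simplification):

```python
# Rough odds of at least one award, treating each application as an
# independent shot at a 10% payline. Submissions are not truly independent,
# so read this as illustration, not prediction.
payline = 0.10

for n_apps in (1, 5, 10, 20):
    p_at_least_one = 1 - (1 - payline) ** n_apps
    print(f"{n_apps:>2} apps -> {p_at_least_one:.0%} chance of at least one award")

# Output:
#  1 apps -> 10% chance of at least one award
#  5 apps -> 41% chance of at least one award
# 10 apps -> 65% chance of at least one award
# 20 apps -> 88% chance of at least one award
```

Even ten apps leaves you with roughly a one-in-three chance of nothing, which is why the noob-discount pushes the target number higher.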

Once you get rolling, one new grant and one revised grant per round should be doable. They are a month apart and a revision should be way easier. After the first few, you can start taking advantage of cutting and pasting a lot of the grant text together to get a start on the next one.

Stop whining about preliminary data. Base it on feasibility and work from there. Most figures support at least a half dozen distinct grant applications. Maybe more.

I never know for sure how hard my colleagues are working when it comes to grant submissions. I know what I do…and it is a lot. I know what a subset of my other colleagues do and let me tell you, success is better correlated with effort (grants out the door) than it is with career rank. That has an effect, sure, but I know relatively older investigators who struggle to maintain stable funding and ones who enjoy multi-grant stability. They are distinguished to some extent by how many apps they get out the door. Same thing for junior colleagues. They are trying to launch their programs and all. I get this. They have to do a lot of setup, training and even spend time at the bench. But they also tend to have a very wait-and-see approach to grants. Put one in. Wait for the result. Sigh. “Well maybe I’ll resubmit it next round”. Don’t do this, my noob friends. Turn that app around for the next possible date for submission.

You’ll have another app to write for the following round, silly.

Failure to Replicate

March 20, 2013

I should have put that in quotes because it actually appears in the title of this new paper published in Neuropsychopharmacology:

Hart AB, de Wit H, Palmer AA. Candidate gene studies of a promising intermediate phenotype: failure to replicate. Neuropsychopharmacology. 2013 Apr;38(5):802-16. doi: 10.1038/npp.2012.245. Epub 2012 Dec 3. [PubMed]

From the Abstract alone you can get a sense:

We previously conducted a series of 12 candidate gene analyses of acute subjective and physiological responses to amphetamine in 99-162 healthy human volunteers (ADORA2A, SLC6A3, BDNF, SLC6A4, CSNK1E, SLC6A2, DRD2, FAAH, COMT, OPRM1). Here, we report our attempt to replicate these findings in over 200 additional participants ascertained using identical methodology. We were unable to replicate any of our previous findings.

The team, with de Wit’s lab providing the expertise on the human phenotyping and drug-response side and Palmer’s lab the genetics, has been after genetic differences that mediate differential responses to amphetamine for some time. There’s a human end and a mouse end to the overall program, which has been fairly prolific.

In terms of human results, they have previously reported effects as varied as:
-association of an adenosine receptor gene polymorphism with degree of anxiety in response to amphetamine
-association of a dopamine transporter gene promoter polymorphism with feeling the drug effect and with diastolic blood pressure
-association of casein kinase I epsilon gene polymorphisms with feeling the drug effect
-association of fatty acid amide hydrolase (FAAH) gene polymorphisms with Arousal and Fatigue responses to amphetamine
-association of mu 1 opioid receptor gene polymorphisms with Amphetamine scale subjective report in response to amphetamine

There were a dozen in total and for the most part the replication attempt with a new group of subjects failed to confirm the prior observation. The Discussion is almost plaintive at the start:

This study is striking because we were attempting to replicate apparently robust findings related to well-studied candidate genes. We used a relatively large number of new participants for the replication, and their data were collected and analyzed using identical procedures. Thus, our study did not suffer from the heterogeneity in phenotyping procedures implicated in previous failures to replicate other candidate gene studies (Ho et al, 2010; Mathieson et al, 2012). The failure of our associations to replicate suggests that most or all of our original results were false positives.

The authors then go on to discuss a number of obvious issues that may have led to the prior “false positives”.

-variation in the ethnic makeup of the various samples: one reanalysis using ancestry as a covariate didn’t change their prior results.

-power in genome-wide association studies is low because effect sizes / contributions to variance by rare alleles are small. They point out that candidate gene studies continue to report large effect sizes that are probably very unlikely in the broad scheme of things… and are therefore comparatively likely to be false positives.

-multiple comparisons. They point out that not all of their prior papers applied multiple-comparisons corrections against the inflation of alpha (the false positive rate, in essence), and they certainly did no such thing across the 12 findings that were reported in a number of independent publications. As they note, the corrected significance threshold for the “322 primary tests performed in this study” (i.e., the same number included in the several papers they were trying to replicate) would be 0.00015. (There’s a quick sketch of that arithmetic just after this list.)

-publication bias. This discussion covers the usual (ignoring all the negative outcomes), but the interesting thing is the confession about something many of us (yes, me included) do that isn’t really addressed by the formal correction procedures for multiple comparisons (a toy simulation of this follows the list):

Similarly, we sometimes considered several alternative methods for calculating phenotypes (eg, peak change score summarization vs area under the curve, which tend to be highly but incompletely correlated). It seems very likely that the candidate gene literature frequently reflects this sort of publication bias, which represents a special case of uncorrected multiple testing.
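
That peak-change-versus-AUC confession is worth a toy illustration. Here is a small simulation (entirely mine, not from the paper): score the same null data two highly-but-incompletely-correlated ways, call it a hit if either score reaches p < 0.05, and watch the false-positive rate creep above the nominal 5%.

```python
# Toy simulation: two correlated summaries of the same (null) drug-response
# data, a random "genotype" split, and a hit if either summary gives p < 0.05.
# The phenotype names mirror the quote above; everything else is invented.
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
n_subjects, n_sims = 150, 5000
hits_either = 0

for _ in range(n_sims):
    # Simulated response time courses with no genotype effect whatsoever.
    timecourse = rng.normal(size=(n_subjects, 6))
    peak_change = timecourse.max(axis=1) - timecourse[:, 0]  # "peak change" score
    auc = timecourse.sum(axis=1)                             # crude "area under the curve"
    genotype = rng.integers(0, 2, n_subjects)                # random two-group split

    p_peak = stats.ttest_ind(peak_change[genotype == 0],
                             peak_change[genotype == 1]).pvalue
    p_auc = stats.ttest_ind(auc[genotype == 0],
                            auc[genotype == 1]).pvalue
    if min(p_peak, p_auc) < 0.05:
        hits_either += 1

print(f"False-positive rate when either score counts: {hits_either / n_sims:.3f}")
# Lands above the nominal 0.05 -- not doubled, because the two scores are
# correlated, but inflated all the same. Pick the winner after the fact and
# the published literature never sees the correction.
```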
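
And the multiple-comparisons arithmetic mentioned a couple of items up is just division, assuming a standard Bonferroni correction at the conventional alpha of 0.05 (my assumption; the excerpt doesn’t spell out which correction was used):

```python
# Bonferroni back-of-the-envelope for the "322 primary tests" figure,
# assuming the conventional alpha of 0.05.
alpha = 0.05
n_tests = 322

threshold = alpha / n_tests
print(f"Per-test significance threshold: {threshold:.6f}")
# 0.000155 -- i.e., the ~0.00015 quoted in the Discussion. A p value that
# looked comfortably "significant" at 0.01 in any single paper is nowhere
# near surviving the correction.
```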

This is a fascinating read. The authors make no bones about the fact that no fewer than a dozen findings they have published were the result of false positives. Not wrongdoing… not fraud. Let us be clear. We must assume they were published with the peer review, analysis techniques and sample sizes that were (and are?) standard for the field.

But they are not true.

The authors offer up solutions of larger sample sizes, better corrections for multiple comparisons and a need for replication. Of these, the last one seems the best and most likely solution. Like it or not, research funding is limited and there will always be a sliding scale. At first we have pilot experiments or even anecdotal observations to put us on the track. We do one study, limited by the available resources. Positive outcomes justify throwing more resources at the question. Interesting findings can stimulate other labs to join the party. Over time, the essential features of the original observation or finding are either confirmed or consigned to the bin of “likely false alarm”.

This is how science progresses. So while we can use experiences like this to help define a realistic target sample size and scope for an experiment, I’m not sure that we can ever overcome the problems of publication bias and cherry-picking results from amongst multiple analyses of a dataset. At first, anyway. The way to overcome it is for the lab, or the field, to hold a result in mind as tentatively true and then proceed to replicate it in different ways.

__
UPDATE: I originally forgot to put in my standard disclaimer that I’m professionally acquainted with one or more of the authors of this work.

Hart, A. B., de Wit, H., & Palmer, A. A. (2013). Candidate gene studies of a promising intermediate phenotype: Failure to replicate. Neuropsychopharmacology, 38(5), 802-816. DOI: 10.1038/npp.2012.245