Datahound on productivity

October 23, 2014

This final figure from Datahound’s post on K99/R00 recipients who have managed to win R01 funding is fascinating to me. It is a plot of individual investigators, matching each one’s number of published papers against a weighted sum of publications. The weighting accounts for the number of authors on each paper, as follows: “One way to correct for the influence of an increased number of authors on the number of publications is to weight each publication by 1/(number of authors) (as was suggested by a comment on Twitter). In this scenario, a paper with two authors would be worth 1/2 while a paper with 10 authors would be worth 1/10.”
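To make the weighting concrete, here is a minimal sketch of the 1/(number of authors) credit scheme. The investigators and author counts below are invented for illustration; real data would come from each person’s publication record.

```python
# Each paper counts as 1/(number of authors) toward the weighted sum.
papers_by_investigator = {
    "PI_A": [2, 3, 5],        # hypothetical author counts on PI_A's papers
    "PI_B": [4, 8, 10, 12],   # hypothetical author counts on PI_B's papers
}

for pi, author_counts in papers_by_investigator.items():
    n_papers = len(author_counts)
    weighted = sum(1 / n for n in author_counts)  # 1/2 + 1/3 + ... per the scheme above
    print(f"{pi}: {n_papers} papers, weighted sum = {weighted:.2f}")
```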

Applying this adjustment tightens the authors/papers relationship, raising the correlation coefficient from 0.47 to 0.83.
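For readers who want to see the kind of calculation behind those coefficients, the sketch below computes a Pearson correlation with numpy. The arrays are placeholders rather than Datahound’s data, and his post is the reference for exactly which quantities were correlated.

```python
import numpy as np

papers = np.array([12, 18, 25, 30, 41, 15, 22], dtype=float)  # hypothetical paper counts per PI
weighted = np.array([3.1, 4.0, 4.6, 5.2, 6.8, 3.5, 4.4])      # hypothetical 1/N-weighted sums

r = np.corrcoef(papers, weighted)[0, 1]  # Pearson correlation coefficient
print(f"correlation coefficient: {r:.2f}")
```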

Ultimately this post shows pretty emphatically that when you operate in a subfield or niche or laboratory that tends to publish papers with a lot of authors, you get more author credits. This even survives the diluting effect of dividing each paper by the number of authors on it. There are undoubtedly many implications.

I think the relationship tends to argue that increasing the author number is not a reflection of the so-called courtesy or guest authorships that seem to bother a lot of people in science. If weighted output still rises even after dividing each paper by its number of authors, that tends to suggest the additional authors are contributing additional science. The scatter plots even seem to show a fairly linear relationship, so we can’t argue that it tails off after some arbitrary cutoff of author numbers.

Another implication is purely personal. If we can generate more plots like this one across subfields or across PI characteristics (there may be something odd about the K99/R00 population of investigators, for example), there may be a productivity line against which to compare ourselves. Do we (or the candidate) have more or fewer publications than would be predicted from the average number of authors? Does this suggest that you can identify slackers from larger labs (that happen to have a lot of pubs) and hard chargers from smaller labs (that have fewer total pubs, but excel against the expected value)?
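If such a productivity line existed, the comparison would amount to fitting the population and looking at where an individual lands relative to the fit. A minimal sketch, with all numbers invented for illustration rather than taken from the post:

```python
import numpy as np

# Hypothetical population: average author count per paper and total publications per PI.
avg_authors = np.array([2.5, 3.0, 4.2, 5.1, 6.0, 7.3, 8.8])
publications = np.array([8, 11, 14, 18, 20, 26, 31])

# Least-squares "productivity line" for the population.
slope, intercept = np.polyfit(avg_authors, publications, 1)

def residual(candidate_avg_authors, candidate_pubs):
    """Positive = more papers than the line predicts; negative = fewer."""
    predicted = slope * candidate_avg_authors + intercept
    return candidate_pubs - predicted

print(residual(5.0, 25))   # above the line: more output than expected for that author count
print(residual(8.0, 18))   # below the line: fewer papers than expected
```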