Nature Neuroscience Compares Citations with Article Downloads

May 28, 2008

An editorial in Nature Neuroscience [h/t: writedit] describes an in-house study they undertook to compare

citations to individual articles and reviews in Nature Neuroscience (February-December, 2005) with download statistics from our website. Downloads represented the total PDF page views for any particular manuscript within the first 90 days of being posted online (including Advance Online Publication (AOP) time).

Interesting. I’ve been pondering the potential value of article download stats for some time now, so I’m intrigued by any investigation into such metrics. Perhaps this will be the start of a trend. (I will warn you in advance, however, not to expect an actual study as such out of this narrowly constrained slice of data.)


Noah Gray has more on his blog entry at Action Potential.

Everyone has their own pet problem with impact factors, whether it be with the calculation method, the non-reproducibility of the actual values, or the disagreement over what IFs really represent, just to name a few. Despite all of these concerns (and more), these numbers are typically used to rate the importance or prominence of a particular journal, and thus by proxy, the importance of the individual papers published within. This is a seriously flawed use of association (see a previous Nature Neuroscience editorial discussing this concept), leading scientists to often equate the total number of citations with scientific impact, which can be fraught with problems.

Indeed. He’s singing my song here. Still, it IS fascinating. The stable of Nature journals can be reliably found to issue such breastbeating analyses while at the same time being active beneficiaries of, and contributors to, this “flawed use”. In the case of the flagship journal…well, let’s just say it takes me a while to stop laughing when I read these sorts of comments from people within the Nature umbrella. And even Noah, when pushed, admits that rather than take a highly downloaded paper in a low-Impact Factor journal:

So yes, I’d still shoot for the paper in the high impact journal and take my chances…

Moving along, I see that the editorial suggests a completely different motivation for this study, namely the propagation of incorrect citations.

One striking example is a study which suggested, on the basis of the propagation of citation errors, that about 80% of all references are transcribed from other reference lists rather than from the original source article. Such reports lead to the suspicion that most authors do not read the papers they cite, and that the papers that are the most cited are not necessarily the papers that are the most read. If true, this makes citation counting far less significant and calls into question the accuracy of referencing in the literature. Moreover, a practice of ‘abstract citation’ in lieu of reading the full article or citing papers based on reference lists is particularly problematic.

I dunno. I just don’t think their experimental design could really address this question. They seem to be saying that the more people read the paper the more cites it gets. No duh. But wouldn’t the erroneous citations develop over a lengthy period of time? And be much less likely to occur with more recent work? Even the correlation they present between PDF downloads and citations disappears as the window for analysis goes out past 90 days post-publication.
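
For the curious, here is a minimal sketch, in Python with entirely made-up per-article numbers, of the kind of downloads-versus-citations comparison the editorial describes; the point is the rank correlation, not the particular values, and the choice of Spearman here is my assumption, not theirs.

    # Hypothetical sketch only: correlating each article's 90-day PDF downloads
    # with its later citation count, roughly the comparison the editorial describes.
    # All numbers are invented for illustration; none come from Nature Neuroscience.
    from scipy.stats import spearmanr

    downloads = [1200, 450, 3100, 800, 2600, 150, 980, 1750]  # 90-day PDF downloads per article (made up)
    citations = [14, 3, 40, 9, 22, 1, 8, 17]                  # citations to the same articles (made up)

    rho, p = spearmanr(downloads, citations)
    print(f"Spearman rho = {rho:.2f}, p = {p:.3f}")

Whether any such rho survives once the analysis window stretches well past 90 days is, as noted above, exactly where their correlation falls apart.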
Is this a musty old mouldering straw man they’ve erected? How big is this problem anyway? If erroneous citation is really rare, no correlation analysis is going to pick up anything useful, is it?

I was thinking about the ways in which my papers get cited incorrectly and the times that I notice erroneous citation of other work. Occasionally, I’ll run across just plain old incorrect citations where the cited article has nothing to do with the apparent point made in the citing article. This is, however, incredibly rare. And of next to zero impact, because journal searching is so excellent and readily accessible. So a flat-out blown cite is very rare and a minor annoyance.

Another minor annoyance is along the lines of a more-or-less appropriate citation that is unscholarly, for want of a better term. One that is not the first, best or most appropriate citation for the point being made. But…this is in the eye of the beholder. Hard to call it an error, really.

The thing that really chaps me about bad citing practices (or, more properly I suppose, a failure to actually read the stuff you are citing) is when some concept based on a very weak dataset gets enshrined into the literature as canonical. When it is really very weak support. The problem is that then when you try to publish a much more extensive followup, you are left arguing uphill with your better data against poorer, albeit earlier data. That’s annoying. Even worse is when grant reviewers don’t agree one should be funded to, in part, re-do the investigations better since “we already know that”. No. We. Don’t! (…whoops. de-rant, DM, de-rant.)

So getting back to the point, I’m just not getting a good feel for the problem of erroneous citation. Still, good ol’ Noah brings it home in the end:

We realize that this analysis is enticing at best, potentially providing a piece of an alternative solution for deciphering the impact of an individual paper. In this current scientific climate where tenure and grant funding decisions are influenced by flawed metrics like impact factor, it is important to make good use of all available technology in an attempt to realize a better system of measuring the scientific impact of any particular paper. This analysis is obviously preliminary and flawed in its own ways, but perhaps metrics such as paper downloads can find a place in a compilation of aggregated stats, painting a more accurate and informative picture of manuscript influence.

“Make good use of all available technology”. Yep. When grants and people’s careers are on the line, this works for me.

6 Responses to “Nature Neuroscience Compares Citations with Article Downloads”

  1. GrantSlave Says:

    Using download stats in place of Impacted Factor perhaps? Perish the thought.

  2. heh Says:

    i wonder if Nature knows i tend to download the same pdfs more than once. maybe they can put together something like itunes and we can pay 99 cents per download. 😛

  3. PhysioProf Says:

    Please excuse me while I run all over campus downloading my papers from every workstation I can find.

  4. Eric Lund Says:

    The thing that really chaps me about bad citing practices (or, more properly I suppose, a failure to actually read the stuff you are citing) is when some concept based on a very weak dataset gets enshrined into the literature as canonical. When it is really very weak support.
    This happens far too often, and not just in biomedical fields (I’m in geophysics myself). There is a famous example from Surely You’re Joking, Mr. Feynman!, in which data that were farther from theory than they should have been prompted Feynman to go back and re-read the original reference. He realized that he had read that paper as a grad student and remembered thinking to himself that it didn’t prove anything: the conclusion was based on a single data point at the extreme of the range tested. Nonetheless, that paper’s conclusion was enshrined as conventional wisdom for nearly two decades, before the discrepancy between theory and experiment became too big to ignore.

  5. DrugMonkey Says:

    Please excuse me while I run all over campus downloading my papers from every workstation I can find.
    While you are doing that to run up the blog’s pageviews, you might as well download your papers…a twofer!

  6. neurolover Says:

    “The thing that really chaps me about bad citing practices (or, more properly I suppose, a failure to actually read the stuff you are citing) is when some concept based on a very weak dataset gets enshrined into the literature as canonical. When it is really very weak support. The problem is that then when you try to publish a much more extensive followup, you are left arguing uphill with your better data against poorer, albeit earlier data.”
    Oh my, that’s my song. I’ve had a few times where I’ve gotten excited about a cite — thinking “Wow, someone did that experiment” only to realize it’s the same **** cite. And, I got to believe that a bunch of people aren’t reading the original, but are just doing a second generation cite.
    bj
