ISI has two sets of citation books

December 20, 2007

Coturnix has the call:

there was a study the other day, in the Journal of Cell Biology, that seriously calls into question the methodology used by Thomson Scientific to calculate the sacred Impact Factor

A little bait from the article:

Articles are designated as primary, review, or “front matter” by hand by Thomson Scientific employees examining journals using various bibliographic criteria, such as keywords and number of references … Some publishers negotiate with Thomson Scientific to change these designations in their favor. The specifics of these negotiations are not available to the public, but one can’t help but wonder what has occurred when a journal experiences a sudden jump in impact factor.

Dude, you had me at “some publishers negotiate…”!
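
To see why those designations are worth negotiating over, here is a minimal sketch with made-up numbers. It assumes, as I understand the calculation, that the two-year impact factor divides all citations to the journal by the count of "citable" items only (primary and review articles), so anything reclassified as front matter drops out of the denominator while its citations stay in the numerator. The function name and every figure below are hypothetical, not ISI's code or data.

```python
def impact_factor(citations: int, citable_items: int) -> float:
    """Two-year impact factor sketch: citations received this year to
    items from the previous two years, divided by the number of items
    from those years that were designated 'citable'."""
    if citable_items <= 0:
        raise ValueError("need at least one citable item")
    return citations / citable_items


# Hypothetical journal: all numbers are invented for illustration.
citations = 3000            # citations counted in the numerator
primary_and_review = 400    # items originally designated citable
reclassified = 100          # items moved to "front matter"

# Assumption: the numerator does not change when items are reclassified.
before = impact_factor(citations, primary_and_review)
after = impact_factor(citations, primary_and_review - reclassified)

print(f"before reclassification: {before:.2f}")
print(f"after reclassification:  {after:.2f}")
```

Under these assumptions nothing about the science changed, only the bookkeeping, and the number moved by a third.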

Because the impact factor calculation is a mean, it can be badly skewed by a “blockbuster” paper. … In a self-analysis of their 2005 impact factor, Nature noted that 89% of their citations came from only 25% of the papers published. When we asked Thomson Scientific if they would consider providing a median calculation in addition to the mean they already publish, they replied, “It’s an interesting suggestion…The median… would typically be much lower than the mean. There are other statistical measures to describe the nature of the citation frequency distribution skewness, but the median is probably not the right choice.”

And here we have a common refrain when talking about inconvenient facts about the IF. “Well, of course we know it is flawed but we’re going to keep right on using it as if it is not!” Nature, I’m looking at you, hypocrites.
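
The mean-versus-median point is easy to check for yourself. A rough sketch, using an invented set of per-paper citation counts for a ten-paper journal with one blockbuster (these are not Nature's or anyone else's real numbers):

```python
from statistics import mean, median

# Hypothetical citation counts for ten papers; the last one is the
# "blockbuster". All values are invented for illustration.
citations_per_paper = [0, 0, 0, 1, 1, 2, 2, 3, 4, 187]

mean_cites = mean(citations_per_paper)      # what an IF-style average reports
median_cites = median(citations_per_paper)  # what the typical paper receives

print(f"mean:   {mean_cites}")
print(f"median: {median_cites}")
```

With this toy distribution the mean says "20 citations per paper" while nine of the ten papers have four or fewer, which is the sort of gap a median would expose.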

Thomson Scientific explained that they have two separate databases—one for their “Research Group” and one used for the published impact factors (the JCR). We had been sold the database from the “Research Group”, which has fewer citations in it because the data have been vetted for erroneous records. “The JCR staff matches citations to journal titles, whereas the Research Services Group matches citations to individual articles”, explained a Thomson Scientific representative. “Because some cited references are in error in terms of volume or page number, name of first author, and other data, these are missed by the Research Services Group.” When we requested the database used to calculate the published impact factors (i.e., including the erroneous records), Thomson Scientific sent us a second database. But these data still did not match the published impact factor data. This database appeared to have been assembled in an ad hoc manner to create a facsimile of the published data that might appease us. It did not.

It does not appease me and it should not appease you either, DearReader.
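
For what it's worth, the two-database explanation can be sketched in a few lines. This is only an illustration of the difference between matching a citation to a journal title and matching it to a specific article; the journal name, record fields and matching rules are my own hypothetical stand-ins, not Thomson Scientific's actual procedure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CitedRef:
    """One cited reference as it appears in a citing paper's bibliography."""
    journal: str
    volume: int
    first_page: int


# Hypothetical articles the journal really published: (volume, first_page).
published = {(10, 101), (10, 115), (11, 7)}

# Hypothetical cited references, two of them containing errors.
refs = [
    CitedRef("J Hypothetical Biol", 10, 101),
    CitedRef("J Hypothetical Biol", 10, 151),  # page typo (should be 115)
    CitedRef("J Hypothetical Biol", 11, 7),
    CitedRef("J Hypothetical Biol", 12, 7),    # wrong volume
    CitedRef("Other Journal", 3, 40),
]


def journal_level_count(refs, journal):
    """Count every reference whose journal title matches."""
    return sum(1 for r in refs if r.journal == journal)


def article_level_count(refs, journal, published):
    """Count only references that resolve to a known article."""
    return sum(
        1 for r in refs
        if r.journal == journal and (r.volume, r.first_page) in published
    )


journal_total = journal_level_count(refs, "J Hypothetical Biol")
article_total = article_level_count(refs, "J Hypothetical Biol", published)

print(journal_total, article_total)
```

Title-level matching keeps the erroneous references and article-level matching drops them, so the two counts diverge. That much squares with the explanation quoted above; it does not explain why the second database still failed to reproduce the published numbers.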

__

… and take Coturnix’s closing advice:

And several other bloggers seem to agree, including Bjoern Brembs, The Krafty Librarian, Eric Schnell, Peter Suber and Stevan Harnad who each dissect the paper in more detail than I do, so go and read their reactions.

34 Responses to “ISI has two sets of citation books”

  1. physioprof Says:

    Ruh, Roh.

  2. whimple Says:

    I can’t get excited about this stuff. If you think ISI is doing a crappy job, you know where pubmed is; go derive your own IF algorithm. Only the journals and their shills get in a snit over 3 decimal places of IF score granularity. As a general guide to classifying journals you’re not familiar with as “excellent”, “very good”, “ok” and “mostly sucks”, the ISI IF is good enough.

  3. drugmonkey Says:

    “Only the journals and their shills get in a snit over 3 decimal places of IF score granularity.”

    Would I be the “shill” then? At any rate, we are apparently talking about effects of somewhat larger magnitude. From the article: “For example, Current Biology had an impact factor of 7.00 in 2002 and 11.91 in 2003.”

    This is a significant difference anywhere IF is considered (other than “CNS or bust” land anyway). Whether IF affects you in particular has to do with your specific circumstances. I assert that this is increasingly a realpolitik concern for academic careers.

  4. drugmonkey Says:

    “If you think ISI is doing a crappy job, you know where pubmed is; go derive your own IF algorithm.”

    And this is just plain missing the point which is that other people who believe in IF are doing the evaluating of people who have criticisms of it. The goal is to disabuse the TruIFBeliever of the conceit that IF means what they think it does.

    Replacement of IF with some other metric is a secondary consideration, IMO. My prior proposal of the d-index, for example, was an attempt to illustrate the issue of skew in very high IF journal cites, rather than an argument that I actually thought it was a great way to assess science quality, if you take my point.

  5. whimple Says:

    “The goal is to disabuse the TruIFBeliever of the conceit that IF means what they think it does.”

    This isn’t going to happen. That’s why they are TruBelievers. I don’t really see the harm. You know the T.B’s like to look at IF’s. You know what the IF’s are for the journals in which you’d consider publishing. So, publish (if you can) in the higher IF journals and everyone’s happy. …except that is, for the journals that think their IF is underrated 🙂

  6. drugmonkey Says:

    whimple, are you really not conversant with the scenarios in which one’s career progress is affected by the IF of journals in which you publish? The suggestion that the mere presence of one (or “N”) CNS paper(s) on the CV is a gatekeeper of tenure, for example? These things come in subtler forms too. In some cases the gestalt perception that you publish in journals with IF around X or X+4 makes a big, big difference. It is relevant to grant review too, you know.

  7. whimple Says:

    Look DM, you know what the standards are. So meet them or get out. If you’re at a place that requires a CNS paper, you damn well better get one or the consequences are going to be predictably unpleasant. That standard might be stupid, or shortsighted, or unfair, or irrelevant, but it isn’t going to be productive to complain along the lines of, “I know I published in an IF 7.0 journal but really, it’s a great paper and could easily be in an IF 11.0 journal.” Sure the system sucks, and the requirements are capricious, and luck plays a significant role, but at least the system is transparent. Making ISI out to be the boogeyman, although a fun thing to do, and maybe even true, isn’t going to be very productive.

  8. whimple Says:

    And you might not be a whiny journal shill, but Coturnix definitely is. 🙂

  9. drugmonkey Says:

    Whimple as we discussed on MWE&G, sometimes when you keep banging your head against the wall, the wall cracks a bit. throwing one’s hands up and saying “well what can I do about it” is just not my style. this doesn’t mean that I expect the world to change 180deg or else I’ve “failed” in whatever I’m trying to do. I probably have more acceptance and understanding of the IF game than you suspect when it comes to my own conduct, career approaches and expectations.

    the larger issue, of course, is that I think that IF pursuit, CNS type research is assertively bad for science. if you don’t give a crap about any given individual’s career, including your own, you should at least care about the body of science… My view anyway

  10. drugmonkey Says:

    coturnix and bill and the rest of the OA passionista intrigue me. there are many principles that I should agree with in theory but I have a lot of knee-jerk anti. So I read their arguments, trying to come to a conclusion on where I should be on this. Whether I should ever publish in OA journals, would I consider going to any version of OA science, do a Rosie-Redfield like thing for my trainees, etc. If it has a chance to benefit science, well, I’m prepared to listen. even if they do get a little shrill!

  11. whimple Says:

    I think we’re in general agreement. My specific point is that one shouldn’t get so caught up in trying to change the system that one winds up being eliminated from it. In other words, go ahead and advocate (gently) for change, but meanwhile pragmatically operate under the working assumption that The System isn’t going to change any time soon.

  12. physioprof Says:

    Shorter whimple: Down with the man; hey man, whatchu got?

  13. whimple Says:

    Yeah, that is shorter. 🙂

  14. neurolover Says:

    What struck me is that you’re complaining about the inaccuracy of a for-profit ranking scheme. They’re going to keep doing it wrong as long as it’s profitable for them. How to make it unprofitable? Stop using the rankings until they are fixed to the point where we think they measure what we want them to measure (which I guess is the merit of the work of individuals).

    Whimple seems to be pointing out that one can’t stop using the info in the form of which journals one publishes in (because that would kill one’s career). That may be true. But, the way that we have to stop using the rankings is when we do the evaluations that rely on them. DM’s saying that they’re used in awarding grants/tenure. We need to stop using them there (when we have the power). If we do, they’ll go away.

    I believe there are subfields where this is done, subfields where the power-brokers have decided they’re not going to use the differences in rankings independent of their assessment of the journal/article. How successful that can be depends on the role the power-brokers play in the larger field of science. It can be done, but it has to be done by the Man. Becoming the Man and then hanging on to the way things worked for you is what perpetuates the flawed practices.

  15. drugmonkey Says:

    “Becoming the Man and then hanging on to the way things worked for you is what perpetuates the flawed practices.”

    Becoming The Man by playing the game as it is does not mean that one inevitably loses sight of one’s convictions. I had something on this before, in response to what I saw as a tendency for useless ranting on blogs. Not that I don’t do that here. But I was trying to explore the fact that while ranting feels good and has a place, “becoming the Man” (or at least on track for this) is what gets you into position to really do something effective.

    A question I will always have is, how do we avoid becoming ThatGuy we currently despise? It gets into interesting questions as to whether people as they become InfluentialScientists adopt the same old approaches because 1) that’s who they always were and the system selected them accordingly 2) they forget their old convictions because those were really based on personal interests anyway, interests that shift with career progression or 3) they see the error of their (previous) ways…

  16. Thomson Scientific Corrects Inaccuracies In Editorial

    Article Titled “Show me the Data”, Journal of Cell Biology, Vol. 179, No. 6, 1091-1092, 17 December 2007 (doi: 10.1083/jcb.200711140) is Misleading and Inaccurate

    http://www.scientific.thomson.com/citationimpactforum/

  17. physioprof Says:

    Holy shit! The ISI borg reads DrugMonkey! W00t!!

  18. drugmonkey Says:

    interesting defense there. Mostly it is arguing about the tone taken by the JCB piece, fair enough. The point about refusing to negotiate with editors about the categorization of content gives us an alternative hypothesis. Namely that journals look at the way ISI decides on “front matter” and make sure to adapt their journal practices to produce the outcome desired. arguable.

    I still think they are totally bogus on the mean vs median thing though. Isn’t it basic stat inference that when you have a skewed distribution the median is considered more representative and therefore “correct” as a description? the point is not whether it would increase or decrease the IF number, but whether it would be more accurate!

  19. physioprof Says:

    “the point is not whether it would increase or decrease the IF number, but whether it would be more accurate!”

    I bet the reason they don’t want to use the median is that there would be a very large number of journals with impact factors of zero, because more than half of all papers they publish have never been cited.

  20. drugmonkey Says:

    “more than half of all papers they publish have never been cited.”

    while you are no doubt correct I have trouble getting my head around this. really. never cited. ever? even by themselves? this latter brings up another question which is if you could get a per-group or per-PI number what would be the modal number of pubs. has to be greater than 1, doesn’t it?

  21. physioprof Says:

    “while you are no doubt correct I have trouble getting my head around this. really. never cited. ever? even by themselves?”

    Don’t forget that IF is only based on a two-year window of citations.

    “this latter brings up another question which is if you could get a per-group or per-PI number what would be the modal number of pubs. has to be greater than 1, doesn’t it?”

    If you mean “has to be” as a matter of logical necessity, then no; as a practical matter, of course. But I’d bet that the modal number of publications per unique author is one, just as it is for SfN abstracts.

  22. drugmonkey Says:

    “Don’t forget that IF is only based on a two-year window of citations.”

    good point. but dangit, now you’ve raised another issue. which may be very significant in determining a journal’s ongoing IF rank.

    Journals can differ by a LOT in terms of publication lag, I’ve seen 12 mo from acceptance to print in some places. About 3-5 mo might be typical of most journals and of course the glamor mags run maybe a few weeks. I can see problems arising from the perspective of both the cited and citing papers. A long-lag journal is potentially going to both have its own IF lowered and have less influence on the IF of other journals.

    From the perspective of the cited paper, the rapid pre-print thing may be helpful in theory since the paper is essentially citable within a month or two of acceptance. but I’ve noticed that things get wiggy in ISI land under these circumstances because the method of citing is less controlled, the citing authors may just respond to proof query with less than the right cite, etc. In short, even if a pre-print article is cited, it is much more likely to get missed in the ISI calculations.

    From the perspective of the citing paper, from what I can tell, ISI only counts cites once the paper has been print-published. So if you have a 12 mo lag in publication, the contribution to other journals’ IF is automatically cut in half, i.e., limited to the year prior to submission of the final version of the MS.

    seems like one easy thing for journals looking to move-on-up the IF charts would be to get the acceptance-to-print lag down as far as conceivably possible…

  23. Piled Higher, Deeper Says:

    You avoid becoming The Man by not dumping on your postdocs for one thing.

  24. physioprof Says:

    DM: Your math is wrong. There is always a continuous flow of papers moving through the system, so it doesn’t matter how long the publication lag is: it just time shifts the submission/acceptance dates of the papers that go towards the IF computed for a given year.

    PHD: If you “dump” on your post-docs, you are greatly reducing the chances that you will become The Man. The best way to become The Man is to treat your post-docs well, so that they are as productive as possible.

  25. physioprof Says:

    BTW, just to elaborate on DM’s point: Shortening publication lag probably does improve IF, but it is an indirect effect. People like to submit to journals with shorter lags, and so–all else being equal–journals with shorter lags get better work submitted to them. I believe that one of the most important things that Gary Westbrook did to dramatically increase the IF of JoN during his tenure as Editor in Chief was to dramatically decrease publication lag.

  26. Neuro-conservative Says:

    Physio — Huh?
    –from ISI Impact factor trend graph for JoN:
    2002: 8.045
    2003: 8.386
    2004: 7.907
    2005: 7.506
    2006: 7.453

  27. drugmonkey Says:

    “DM: Your math is wrong. There is always a continuous flow of papers moving through the system, so it doesn’t matter how long the publication lag is”

    maybe you could ‘splain this a bit more?

    What I’m thinking is that the most-recent citations contained in a 12-mo lag journal by definition are one year (or a bit more) old. So the cites coming from those articles that are older than the one-year (or a little less) interval preceding final manuscript submission are not going to figure in any IF calculation. Right?

    One point I didn’t make is that if lots of journals in a sub-area have long lags or if a given journal has a large presence, this is going to alter IF within the subfield.

    Not to mention, a journal’s IF includes citations from the journal itself; the effects of pub-lag should be obvious in this case.

    see this document for a review. money quote “there are journals where the observed rate of self-citation is a dominant influence in the total level of citation.” Although this analysis concludes that the overall correlation of IF with (journal level) self-citedness is minimal, the devil would be in the specific details if you ask me. Like when comparing the handful of journals in subfield X with those of subfield Y when assessing job candidates, promotion, etc.

    Page 11 has a breakdown of “neuroscience” journals, btw. there are some fairly intriguing outliers, like the 85% self-cite rate journal down near the bottom (non English, perhaps?) and the 7,000 self-cite journal in the top 20.

  28. writedit Says:

    Of possible interest in Nature this week:

    Free journal-ranking tool enters citation market
    Database offers on-the-fly results.

    Declan Butler

    A new Internet database lets users generate on-the-fly citation statistics of published research papers for free. The tool also calculates papers’ impact factors using a new algorithm similar to PageRank, the algorithm Google uses to rank web pages. The open-access database is collaborating with Elsevier, the giant Amsterdam-based science publisher, and its underlying data come from Scopus, a subscription abstracts database created by Elsevier in 2004.

    The SCImago Journal & Country Rank database was launched in December by SCImago, a data-mining and visualization group at the universities of Granada, Extremadura, Carlos III and Alcalá de Henares, all in Spain. It ranks journals and countries using such citation metrics as the popular, if controversial, Hirsch Index. It also includes a new metric: the SCImago Journal Rank (SJR).

    The familiar impact factor created by industry leader Thomson Scientific, based in Philadelphia, Pennsylvania, is calculated as the average number of citations by the papers that each journal contains. The SJR also analyses the citation links between journals in a series of iterative cycles, in the same way as the Google PageRank algorithm. This means not all citations are considered equal; those coming from journals with higher SJRs are given more weight. The main difference between SJR and Google’s PageRank is that SJR uses a citation window of three years. See Table 1

    It will take time to assess the SJR properly, experts say. It is difficult to compare the results of SJR journal analyses directly with those based on impact factors, because the databases each is based on are different. Thomson’s Web of Science abstracts database covers around 9,000 journals and Scopus more than 15,000, and in the years covered by the SCImago database — 1996 to 2007 — Scopus contains 20–45% more records, says Félix de Moya Anegón, a researcher at the SCImago group.

    The top journals in SJR rankings by discipline are often broadly similar to those generated by impact factors, but there are also large differences in position (see Table 2). Immunity (SJR of 9.34) scores higher than The Lancet (1.65), for example, but The Lancet’s 2006 impact factor of 25.8 is higher than the 18.31 of Immunity. Such differences can be understood in terms of popularity versus prestige, says de Moya Anegón: popular journals cited frequently by journals of low prestige have high impact factors and low SJRs, whereas journals that are prestigious may be cited less but by more prestigious journals, giving them high SJRs but lower impact factors.

    Thomson under fire
    The new rankings are welcomed by Carl Bergstrom of the University of Washington in Seattle, who works on a similar citation index, the Eigenfactor, using Thomson data. “It’s yet one more confirmation of the importance and timeliness of a new generation of journal ranking systems to take us beyond the impact factor,” says Bergstrom, “and another vote in favour of the basic idea of ranking journals using the sorts of Eigenvector centrality methods that Google’s PageRank uses.”

    Thomson has enjoyed a monopoly on citation numbers for years — its subscription products include the Web of Science, the Journal Citation Report and Essential Science Indicators. “Given the dominance of Thomson in this field it is very welcome to have journal indicators based on an alternative source, Scopus,” says Anne-Wil Harzing of the University of Melbourne in Australia, who is developing citation metrics based on Google Scholar.

    Jim Pringle, vice-president for development at Thomson, says metrics similar to PageRank, such as SJR and Eigenfactor, have proven their utility on the web, but their use for evaluating science is less well understood. “Both employ complex algorithms to create relative measures and may seem opaque to the user and difficult to interpret,” he says.

    Thomson is also under fire from researchers who want greater transparency over how citation metrics are calculated and the data sets used. In a hard-hitting editorial published in Journal of Cell Biology in December, Mike Rossner, head of Rockefeller University Press, and colleagues say their analyses of databases supplied by Thomson yielded different values for metrics from those published by the company (M. Rossner et al. J. Cell Biol. 179, 1091–1092; 2007).

    Moreover, Thomson, they claim, was unable to supply data to support its published impact factors. “Just as scientists would not accept the findings in a scientific paper without seeing the primary data,” states the editorial, “so should they not rely on Thomson Scientific’s impact factor, which is based on hidden data.”

    Citation metrics produced by both academics and companies are often challenged, says Pringle. The editorial, he claims, “misunderstands much, and misstates several matters”, including the authors’ exchanges with Thomson on the affair. On 1 January, the company launched a web forum to formally respond to the editorial (see http://scientific.thomson.com/citationimpactforum).

  29. drugmonkey Says:

    Are. You. Ready. To. Ruuuuuuuuuuuuummmmmmmmbbbbbbllllleeeee????!!!????

  30. THOMSON SCIENTIFIC CORRECTS INACCURACIES IN EDITORIAL

    Article Titled “Show me the Data”, Journal of Cell Biology, Vol. 179, No. 6, 1091-1092, 17 December 2007 (doi: 10.1083/jcb.200711140) is Misleading and Inaccurate

    ….In the same way that a scientist would use binoculars to view animal behavior, but would use a microscope to view cellular behavior, bibliometrics experts use the JCR to observe the journal, and Web of Science™ to observe the individual article/author….

    Read the full response:
    http://scientific.thomson.com/citationimpactforum/8427045/

  31. physioprof Says:

    Is there some dude in the basement at ISI that posted here, or is it some kind of Web crawler thingamajig?

  32. whimple Says:

    How about this for a Journal metric?

    (# of manuscripts received) / (# of manuscripts published)

    We could call it the “toughness factor”. 🙂

    I think this is more of the bottom line. Why is a Science paper considered so valuable? Because it’s tough to get one!

  33. drugmonkey Says:

    “Is there some dude in the basement at ISI that posted here, or is it some kind of Web crawler thingamajig?”

    well, there’s a real-ish looking email address attached and the title would imply a little more than “some dude in the basement” so I’ve been assuming they are taking this effort to engage in the blog dialog seriously. it is not inconceivable that they are doing this with an autobot but, so what? the reply they’ve made on their website is highly informative so I congratulate them on that. all perspectives and all that…

  34. physioprof Says:

    I wasn’t complaining. I’m just curious.
