I still don’t understand the calculation of Journal Impact Factor. Or, I didn’t until today. Not completely. I mean, yes, I had the basic idea that it was citations divided by the number of citable articles published in the past two years. However, when I’ve written blog posts talking about how you should evaluate your own articles in that context (e.g., this one), I didn’t get it quite right. The definition from the source:

the impact factor of a journal is calculated by dividing the number of current year citations to the source items published in that journal during the previous two years

So when we assess how our own article contributes to the journal impact factor of the journal it was published in, we need to look at citations in the second and third calendar years. It will never count the first calendar year of publication, somewhat getting around the question of whether something has been available to be seen and cited for a full calendar year before it “counts” for JIF purposes. So when I wrote:

The fun game is to take a look at the articles that you’ve had rejected at a given journal (particularly when rejection was on impact grounds) but subsequently published elsewhere. You can take your citations in the “JCR” (aka second) year of the two years after it was published and match that up with the citation distribution of the journal that originally rejected your work. In the past, if you met the JIF number, you could be satisfied they blew it and that your article indeed had impact worthy of their journal. Now you can take it a step farther because you can get a better idea of when your article beat the median. Even if your actual citations are below the JIF of the journal that rejected you, your article may have been one that would have boosted their JIF by beating the median.

I don’t think I fully appreciated that you can look at citations in the second and third year and totally ignore the first year of citations. Look at the second and third calendar years of citations individually, or average them together as a shortcut. Either way, if you want to know whether your paper is boosting the JIF of the journal, those are the citations to focus on. Certainly, when I did the below-mentioned analysis in the past, I thought I had to look at the first year and would grumble to myself about how it wasn’t fair, the paper was published in the second half of the year, etc., and that the second year “really counted.” Well, I was actually closer with my prior excuse-making than I realized. You look at the second and third years.
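If you want to run this check on your own papers, here is a minimal sketch of the comparison. It assumes you have already pulled per-calendar-year citation counts for your article from Web of Science, Scopus or Google Scholar; the citation numbers and the journal JIF below are made up for illustration.

```python
# Minimal sketch: does this paper's JIF-relevant citation rate beat a target journal's JIF?
# All numbers below are hypothetical; the journal JIF is whatever you look up for the
# journal that rejected you.

def jif_relevant_citations(citations_by_year, pub_year):
    """Average citations in the second and third calendar years after publication,
    i.e. the two JCR years in which this article actually counts toward a journal's JIF."""
    year2 = citations_by_year.get(pub_year + 1, 0)
    year3 = citations_by_year.get(pub_year + 2, 0)
    return (year2 + year3) / 2

# Example: article published in 2016, cited 3, 9, 11 and 7 times in 2016-2019
my_article = {2016: 3, 2017: 9, 2018: 11, 2019: 7}
rate = jif_relevant_citations(my_article, pub_year=2016)

rejecting_journal_jif = 6.2   # hypothetical JIF of the journal that rejected the paper
print(f"JIF-relevant citation rate: {rate:.1f}")
print("Beats the rejecting journal's JIF" if rate > rejecting_journal_jif
      else "Below their JIF (but check where you fall against their citation median, too)")
```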

Obviously this also applies to the axe grinding part of your analysis of your papers. I was speaking with two colleagues recently, different details but basically it boils down to being a little down in the dumps about academic disrespect. As you know Dear Reader one of the things that I detest most about the way academic science behaves is the constant assault on our belongingness. There are many forces that try to tell you that you suck and your science is useless and you don’t really deserve to have a long and funded career doing science. The much discussed Imposter Syndrome arises from this and is accelerated by it. I like to fight back against that, and give you tools to understand that the criticisms are nonsense. One of these forces is that of journal Impact Factor and the struggle to get your manuscripts accepted in higher and higher JIF venues.

If you are anything like me you may have a journal or two that is seemingly interested in publishing the kind of work you do, but for some reason you juuuuuust miss the threshold for easy acceptance. Leading to frequent rejection. In my case it is invariably over perceived impact with a side helping of “lacks mechanism”. Now these just-miss kinds of journals have to be within the conceivable space to justify getting analytical about it. I’m not talking about stretching way above your usual paygrade. In our case we get things into this one particular journal occasionally. More importantly, there are other people who get stuff accepted that is not clearly different from ours on the key dimensions on which ours are rejected. So I am pretty confident it is a journal that should seriously consider our submissions (and to their credit, our submissions almost inevitably do go out for review).

This has been going on for quite some time and I have a pretty decent sample of our manuscripts that have been rejected at this journal, published elsewhere essentially unchanged (beyond the minor-revisions type of detail) and have had time to accumulate the first three years of citations. This journal is seriously missing the JIF boat on many of our submissions. The best one beat their JIF by a factor of 4-5 at times and has settled into a sustained citation rate of about double theirs. It was published in a journal with a JIF about two-thirds as high. I have numerous other examples of manuscripts rejected over “impact” grounds that at least met that journal’s JIF and in most cases ran 1.5-3x the JIF in the critical second and third calendar years after publication.

Fascinatingly, a couple of the articles that were accepted by this journal are kind of under-performing, considering the journal’s conceits, our usual citation performance for this type of work, etc.

The point of this axe grinding is to encourage you to take a similar quantitative look at your own work if you should happen to be feeling down in the dumps after another insult directed at you by the system. This is not for external bragging; nobody gives a crap about the behind-the-curtain reality of JIF, h-index and the like. You aren’t going to convince anyone that your work is better just because it outpoints the JIF of a journal it didn’t get published in. Editors at these journals are going to continue to wring their hands about their JIF, refuse to face the facts that their conceits about what “belongs” and “is high impact” in their journal are flawed and continue to reject your papers that would help their JIF at the same rate. It’s not about that.

This is about your internal dialogue and your Imposter Syndrome. If this helps, use it.

A semi-thread from frustrated bioinformaticians emerged on twitter recently. In it they take shots at their (presumably) collaborators who do not take their requests for carefully curated and formatted data to heart.

Naturally this led me to taunt the data leech OpenScienceEleventy waccaloons for a little bit. The context is probably a little different (i.e., it seems to reference established collaborations between data-generating and data-analyzing folks) but the idea taps on one of my problems with the OpenScience folks. They inevitably don’t just mean they want access to the data that went into your paper but ALL of your data related to it. Down to the least little recorded unit (someone in the fighty thread said he wanted raw electrophysiological recordings to test out his own scoring algorithm or some such). And of course they always mean that it should be nicely formatted in their favorite way, curated for easy understanding by computer (preferably) and, in all ways, the burden should be on the data-generating side to facilitate easy computational analysis. This is one of the main parts that I object to in their cult/movement: data curation in this way comes with a not-insubstantial cost expended to the benefit of some internet random. I also object on the basis of the ownership issues, bad actors (think: anti-science extremists of various stripes, including right wing “think tanks” and left wing animal rights terrorists), academic credit, and opportunity loss, among other factors.

However, the thought of the day is about data curation and how it affects the laboratory business and my mentoring of science trainees. I will declare that consistent data collation, curation, archiving and notation is a good thing for me and for my lab. It helps the science advance. However, these things come at a cost. And above all else when we consider these things, we have to remember that not every data point collected enters a scientific manuscript or is of much value five or ten years down the line. Which means that we are not just talking about the efficient expenditure of effort on the most eventually useful data, we’re talking about everything. Does every single study get the full data analysis, graphical depiction and writeup? Not in my lab. Data are used at need. Data are curated to the extent that it makes sense and sometimes that is less than complete.

Data are collected in slightly different ways over time. Maybe we changed the collection software. Maybe our experiments are similar, but have a bit of a tweak to them. Maybe the analyses that we didn’t think up until later might be profitably applied to earlier datasets but…..the upside isn’t huge compared to other tasks. Does this mean we have to go back and re-do the prior analyses with the current approach? If we want to, that sometimes requires that third and fourth techniques (programs, analysis strategies, etc.) be created and applied. This comes with additional effort costs. So why would we expend that effort? If there was interest or need on the part of some member of the laboratory, sure. If a collaborator “needs” that analysis, well, this is going to be case by case on the basis of what it gains us, the collaboration or maybe the funded projects. Because it all costs. Time, which is money, and the opportunity cost of those staff members (and me) not doing other tasks.

Staff members. Ah, yes, the trainees. I am totally supportive of academic trainees who want to analyze data and come up with new ways to work with our various stock-in-trade data sets and archive of files. This, btw, is what I did at one of my postdoctoral stops. I was working with a model where we were somewhat captive to the rudimentary data analyses provided by the vendor’s software. The data files were essentially undocumented, save for the configuration data, dates and subject identifiers. I was interested in parsing the data in some new ways so I spent a lot of time making it possible to do so. For the current files I was collecting and for the archive of data collected prior to my arrival and for the data being collected by my fellow trainees. In short, I faced the kind of database that OpenData people claim is all they are asking for. Oh, just give us whatever you have, it’s better to have anything even if not annotated, they will claim. (Seriously). Well, I did the work. I was able to figure out the data structure in the un-annotated files. This was only possible because I knew how the programs were working, how the variables could be set for different things, what the animals were doing in a general sense in terms of possible responses and patterns, how the vendor’s superficial analysis was working (for validation), what errors or truncated files might exist, etc. I wrote some code to create the slightly-more-sophisticated analyses that I happened to dream up at the time. I then started on the task of porting my analysis to the rest of the lab. So that everyone from tech to postdoc was doing initial analysis using my programs, not the vendor ones. And then working that into the spreadsheet and graphing part of the data curation. And THEN, I started working my way back through the historical database from the laboratory.
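To give a flavor of the kind of work involved: the job boils down to parsing an un-annotated file into its header and event records, and then validating your re-derived counts against the vendor’s own superficial analysis. Nothing here reflects any actual vendor format; the layout, field names and event codes below are entirely hypothetical, just a sketch of the shape of the problem.

```python
# Purely illustrative sketch of mining an un-annotated, vendor-style data file.
# The layout, field names and event codes are hypothetical; the real point is the
# validation step against the vendor's own built-in analysis.

def parse_session(text):
    """Parse one hypothetical session dump: 'key=value' header lines (subject,
    date, program configuration) followed by 'timestamp_ms,event_code' records."""
    header, events = {}, []
    for line in text.splitlines():
        if "=" in line:                      # configuration/header portion
            key, value = line.split("=", 1)
            header[key.strip()] = value.strip()
        elif "," in line:                    # behavioral event records
            t, code = line.split(",", 1)
            events.append((int(t), code.strip()))
    return {"header": header, "events": events}

def matches_vendor_summary(parsed, vendor_response_count):
    """Sanity check: the re-derived response count should equal the number the
    vendor's superficial analysis reports for the same session."""
    responses = sum(1 for _, code in parsed["events"] if code == "RESPONSE")
    return responses == vendor_response_count

example = "SUBJECT=S1043\nDATE=19960212\n1200,RESPONSE\n1250,PELLET\n4030,RESPONSE\n"
parsed = parse_session(example)
print(parsed["header"]["SUBJECT"], len(parsed["events"]), "events",
      matches_vendor_summary(parsed, vendor_response_count=2))
```

Of course, this only works once you already know what the fields mean, which programs generated which files and what the animals could possibly have been doing, which is exactly the point of the next paragraph.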

It was a lot of work. A lot. Luckily my PI at the time was okay with it and seemed to think I was being productive. Some of the new stuff that I was doing with our data stream ended up being included by default in most of our publications thereafter. Some of it ended up in its own publication, albeit some 12 years after I had completed the initial data mining. (This latter paper has barely ever been cited but I still think the result is super cool.) The data mining of files from experiments that were run before I entered the laboratory required a second bit of work, as you might readily imagine. I had to parse back through the lab books to find out which subject numbers belonged together as cohorts or experiments. I had to separate training data from baseline / maintenance studies, from experimental manipulations of acute or longitudinal variety. And examine these new data extractions in the context of the actual experiment. None of this was annotated in the files themselves. There wasn’t really a way to even do it beyond 8 character file names. But even if it had been slightly better curated, I’m just not seeing how it would be useful without the lab books and probably some access to the research team’s memory.

Snapping forward to me as a PI, we have a somewhat similar situation in my lab. We have a behavioral assay or two run by proprietary commercial software that generate data files that could, in theory, be mined by anyone that was interested* in some aspect of the behavior that struck their fancy. It would still take a lot of work and at least some access to the superordinate knowledge about the studies a given subject/date-stamped file relates to. I am happy for trainees in my lab to play with the data files, present and past. I’m happy for them to even replace analysis and reporting strategies that I have developed with their own, so long as they can translate this to other people in the lab. I.e., I am distinctly unkeen on the analysis of data being locked up in the proprietary code or software on a single trainee’s laptop. If they want to do that, fine, but we are going to belt-and-suspenders it. There is much value in keeping a set of data analysis structures more or less consistent over time. Sometimes the most rudimentary output from a single data file (say, how many pellets that rat earned) is all that we need to know, but we need to know that value has been derived consistently across years of my work.
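By “belt-and-suspenders” I mean something as simple as the following kind of sketch: one small shared function, with an agreed-upon definition, that everyone in the lab uses to pull that rudimentary value, rather than a private script on someone’s laptop. Again, the data structure and event code here are hypothetical.

```python
# Illustrative sketch of the "keep the rudimentary metric consistent" idea: one
# small, shared function that everyone uses to pull the same value (pellets earned)
# out of a session. The event representation and code name are hypothetical.

def pellets_earned(events):
    """Count reinforcer deliveries using one agreed-upon event code, so the value
    means the same thing across years of studies and across analysts."""
    return sum(1 for _, code in events if code == "PELLET")

# Example session: (timestamp_ms, event_code) pairs
example_events = [(1200, "RESPONSE"), (1250, "PELLET"), (4030, "RESPONSE"), (4100, "PELLET")]
print(pellets_earned(example_events))   # -> 2
```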

I have at least two interests when it comes to data curation in my lab. I need some consistency and I need to be able to understand, as the PI, what I am looking at. I need to be able to go back to some half-remembered experiment and quickly whip up a preliminary data figure or a slide. This leans towards more orthodoxy of analysis. Towards orthodoxy of data structures and formats. Towards orthodoxy in the graphs, for pete’s sake. My attempts to manage this into reality have had mixed results, I will note. At the level of an individual staffer, satisfying some data curation goal of the PI (or anyone else, really) can seem like make-work. And it is definitely work to the ends of someone else; I just happen to be the PI and am more equal than anyone else. But it is work. And this means that shortcuts are taken. Often. And then it is down to the effort of someone to bring things back up to curation standard. Sure, it may seem to be “just as easy” for the person to do it the way I want it, but whaddayaknow, they don’t always see it that way. Or are rushed. Or mean to get to that at the end of the study but then forget. Tomorrow. When it is really needed.

I get this. It is a simple human reality.

In my lab, I am the boss. I get to tell staff members what to do and if they won’t do it, eventually, I can fire them. Their personal efforts (and mine for that matter) are supposed to be directed towards the lab good, first, and the institutional good second. The NIH good is in there somewhere but we all know that since a grant is not a contract, this is a very undefined concept.

There is very little that suggests that the effort of my laboratory staff has to be devoted to the good of some other person who wants access to our data in a way that is useful to them. In fact, I am pretty sure in the extreme case that if I paid a tech or trainee from my grant to work substantial amounts of time on a data analysis/curation project demanded of us by a private for-profit company solely for their own ends, this would violate the rules. There would probably be a technical violation if we did the same for a grant project funded to another researcher if the work had nothing whatever to do with the funded aims in my own lab that were paying the staff member’s salary.

Data curation for others’ ends costs. It costs time and that means that it costs money. It is not trivial. Even setting up your data stream within the lab so that it could possibly be easier to share with external data miners costs. And the costs apply to all of the data collected, not just the data that, one day, are requested of you and end up in a paper.

__

*as it happens we just fielded a request but this person asked us to collaborate, rightfully so.

An alert from the Twitters points us to the author guidelines for JAMA. It appears to be the instructions for all JAMA titles, even though the tweet in question refers specifically to JAMA Psychiatry.

The critical bit:

Public dissemination of manuscripts prior to, simultaneous with, or following submission to this journal, such as posting the manuscript on preprint servers or other repositories, is discouraged, and will be considered in the evaluation of manuscripts submitted for possible publication in this journal. The evaluation will involve making a determination of whether publication of the submitted manuscript will add meaningful new information to the medical literature or will be redundant with information already disseminated with the posting of the preprint. Authors should provide information about any preprint postings, including copies of the posted manuscript and a link to it, at the time of submission of the manuscript to this journal

JAMA is not keen on pre-prints. They threaten that they will not accept manuscripts for publication on the basis that they are now “redundant” with information “already disseminated”. They require that “Copies of all related or similar manuscripts and reports (ie, those containing substantially similar content or using the same or similar data) that have been previously published or posted, or are under consideration elsewhere must be provided at the time of manuscript submission”.

Good luck with that, JAMA. I predict that they will fight the good fight for a little while but will eventually cave. As you know, I think that the NIH policy encouraging pre-prints was the watershed moment. The fight is over. It’s down to cleaning out pockets of doomed resistance. Nothing left but the shouting. Etc. NIH grant success is a huge driver of the behavior of USA biomedical researchers. Anything that is perceived as enhancing the chances of a grant being awarded is very likely to be adopted by majorities of applicants. This is particularly important to newer applicants because they have fewer pubs to rely upon and the need is urgent to show that they are making progress in their newly independent careers. So the noobs are establishing a pre-print habit/tradition for themselves that is likely to carry forward in their careers. It’s OVER. Preprints are here and will stay.

My prediction is that authors will start avoiding peer-reviewed publication venues that discourage (this JAMA policy reads more like ‘prevent’) pre-print communication. The more prestigious journals can probably hold out for a little while. If authors perceive a potential career benefit from acceptance in a JAMA title that is higher than the value of pre-printing, fine, they may hesitate until they’ve gotten rejected. My bet is that, in the main, we will evolve to a point where authors won’t put up with this. There are too many reasons, including establishment of scientific priority, that will push authors away from journals which oppose pre-prints.

Anyone else know of other publishers or journal titles that are aggressively opposing pre-prints?

A twitter observation suggests that some people’s understanding of what goes in the Introduction to a paper is different from mine.

In my view, you are citing things in the Introduction to indicate what motivated you to do the study and to give some insight into why you are using these approaches. Anything that was published after you wrote the manuscript did not motivate you to conduct the study. So there is no reason to put a citation to this new work in the Introduction. Unless, of course, you do new experiments for a revision and can fairly say that they were motivated by the paper that was published after the original submission.

It’s slightly assy for a reviewer to demand that you cite a manuscript that was published after the version they are reviewing was submitted. Slightly. More than slightly if that is the major reason for asking for a revision. But if a reviewer is already suggesting that revisions are in order, it is no big deal IMO to see a suggestion you refer to and discuss a recently published work. Discuss. As in the Discussion. As in there may be fresh off the presses results that are helpful to the interpretation and contextualization of your new data.

These results, however, do not belong in the Introduction. That is reserved for the motivating context for doing the work in the first place.

The Chan Zuckerberg Initiative recently announced that it was setting up a program intended to increase the recruitment and retention of underrepresented students in STEM. This will be launched as a partnership between the University of California San Diego and Berkeley campuses and the University of Maryland Baltimore County. UMBC has a Meyerhoff Scholars Program (launched in 1988), focused on undergraduate students, which CZI intends to duplicate at UCB and UCSD.

There’s no indication of a pre-existing prototype at UCB; however, the UCSD version will apparently leverage an existing program (PATHS) led by neurobiologist Professor Gentry Patrick.

Under the new CZI collaboration, announced at an April 9 press conference, UC San Diego, UC Berkeley and the University of Maryland, Baltimore County (UMBC) will work toward a goal of replicating aspects of UMBC’s Meyerhoff Scholars Program, recognized as one of the most effective models in the country to help inspire, recruit and retain underrepresented minorities pursuing undergraduate and graduate degrees in STEM fields. UMBC is a diverse public research university whose largest demographic groups identify as white and Asian, but which also graduates more African-American students who go on to earn dual M.D.-Ph.D. degrees than any other college in U.S.—a credit to the Meyerhoff program model.

I think this is great. Now, look, yes the Meyerhoff style program and this new CZI mimic are both pipeline solutions. And you know perfectly well, Dear Reader, that I am not a fan of pipeline excuses when it comes to the NIH grant award and University professor hiring strategies. Do not mistake me, however. I am still a fan of efforts that make it easier to extend fair opportunity for individuals from groups traditionally underrepresented in the academy, and in STEM fields in particular. I also have a slight brush of experience with what UMBC is doing in terms of encouraging URM students to seek out opportunity for research training. I conclude from this that they are doing REALLY great things in terms of culture on that campus. I would very much like to see that extended to other campuses and this CZI thing would seem to be doing that. Bravo.

So what might be the vision here? Well that all depends on how serious CZI is about this and how much money they have to spend. The Meyerhoff program has a Graduate Fellows wing to support and encourage graduate students. This would be the next step in the pipeline, of course, but hey why not? We’ve just reviewed an SfN program that could only be extended to 20% of URM applicants. I would imagine the total amount of URM graduate student support likewise falls short of what the applicant pool needs, and therefore more graduate fellowships would be welcome. But what about REALLY moving the needle? What could CZI do?

The Science wing of CZI, headed by Cori Bargmann, is setting out to “cure, prevent, or manage all diseases”. Right up at the top of the splash page it talks about the People who “move the field forward”. They go on to say under their approach to supporting projects that they “believe that collaboration, risk taking, and staying close to the scientific community are our best opportunities to accelerate progress in science”. Risk taking. Risk taking. There is one thing that supports risk taking and that is significant and stable research funding. This is something that the Ginther report identified as a particular problem for PIs from some underrepresented groups. It is for certain sure a player in many people’s research program trajectories.

So I’m going to propose that CZI should set their sights on creating a version of what HHMI is doing with membership limited to PIs from underrepresented groups, broadly writ. It is up to them how they want to box this in, of course, but the basic principle would be to give stable research support to those who are less readily able to achieve that because of various biases in, e.g., NIH grant review and selection as well as in the good old boys club that is the HHMI.

The Society for Neuroscience recently twittered a brag about its Neuroscience Scholars program.

It was the “38 years” part that got me. That is a long time. And we still do not have anything like robust participation of underrepresented* individuals in neuroscience. This suggests that, particularly when it comes to the “career growth” goals of this program, it is failing. I stepped over to the SfN page on the Program and was keen to see outcomes, aka, faculty. Nothing. Okay, let’s take a peek at the PDF brochure reviewing the 30-year history of the program. I started tweeting bits in outrage and then….well, time to blog.

First off, the brochure says the program is funded by the NIH and has been from the outset: “..SfN has received strong support and funding from the NIH, starting in 1982 with funding from what was then the National Institute for Neurological and Communicative Disorders and Stroke (NINCDS). … with strong and enduring support from NIH, in particular NINDS, the NSP is recognized as one of the most successful diversity programs”. Oh? Has it accomplished much? Let’s peer into the report. “Since the first 8 participants who attended the 1981 and 1982 SfN annual meetings, the program has grown to support a total of 579 Scholars to date. During that time, the NSP has helped foster the careers of many successful researchers in neuroscience.” Well, we all know “foster the careers” is nice pablum but it doesn’t say anything about actual accomplishment. And just so we are all nice and clear, SfN itself says this about the goal: “The NSP’s current overall goal is to increase the likelihood that diverse trainees who enter the neuroscience field continue to advance in their careers — that is, fixing the ‘leaky pipeline.’” So yes. FACULTY.

[Sidebar: And I also think the funding by the NIH is plenty of justification for asking about grant success of any Scholars who became faculty, but I don’t see how to get at that. Related to this, I will just note that the Ginther report came out in 2011, 30 years after those “first 8 participants” attended the 1981 SfN meeting.]

Here’s what they have to offer on their survey to determine the outcomes from the first 30 years of the program: “The survey successfully reached 220 past Scholars (approximately 40 percent) and had a strong overall response rate (38 percent, n=84).” As I said on Twitter: ruh roh. Survivor bias bullshit warning alert…….. 84 respondents out of 579 Scholars to date means they only determined the outcome for 14.5% of their Scholars. And they are pretty impressed with themselves. “Former Scholars have largely stayed within academia and achieved high standing, including full professorships and other faculty positions.” “Largely”. Nice weasel wording.

And more importantly, do you just maaaaaybe think this sample of respondents is highly frigging enriched in people who made it to professorial appointments and remain active neuroscientists? Again, this is out of the 38% responding of the 40% “reached”, aka 84/579 or 14.5% of all Scholars. And let’s just sum up the pie chart to assess their “largely” claim. I make it out to be that 50 of these Scholars are in professorial appointments, which is only 8.6% of the total number of Scholars assisted over 30 years. Another 4 (0.7%) are listed separately as department heads. This does not seem to me to be a strong fix of the supposed leaky pipeline.

Now, as a reminder, this 8.6% is out of an already highly selected subset of the most promising underrepresented burgeoning neuroscientists. The SfN brags about how highly competitive the program is: “A record 102 applicants applied in 2010 for 20 coveted slots.” RIGHT? So the hugely leaky pipeline of Scholars reporting back for their program review purposes (38% responding of 40% “reached”) only reflects the leaking AFTER they’ve already had this 1-in-5 selection. What about the other 80%? Okay, so let’s take their faculty plus department head numbers and multiply by the ~0.2 selection factor from their applicant pool (don’t even get me started about those URM trainees who never even apply)…1.86%.
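For anyone who wants to check the arithmetic, here is the back-of-envelope version as a worked example, using only the numbers quoted above (the selection factor is the approximate 20-slots-per-102-applicants figure from 2010):

```python
# Back-of-envelope check of the numbers above (values copied from the SfN report).
total_scholars = 579        # Scholars supported over ~30 years
respondents    = 84         # 38% response among the ~40% (220) the survey reached
faculty        = 50         # professorial appointments among respondents (pie chart)
dept_heads     = 4          # listed separately as department heads

print(f"respondents / all Scholars:     {respondents / total_scholars:.1%}")              # ~14.5%
print(f"faculty / all Scholars:         {faculty / total_scholars:.1%}")                  # ~8.6%
print(f"faculty + heads / all Scholars: {(faculty + dept_heads) / total_scholars:.1%}")   # ~9.3%

# Scale by the roughly 1-in-5 selection at the application stage (20 slots, 102 applicants)
selection_factor = 20 / 102
share_of_applicant_pool = (faculty + dept_heads) / total_scholars * selection_factor
print(f"scaled to the applicant pool:   {share_of_applicant_pool:.1%}")                   # ~1.8%
```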

Less than 2%. That’s it?????? That sounds exactly like the terrible numbers of African American faculty in science departments to me. And note, the SfN says its program has, since 1997, enrolled 48% Hispanic/Latinx and 35% Black/African-American Scholars. So we should be focusing on the total URM faculty numbers. I found another SfN report (pdf) showing there were 1% African-American and 5% Hispanic/Latinx faculty in US neuroscience PhD programs (2016).

This SfN program is doing NOTHING to fix the leaky pipeline problem from what their numbers are telling us.

___

*I shouldn’t have to point this out but African-Americans constitute about 12.7% of the US population, and Hispanic/Latinx about 17.8%.