Do Appearances Indicate Anything About the Quality of the Data?

May 20, 2008

I’m mired in an effort to respond to a recent post of drdrA’s over at Blue Lab Coats on the deceptively simple issue of a manuscript rejection. This post is apparently a rich vein of blog fodder because PhysioProf already responded to the issue of pleading one’s case with the editor and I am trying to follow up on BugDoc’s request to expand on a comment I posted at Blue Lab Coats. That effort is bogging down so I thought I’d take care of one little nugget.
The part of drdrA’s post which made the most steam come out of my ears was the following point advanced by one of the paper reviewers:

“poorly performed gels and western blots which need to be improved.”


Now, I am not privy to the details of this manuscript and critique other than what drdrA has offered. So I will have to make a few assumptions based on what I gather from people for whom “gels and western blots” form a significant fraction of their data. I assume that what the reviewer is talking about is that the figures in the paper do not appear “clean” in some way. Perhaps the background is higher than one might like. The bands are perhaps not sufficiently defined. Maybe the gel became distorted so that the bands and lanes do not line up in a nice Cartesian orientation. In short, the figures do not look pretty to the eye. I gather from my colleagues that the ability to produce pretty figures is taken as a point of pride in one’s “hands”. This I can understand. I also understand that the visual quality is taken as an indicator of the scientific quality or veracity and that, as with drdrA’s reviewer’s comment, this visual quality is relevant to publication decisions. I find this idea idiotic.
I will note that this trend can sometimes be observed in reviewer comments on other types of data so I’ll not need to go into a rant as to why an N=1 like a gel constitutes “data”. Take a generalized case of a graph which includes an indicator of central tendency of a group of subjects (the mean) with error bars which describe the variability around that central tendency (such as the standard error of the mean or SEM). I have seen cases in which reviewers of such figures make the fundamental error of assessing the statistical reliability of the effect being described via the much-vaunted inferential technique of “eyeballing the error bars”. This is an error similar to the request to “improve” the visual appearance of a gel.
Don’t get me wrong. In many cases the formal inferential statistical methods will coincide with the impression the experienced reader gets from the aforementioned eyeball technique. So it is a decent proxy and most people use it to some extent- you go straight to the figures first, do you not? The two inferential techniques do not always coincide, however; this is particularly liable to FAIL in a repeated measures design. And of course the standard of one person’s opinion about how big the error bars should be relative to the difference between means is an inherently arbitrary (and thus variable) one. Therefore, suggestions that one doubts the statistics because the error bars seem too large (I’ve seen this more often than I can believe) are wrong. Suggestions to run more subjects merely to decrease the size of the error bars and thus improve the visual appearance of the figure are wrong (not to mention a clear violation of the animal-use dictum to reduce the number of subjects, if it is animal data).
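To make the repeated-measures point concrete, here is a minimal simulation sketch (the sample size, means and variances are invented, and it assumes the numpy and scipy libraries). Each subject is measured twice; the within-subject change is small but consistent, while the between-subject spread is large, so the group SEM bars overlap heavily even though the appropriate paired analysis picks up the effect easily:

    import numpy as np
    from scipy import stats

    # Hypothetical repeated-measures data: large between-subject spread,
    # small but consistent within-subject shift.
    rng = np.random.default_rng(0)
    n = 12
    baseline = rng.normal(100, 20, n)
    treatment = baseline + rng.normal(5, 2, n)

    for name, x in (("baseline", baseline), ("treatment", treatment)):
        print(f"{name}: mean = {x.mean():.1f}, SEM = {stats.sem(x):.1f}")

    # What eyeballing the group error bars approximates: unimpressive.
    print("independent-groups t-test p =", stats.ttest_ind(baseline, treatment).pvalue)
    # The appropriate repeated-measures analysis: clear effect.
    print("paired t-test p =", stats.ttest_rel(baseline, treatment).pvalue)

The eyeball gets exactly the wrong answer here because the group error bars reflect between-subject spread that the repeated-measures analysis removes.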
Now, the reviewer request for pretty data is to some extent only a minor annoyance. One can always point out to the editor in a rebuttal that what is important are the statistics or the meaning of the gel itself. Until one thinks about what this means for the conduct of science. I get this nasty feeling that an obsession with how the data look, over what they mean, is an encouragement to fake data. In the case of gels, an encouragement to keep re-running the experiment, tweaking conditions until one gets a perfect figure. Without, of course, any description of how many repetitions it took to get that one figure or any description of the experimental variability. In the case of group mean data, an encouragement to toss outliers, to tweak inclusion criteria or treatment conditions. The impetus to use a more-than-sufficient number of animal subjects is not exactly fraud but does contravene the “reduction” principle of animal use.
DearReaders, perhaps you can assuage my concerns by explaining why better looking data are also higher quality data?

35 Responses to “Do Appearances Indicate Anything About the Quality of the Data?”

  1. JSinger Says:

    I gather from my colleagues that the ability to produce pretty figures is taken as a point of pride in one’s “hands”. This I can understand. I also understand that the visual quality is taken as an indicator of the scientific quality or veracity and that, as with drdrA’s reviewer’s comment, this visual quality is relevant to publication decisions. I find this idea idiotic.
    Starting with the latter: one invaluable lesson from my grad school adviser is that the aesthetic quality of your data does matter. There are a lot of things that can go wrong, and in a perfect-looking gel you can be confident that none of them happened. When your gels look hideous because you can’t mix buffer, use a camera or remember to turn the run off, you don’t pick up the warning signs of, say, sample degradation.
    That said, I think the former is a perfectly legitimate point for reviewers to raise. There’s a general level of professionalism one should be expected to maintain, out of respect for the journal and its readers, if not for yourself. If you were reviewing a paper and saw a graph with 70% gray lines on a 50% gray background, you’d object, right? Why not do the same for a photo that looks like it came out of a 1982 issue of Nucleic Acids Research?


  2. Bayman Says:

    DearReaders, perhaps you can assuage my concerns by explaining why better looking data are also higher quality data?
    The same reason a grant application laden with typos and covered in coffee stains is more likely to be rejected even if the content is solid. It smacks of the author’s general incompetency when it comes to even the most basic scientific skills and arouses skeptical concern. (I think I read that in a DrugMonkey post somewhere…)


  3. DrugMonkey Says:

    There are a lot of things that can go wrong, and in a perfect-looking gel you can be confident that none of them happened. When your gels look hideous because you can’t mix buffer, use a camera or remember to turn the run off, you don’t pick up the warning signs of, say, sample degradation.
    I do understand that this is a continuum, at one end of which the “bad looking” gel really is of poorer scientific quality: the results are not obvious, and there are concerns about whether the experimental prep actually demonstrates what it is purported to demonstrate.
    With a little bit of good faith, however, I tend to suspect that what drdrA was referring to was on the other end of the scale. The difference between perfect/beautiful and acceptable-if-ugly.
    now your line-graph example is interesting because it gets into exactly my problem. I would draw the analogy to running more subjects or subject groups until one gets the desired “pretty” result. Not with simply re-drawing the graph until it is visually pleasing. This latter does not change the data or collection thereof one bit. My assumption is that since the result is unknown until one runs the gel, and the interpretation depends on where that band sits, that this is more like the empirical outcome than a mere description of that outcome.
    what am I missing?


  4. Bayman Says:

    Plus, no one has time to read through your messy data in a publication just because you couldn’t be bothered to do a tidy job. Would you rather the reader spend his/her time trying to understand what the fuck you’re trying to do, or internalizing and considering the implications of what is the obvious, inescapable and significant conclusion made clear by your beautiful gel?


  5. per Says:

    i am bemused by your comparison. As you point out, an eyeball of a figure falls a long way short of a statistical analysis for all sorts of reasons, and this doesn’t add up to a coherent analysis.
    However, defective gels/ westerns can be defective data, and they can be defective for all sorts of reasons. Without seeing the original data and the commentary, we cannot tell if the reviewer had a valid point, or whether their comment is ridiculous.
    But I have just rejected a paper for bad westerns, and I had no compunction in doing so. The paper rested on the westerns, and if you cannot in good faith interpret the westerns, how can you interpret all the findings that sit on that analysis?
    per


  6. BugDoc Says:

    As a reviewer, I think it’s important to distinguish between messy westerns that can’t be interpreted (BAD), and westerns that are just, well, dirty (REALITY). Really good antibodies are a fabulous tool for molecular and cell biology; however, there are many antibodies out there that just are not that great and give you multiple bands on an immunoblot. As long as they can show a clear band and have the appropriate specificity and loading controls, should the investigators spend hundreds or thousands of dollars making new antibodies or doing affinity purification for a few experiments if the answer is already clear?
    I’m with DrugMonkey on this one; assuming all the appropriate controls have been done and you don’t have to squint sideways to see “the band”, some background on the blots doesn’t bother me. After all, we are trying to move science forward, not trying to win a beauty contest, right?
    Having said that, there is no excuse for general sloppiness or poorly laid out figures. Those sorts of things are well within our control and should be as aesthetically pleasing as possible.


  7. drdrA Says:

    I’m loving this discussion. Especially since we went to enormous lengths to get those gels- and they are difficult – they are not simple protein gels. I have had people walk up to my and my students’ posters containing the very same pictures of gels that I submitted to the journal, ask me how we get such beautiful gels, and actually ask me for the details of the technique. So ???
    The Western blots- now that’s another thing. Suffice to say that there are some conditions, with low abundance proteins, where it isn’t so easy to get a ‘beautiful’ western. That doesn’t mean the ugly western isn’t perfectly correct- especially if you have several other pieces of good evidence to back it up.
    But- I think I made another error- which I will not make again. The figures I submitted were larger than they would actually be presented in the published paper itself, and almost everything looks grainy at that size. Maybe that hurt me, I don’t know- but I won’t make that mistake again!


  8. DrugMonkey Says:

    Would you rather the reader spend his/her time trying to understand what the fuck you’re trying to do, or internalizing and considering the implications of what is the obvious, inescapable and significant conclusion made clear by your beautiful gel?
    bayman, BugDoc’s point is apt:
    As long as they can show a clear band and have the appropriate specificity and loading controls, should the investigators spend hundreds or thousands of dollars making new antibodies or doing affinity purification for a few experiments if the answer is already clear?…assuming all the appropriate controls have been done and you don’t have to squint sideways to see “the band”, some background on the blots doesn’t bother me. After all, we are trying to move science forward, not trying to win a beauty contest, right?
    Again, I’m going to extend the assumption with respect to drdrA’s figure that BugDoc’s analysis is apt. In no small part because I’m reasonably well acquainted with someone who has dissected these things for me in person in the past. And I have a very good reason to think that there are those who, in all honesty, pride and the indignation that you express, believe that pretty figures are a GoodThing.
    The point is to decide for yourself when you are objecting to the quality of a figure for substantive reasons and when it is for the sort of “tidy job” reasons that may have very little to do with the real interpretation of outcome. Beyond snobbery, I mean.
    getting back to your beautiful gel comment bayman, you tread on those matters to which I find a more serious objection scientifically. Namely the tendency I find in the approach expressed by you to assume that just because you created a perfect result once (by chance, who knows) with all of your refinements and seeking the “perfect” antibody and all that crap, this is the best figure. To my mind, the figure is supposed to be representative. and not in the wink-wink type of “most representative slide/gel/mouse” way either. really representative of what should occur when anyone else replicates your work. and the methods should best describe the process too. to present the best figure you managed to create ever without describing the reliability of generating that figure (and specifying how it was generated in a genuinely replicable way) is fraudulent, corrosive to science and all kinds of bad shit. IMHO.
    yeah, I know this passes for subfield practice but I DON’T. UNDERSTAND. THIS. People bitching about how they can’t replicate some published work does not enhance my understanding. Nor are corrigenda in which the replacement for the “accidentally included ‘placeholder’ band” is several orders of magnitude crappier looking (even to my untrained eye) confidence inspiring. dubious unlabeled error bars and the failure to use statistical analysis are foreign to me. It all adds up to a culture in which the idea seems to be “if it can be shown once it must be true” as opposed to asking “how likely is this to be true within an assumption of variable outcome”.


  9. Bayman Says:

    It all adds up to a culture in which the idea seems to be “if it can be shown once it must be true” as opposed to asking “how likely is this to be true within an assumption of variable outcome”.
    I totally agree with you there DM. I’m not talking about the kind of pretty results you get by accident, but the kind you have control over. So, if one has a real, reproducible, but messy result, one should be able to reproduce the experiment while further optimizing conditions in order to get the clearest looking representation of the real, reproducible result.
    To submit a shitty looking piece of data would suggest, in fact, that the result could not be consistently reproduced enough times to optimize conditions to the point where the data looked snappy. So “clean”-looking (NON-photoshopped!!) data can demonstrate 1) experimental reproducibility and 2) mastery of the experimental technique being used. Expecting figures to look clear and clean, within the possible limits of the technique employed, is a GoodThing indeed and actually helps combat the “if it can be shown once it must be true” culture.


  10. Becca Says:

    It sounds to me like, in drdrA’s case, a reviewer was applying perfectly reasonable standards for “prettiness” for a simple protein blot that were *not* reasonable for the gel in question. I have seen that happen before; people just forget how difficult some procedures are, particularly when they produce data that look similar to easier procedures. It’s usually a symptom of reviewers who haven’t attempted the experiment in question in recent memory (although occasionally it is also a sign of a reviewer who has a *really outstanding* pair of hands in their lab).
    I will say this about error bars- I personally think they are more intuitive than P values. I can generally look at error bars or the raw data and instantly know whether it would meet a particular cutoff for a P value. I am WIN not FAIL that way. There are times when I look at error bars and wince, but I don’t think papers should be rejected because of it- as long as you can trust the conclusion.
    It would be really swell if people got their data looking pretty, just like it would be really swell if all of the diagrams in signal transduction papers actually put things in pretty pictures and not needlessly complicated ones. I view it as akin to awkward use of scientific English (wordy sentences, for example). In a world with infinite resources, these things would be fixed when necessary to facilitate ease of reading. But in the real world, it should not make or break a paper.


  11. DrugMonkey Says:

    I will say this about error bars- I personally think they are more intuitive than P values. I can generally look at error bars or the raw data and instantly know whether it would meet a particular cutoff for a P value.
    People who think this are full of crap. As I said, it is generally true that there is a correlation between how the error bars appear and whether one is likely to go over the P less than 0.05 threshold in the appropriate analysis. And as long as one holds all things more or less constant and identical to the data, designs and analyses with which one is most familiar, one can limp along and not blow it too often. Your assertion that you can “instantly know” what the p value is going to be suggests you are either unbelievably broadly versed in conducting statistical analysis across the entire breadth of biomedical science, limiting yourself to a very narrow set of data with which you are very familiar, or talking utter schmack.
    The point is that big changes in the sample sizes under discussion can throw one’s seat-of-the-pants estimate all off. Big changes in the typical variance can throw one off. Different analysis strategies can throw one off. The difference between a repeated measures and a between-groups design is the simplest to diagram and explain, but there are other, more subtle analysis situations that can throw you off as well.
    There are times when I look at error bars and wince
    Why? Because you want it to “look better”? Or because it would make a fundamental difference in what you were able to conclude from the paper?


  12. DSKS Says:

    Becca,
    “I will say this about error bars- I personally think they are more intuitive than P values. I can generally look at error bars or the raw data and instantly know whether it would meet a particular cutoff for a P value.”
    Really? Statistical differences are fairly easy to eyeball by comparing s.ds, but I find that unless they’re obviously overlapping, s.e.m. bars can be misleading. Especially when more than a couple of groups are being compared.
    A P value, on the other hand, is what it is. Although the myth of “extremely significant” and “almost significant” is a too frequently encountered canard in modern science.
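    To put a number on the more-than-a-couple-of-groups worry, here is an illustrative simulation sketch (the group count, sample size and number of simulations are invented, and it assumes numpy and scipy): even when every group is drawn from the same distribution, some pair of SEM bars will fail to overlap far more often than a proper overall test rejects.

        import numpy as np
        from scipy import stats

        # Null simulation: k groups all drawn from the SAME distribution.
        rng = np.random.default_rng(4)
        k, n, n_sims = 6, 8, 2000
        bars_flag = anova_flag = 0

        for _ in range(n_sims):
            groups = [rng.normal(0, 1, n) for _ in range(k)]
            means = [g.mean() for g in groups]
            sems = [stats.sem(g) for g in groups]
            # Does at least one pair of SEM bars fail to overlap?
            no_overlap = any(abs(means[i] - means[j]) > sems[i] + sems[j]
                             for i in range(k) for j in range(i + 1, k))
            bars_flag += no_overlap
            anova_flag += stats.f_oneway(*groups).pvalue < 0.05

        print("null simulations with some non-overlapping SEM bars:", bars_flag / n_sims)
        print("null simulations with a significant one-way ANOVA:", anova_flag / n_sims)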


  13. juniorprof Says:

    Becca, I hate to pile on, but, the sooner you break yourself of this delusion the better off you will be. You are wrong. Not only are you wrong, but if you have any interest in a career in clinically related work you are eventually going to find that human data are usually reported with SD error bars and not SEMs. Try eyeballing SD bars and you are really setting yourself up for misery (especially if you don’t bother reading the statistics portion of the methods to find out why the error bars are so big).


  14. Bayman Says:

    You can’t simply glance at an error bar or a p-value and conclude anything. You have to know what calculation was used to derive that interval or p value. SD? SEM? 95% CI? Two-tailed Student’s t-test? Etc., etc. That said, if I know what they’re saying I prefer to look at error bars too.
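    As a quick illustration of how much that choice matters, here is a minimal sketch (the numbers are invented, and it assumes numpy and scipy) computing all three summaries from a single sample. SD and SEM differ by a factor of sqrt(n), and the 95% CI half-width is roughly twice the SEM at moderate n, so “the bars overlap” means very different things depending on which one was plotted.

        import numpy as np
        from scipy import stats

        # One hypothetical sample, summarized three different ways.
        rng = np.random.default_rng(1)
        x = rng.normal(50, 10, 30)   # n = 30 invented measurements

        n = len(x)
        sd = x.std(ddof=1)
        sem = sd / np.sqrt(n)
        ci_half = stats.t.ppf(0.975, n - 1) * sem   # half-width of the 95% CI

        print(f"mean    = {x.mean():.1f}")
        print(f"SD bar  = +/- {sd:.2f}")
        print(f"SEM bar = +/- {sem:.2f}   (SD / sqrt(n))")
        print(f"95% CI  = +/- {ci_half:.2f}   (about 2 x SEM here)")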


  15. DSKS Says:

    Of course, even s.d.s can be misleading if the distribution of the data hasn’t been reported.


  16. DrugMonkey Says:

    Although the myth of “extremely significant” and “almost significant” is a too frequently encountered canard in modern science.
    oh man, you really want to get me ranting don’t you?
    DSKS is, I think, referring to the situation in which obtaining a p-value of less than 0.05, 0.01 or 0.00000000000001 is supposed to be telling us something about the degree to which a result is statistically reliable (not ‘significant’, please!).
    My view is that your acceptable threshold for error should be a fixed value, essentially for your lifetime of work. It is an essentially arbitrary standard. Whatever your lowest standard is, that’s your standard. And that’s going to be 0.05 for the vast majority of scientists.
    Presenting p-values of vanishingly small magnitude as if this says something about the quality of your result is intellectually dishonest. You are saying in effect “Yeah but if I did happen to have a more stringent standard, this result would have met it”. Trouble is, we already know that you in fact do not have a more stringent standard for accepting a result as real if you ever publish with a less-stringent standard.


  17. NeuroStudent Says:

    I am guilty of liking pretty data…but in electrophysiology if the data isn’t “pretty” (I’m talking about example traces here) then there’s a good chance that it’s the result of an unhealthy slice or some other problem that can also affect the results (placing the electrodes in layers other than those reported when recording field potentials, for instance)…pretty much if the traces are horrible then I’m not likely to trust the conclusions drawn from the data
    one problem that I do hear from other students is that they’ll collect a lot of data and then have to run another experiment for “figure-quality” data…this pisses me off–your “figure-quality” data should be representative of your entire set of data, not that one experiment when you actually gave a crap if your slices were healthy or not.


  18. penny Says:

    nothing wrong with the non-pretty data as long as it’s honest. one advantage pretty data has is that it is intuitive, and one does not have to torture the data to make a statement.


  19. steppen wolf Says:

    nothing wrong with the non-pretty data as long as it’s honest. one advantage pretty data has is that it is intuitive, and one does not have to torture the data to make a statement.

    Unless, of course, your pretty data has been obtained by 1) picking that one so-called representative experiment that looked good or by 2) eliminating outliers from your sample. Both of which are dishonest.
    Sometimes I look at my data, look at published data, and get this feeling that people are strongly encouraged to make things look better than they really are.
    The main point should not be how pretty things are, but how reproducible. And if the reviewers have doubt, they should ask you to repeat the experiment and compare the results, not to make things look “pretty”.


  20. Neuro-conservative Says:

    My view is that your acceptable threshold for error should be a fixed value, essentially for you life time of work…Presenting p-values of vanishingly small magnitude as if this says something about the quality of your result is intellectually dishonest.
    DM — Look, I am a huge fan of Jacob Cohen, but your statement is very narrow-minded. It may be applicable in your subfield, but it completely ignores contemporary approaches to working with large datasets.


  21. penny Says:

    hey steppen wolf, i said “honest,” not “dishonest.”


  22. bill Says:

    I’ll not need to go into a rant as to why an N=1 like a gel constitutes “data”
    Heh, too true. What you usually see is a statement somewhere that “the expt was repeated n>1 times and a representative gel is shown”.
    I think that in these days of virtually unlimited cyberspace real estate, that kind of statement should be a red flag unless the *other* n-1 gels are available in supplementary data. If you got ’em, why wouldn’t you show ’em? It wastes, if waste it can be called, nothing but a few pixels.


  23. penny Says:

    I still don’t get the paranoia about picking a representative gel. Ok, all of my gels look the same. If you really want all the pictures, why even bother with the quantitative analysis? I’d be happy to slap all the pictures in the supplement, but that is not science.



  24. Dr. J&Mrs.H Says:

    I’m with DrugMonkey on most of this. What I resent is that the presentation of the “representative trace” as though it really were representative (and we ALL know that most “example” images are the prettiest ones anyone could find, sometimes having been redone just to achieve a pretty trace/gel/other) leads incoming students to believe that all data look like that. As an early grad student, I think I was much harsher on my data than I should have been, because it didn’t look like most of what I saw in papers. That really discouraged me.
    I’m all for data being presented in a clear, legible, aesthetically pleasing format. For my last paper I had to run a gel a bazillion times because the PCR bands didn’t come out as bright for some lanes as for others, and it was a big enough problem that overall image brightness/contrast adjustment couldn’t fix it. Ok, it’s fine to redo those a bunch, since of course I want people to be able to SEE the friggin fainter band. But people do go overboard in their expectations.
    Becca–I’ll defend your point of view. If all the data are presented as SEMs, for example (and in a lot of papers in my field they are), you can get a good feel for the data by looking at the graph. Folks–I don’t imagine that Becca is really claiming ESP for p values based on graphical appearance, just saying that once you’ve made enough bar graphs yourself, you have a sense for what’s sig and what’s not. Presumably Becca is also accustomed to all-SEM papers or something like that. Becca, just keep in mind that SDs or median/quartiles do show up, and your visual ID of those won’t be appropriate. But you probably know that.


  25. whimple Says:

    Dr. J&Mrs.H: You sure that consistently fainter band is real?


  26. JSinger Says:

    Unless, of course, your pretty data has been obtained by 1) picking that one so-called representative experiment that looked good…Both of which are dishonest.
    I’m assuming that you can reproducibly repeat the result shown in that gel, in which case I’d argue that it is indeed a visualization similar to a graph. If not, there’s no less selection bias in doing multiple experiments and publishing the first noise that looks good than in doing the same for one experiment, and thus presenting a single gel is inappropriate regardless of whether it was your first or third.
    Also, since this seems to be less obvious than I’d thought: I’m making no judgment about the quality of drdrA’s particular figures, which I’ve never seen and know nothing about. I didn’t mean to suggest that she doesn’t know how to mix Tris, just that some authors clearly don’t.


  27. juniorprof Says:

    Anyone interested in the pitfalls of eyeballing error bars should check out the following citations:
    1) Error bars in experimental biology
    http://www.jcb.org/cgi/content/full/177/1/7
    2) Inference by Eye: Confidence Intervals and How to Read Pictures of Data
    http://psycnet.apa.org/index.cfm?fa=search.displayRecord&uid=2005-01817-003
    3) Researchers Misunderstand Confidence Intervals and Standard Error Bars
    http://psycnet.apa.org/index.cfm?fa=search.displayRecord&uid=2005-16136-002
    Happy Reading!



  28. Dr. J&Mrs.H Says:

    Dr. J&Mrs.H: You sure that consistently fainter band is real?
    Was that a joke?
    But yes, I am. As PCR mavens know, smaller fragments involve (duh) less DNA, and so they pick up less EtBr, and so they look fainter than bigger fragments, all other things being equal. That’s the sort of gel that is frustrating to show in a paper, if it’s important to have the samples side by side, because you end up having to load more of the small-frag sample than of the big-frag, which of course feels vaguely dishonest–but it’s just for visual clarity. The band is real.


  29. steppen wolf Says:

    hey steppen wolf, i said “honest,” not “dishonest.”

    I know. Do read my comment again, and notice the “unless” at the beginning. By the way, I hate to clarify this, but that paragraph was sarcastic.

    there’s no less selection bias in doing multiple experiments and publishing the first noise that looks good than in doing the same for one experiment, and thus presenting a single gel is inappropriate regardless of whether it was your first or third.

    Given that my comment above (#19) was sarcastic, I obviously agree with you. If you need to “pick” the one perfect gel, this means that 1) either all the other ones were complete crap, even if they showed the same result or 2) this one you picked is some kind of non-representative gel. Which one do you think is usually the case? If a gel is complete crap, most people would ignore the result, rather than trying to read into it, and would just run it again until it is decent (not perfect, but decent).

    What I resent is that the presentation of the “representative trace” as though it really were representative […] leads incoming students to believe that all data look like that.

    That is just what I was pointing out. These images that look perfect NEVER look like the real data (data, not datum) taken in their entirety! Which smells to me like some sophisticated form of cheating.


  30. whimple Says:

    That’s the sort of gel that is frustrating to show in a paper, if it’s important to have the samples side by side, because you end up having to load more of the small-frag sample than of the big-frag, which of course feels vaguely dishonest–but it’s just for visual clarity. The band is real.
    It was a joke, but…
    I agree with your feeling — adding in more of the small-fragment sample to make it the same intensity as the large fragment is dishonest. I think you’re trying too hard… oh wait, you’re not doing the new way of just showing the tiny slice of gel immediately surrounding the band, rather than showing the whole gel, are you? If you got any other non-specific bands out of your PCR at all, then that’s dishonest too. I’ve been screaming about single-shown-band western blots for years in this manner. Fortunately, some high-profile journals are now insisting that the entire gel, warts and all, be shown in supplementary data.


  31. PhysioProf Says:

    Presenting p-values of vanishingly small magnitude as if this says something about the quality of your result is intellectually dishonest. You are saying in effect “Yeah but if I did happen to have a more stringent standard, this result would have met it”. Trouble is, we already know that you in fact do not have a more stringent standard for accepting a result as real if you ever publish with a less-stringent standard.

    This is a crock. Reporting actual p values tells the reader something important about the outcome of the statistical comparison: the likelihood of having made a Type I error.


  32. CC Says:

    I think you’re trying too hard… oh wait, you’re not doing the new-way of just showing the tiny slice of gel immediately surrounding the band, rather than showing the whole gel, are you?
    I think she’s having trouble finding an exposure that makes the small band visible without washing out the large band. I don’t think there’s anything sleazy about loading equal quantities of DNA as long as you note that in the figure legend, although it’s awkward if you don’t do that throughout the paper.


  33. DrugMonkey Says:

    Reporting actual p values tells the reader something important about the outcome of the statistical comparison: the likelihood of having made a Type I error.
    bzzt, wrong. start here
    http://ftp.isds.duke.edu/WorkingPapers/03-26.pdf
    Neuro-con, game on. s/he gets right into the hot spot if I take where this is going correctly. by all means expand the argument, I’m listening… narrow-minded or not!
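    For the curious, here is a small illustrative simulation of the general point behind that reference (the base rate of true effects, the per-group sample size and the effect size are all invented, and it assumes numpy and scipy). Among results that cross p < 0.05, the fraction in which the null was actually true depends on base rates and statistical power, and can be far larger than 5%; the p-value alone does not tell you the probability that a given “significant” result is a Type I error.

        import numpy as np
        from scipy import stats

        # Simulate many experiments; only a minority test a real effect.
        rng = np.random.default_rng(2)
        n_experiments = 20000
        prop_real = 0.1          # 10% of tested hypotheses are actually true (invented)
        n, effect = 10, 1.0      # per-group sample size and true effect in SD units

        false_pos = true_pos = 0
        for _ in range(n_experiments):
            real = rng.random() < prop_real
            a = rng.normal(0, 1, n)
            b = rng.normal(effect if real else 0.0, 1, n)
            if stats.ttest_ind(a, b).pvalue < 0.05:
                if real:
                    true_pos += 1
                else:
                    false_pos += 1

        print("fraction of 'p < 0.05' results where the null was actually true:",
              round(false_pos / (false_pos + true_pos), 2))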


  34. Becca Says:

    -start excessive drama-
    I’m so misunderstood!!!1!
    -end excessive drama-
    I will try to be more precise.
    I can tell, within about 20 seconds of looking at a bar graph that uses error bars, provided I know what they represent, whether *I* find a particular two-way comparison to be *convincingly* significant (I can do it for 95% CI, SD, or SEM, but people are quite correct in pointing out it does matter which it is, and if the figure legend doesn’t say, I obviously have to go look it up, which can definitely take longer than 20 sec). Generally, the P-value is more precise than I need, and those little stars are totally superfluous.
    Also, sometimes perfectly valid P values just aren’t convincing. For one thing, if you are doing 20 individual comparisons, a * for 0.05 doesn’t strike me as maximally useful.
    Anyway, I was more dissing reliance solely on P values than saying unlabeled error bars impart instantaneous understanding of how important all types of results are.
    In any event, I wasn’t actually thinking of SD or SEM at all. CI is related to the P value in a perfectly straightforward way- I’m not claiming some magical statistics ESP in being able to tell P from CI. Although I do think you can *typically* (not always) get a good idea of whether the result is significant from an SD or SEM… particularly if the data were produced from a method you have ever done before, or even just an assay you see in 5%+ of the papers you read. I do not claim to be able to interpret every type of data like that, of course.
    And error bars make me wince not when they are big (one kind of not-pretty) but when they overlap *and* the authors claim there is a difference *and* they do not phrase the conclusions as applying to a variable population.
    Actually, in animal studies, big error bars that do not overlap (particularly when they are the SD and the conclusions are entirely convincing) actually make me rather happy. It shows better than the little “we followed our IRB’s rules” that the researchers are following the animal research guidelines.
    Also, as long as I’m complaining, mol epi SNP studies that attempt to interpret relative risks where the CI spans 1 irk me more than big error bars.
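    The “perfectly straightforward” CI/P correspondence is easy to check directly; here is a minimal sketch (the numbers are invented, and it assumes numpy and scipy) for a two-sided one-sample t-test, where the 95% CI excludes zero exactly when p falls below 0.05.

        import numpy as np
        from scipy import stats

        # Hypothetical paired differences, summarized as a 95% CI and a p-value.
        rng = np.random.default_rng(3)
        diffs = rng.normal(0.8, 1.0, 15)

        n = len(diffs)
        mean, sem = diffs.mean(), stats.sem(diffs)
        tcrit = stats.t.ppf(0.975, n - 1)
        ci = (mean - tcrit * sem, mean + tcrit * sem)
        p = stats.ttest_1samp(diffs, 0).pvalue

        print(f"95% CI = ({ci[0]:.2f}, {ci[1]:.2f}), p = {p:.3f}")
        print("CI excludes 0:", not (ci[0] <= 0 <= ci[1]), "| p < 0.05:", p < 0.05)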


  35. Bayman Says:

    People, people. I think two issues are being confused as one here:
    1) The accuracy/reproducibility of one’s data.
    2) The clarity with which it is presented, i.e. appearance etc. (more generally, how it is communicated!).
    Neither is sufficient, both are necessary if your aim is to be correct, understood and published.
    So, whether a Western is snappy-looking and therefore easy to comprehend, for example, has nothing at all to do with whether it is accurate, or representative or statistically significant or whatever!! Nonetheless, it is still desirable if you want people to pay attention to your figures!
    Second, it is IMPOSSIBLE to determine how “representative” or stat. sig. a Western is based on a figure in a presentation or a paper. So stop trying!! You either believe it or you don’t. Unless the differences jump off the page, take a Western with a grain of salt and don’t believe the implications unless it is supported by alternative experimental approaches. Likewise, if the alternative supporting data are there, don’t reject a paper on the basis of a single dubious Western/gel!!!
    Sorry to the people who think they are elegantly dissecting the cell through one subtle Western blot after another, but unless you’re comparing differences that jump out and kick me in the ass, one Western doesn’t mean shit to me.


