One result that attracts considerable attention in our article on Submission Flows is the one presented in Figure 4A, indicating that resubmitted articles were, in a given year in a given journal, significantly more cited a few years (3-5) later. Given the notoriously wild distribution of citation count data, and the thousands of factors that influence the citation success of published articles, I honestly did not expect to find any effect of submission history alone. In the article we focused on establishing the statistical significance of this effect in a robust way, but we did not quantify it (Figure 4A presents the raw data, but one needs to control for publication year and journal). So how different are they?
As is visible in the Figure, most articles in our sample had been cited between zero and 50 times by July 2011 (even though a few articles had as many as 8000 citations). Comparing articles that were resubmitted with those that were first-intent submissions to a journal, it can be seen that resubmissions were less likely to receive no citation at all (about 30% less likely; first red bar) or just one citation (second red bar). In contrast, they were more likely to be cited 3-5 times, 6-10 times, or 11-50 times, which spans most of the range. All these effects combine to produce a global increase in citation count for resubmissions. Interestingly, the few very highly cited articles (51 citations and above) showed no trend or even a reversed trend, but they are much rarer, so the numbers are less reliable. This would suggest that very highly cited papers obey different rules than “normal” papers. It is indeed likely that the effect of submission histories varies with journal and/or field. In particular, under the hypothesis that the review process improves manuscripts, the effect should be smaller for top journals, since many of their rejections are made without any review, in which case no improvement is expected. This latter point applies globally, by the way: since in our study we could not discriminate resubmissions following review from resubmissions following editorial rejection, the dataset is likely to underestimate the difference between the two classes of submission histories. A stronger difference would be expected for resubmissions following actual review(s).
Another factor that tends to minimize the difference observed in the above Figure is that resubmissions occurring between journals from different journal communities (as determined from network analysis) were less cited than those between journals of the same community (Figure 4B in the article). To look at this difference we use only journals that are connected in the network and assigned to one of the 7 major clusters (this excludes the top multidisciplinary journals that are, by definition, not well assigned to a specific cluster), so the dataset is smaller than above. The difference between within- and between-cluster resubmissions is shown in the left figure below.
As can be seen, citation counts are consistently shifted to higher values for resubmissions within a cluster of journals (field) compared to those between fields. The latter were more likely to receive 0 to 5 citations, but less likely to receive 6-100 citations.
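Statements such as "about 30% less likely to receive no citation" boil down to a ratio of per-bin fractions between two groups of articles (a ratio of 0.7 in that case). Here is a minimal sketch of that computation. The bin edges are my assumption, chosen to roughly match the ranges quoted in this post, and the function names and toy numbers are purely illustrative, not the code or data behind the figures.

```python
import numpy as np

# Assumed citation bins: 0, 1, 2, 3-5, 6-10, 11-50, 51-100, >100.
# These lower edges are illustrative, not necessarily those used in the figures.
BIN_LOWER_EDGES = np.array([0, 1, 2, 3, 6, 11, 51, 101])
BIN_LABELS = ["0", "1", "2", "3-5", "6-10", "11-50", "51-100", ">100"]


def bin_fractions(counts):
    """Fraction of articles falling in each citation bin."""
    counts = np.asarray(counts)
    # Index of the last lower edge that is <= each count.
    idx = np.searchsorted(BIN_LOWER_EDGES, counts, side="right") - 1
    tallies = np.bincount(idx, minlength=len(BIN_LOWER_EDGES))
    return tallies / tallies.sum()


def bin_ratio(counts_a, counts_b):
    """Per-bin ratio of fractions (group A over group B).

    A value of 0.7 means group A is 30% less likely than group B to land
    in that bin. Bins that are empty in group B give NaN.
    """
    frac_a = bin_fractions(counts_a)
    frac_b = bin_fractions(counts_b)
    ratio = np.full_like(frac_a, np.nan)
    np.divide(frac_a, frac_b, out=ratio, where=frac_b > 0)
    return ratio


if __name__ == "__main__":
    # Toy numbers only, to show the call pattern.
    resubmitted = [0, 3, 4, 7, 8, 12, 20, 1, 5, 9]
    first_intent = [0, 0, 1, 1, 2, 4, 7, 15, 0, 3]
    for label, r in zip(BIN_LABELS, bin_ratio(resubmitted, first_intent)):
        print(f"{label:>7}: {r:.2f}")
```

Ratios computed this way from raw counts do not control for journal or publication year, which is why the comparison needs the permutation procedure described further down.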
The data are not the same as in the top figure (some journals were excluded), so we cannot directly compare them. For comparison, we can contrast the two types of resubmissions with first-intents, using the same set of journals. This is shown in the right figure above. Clearly, using only resubmissions within fields (the vast majority of resubmissions) and omitting journals not well assigned in the network reinforces the difference between resubmissions and first-intents: looking at the red bars, resubmissions were less likely to have 5 citations or fewer and more likely to have 6 or more. Resubmissions were, in particular, about 50% less likely not to be cited at all. In contrast (green bars), resubmissions between fields showed the opposite pattern and were LESS cited than first-intent submissions.
A little bit of technique: It is not advisable to try to fit simple parametric models (e.g. ANOVA or mixed GLM) since, even when log-transformed, citation counts have ugly distributions and homoscedasticity is utopian. One can instead use an exact permutation procedure that removes the effect of submission history while controlling for year (3 levels), journal (923 levels), and the interaction of the two. The difference between the observed test statistic and the center of the null (permuted) distribution of this statistic gives a reliable estimate of effect size. In the article we tested for a shift in location with such a permutation procedure, using as test statistics the difference in mean log-transformed counts (but this still has a very skewed distribution, so the mean does not tell the whole story) and Wilcoxon’s rank-based statistic (which is more robust to the long tail of the distributions). To visualize the difference, it is better to use the actual density distribution of citation counts (i.e. the histogram) as a test statistic, and to compare the distributions of the two submission histories. This is what is shown in the figure.
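To make the permutation idea concrete, here is a minimal sketch that shuffles submission-history labels within each journal-by-year cell and measures the effect as the observed statistic minus the center of the null distribution. It is not the code used for the article: it uses random (Monte Carlo) permutations rather than an exact enumeration, takes the difference in mean log(1 + citations) as the statistic, uses the median as the "center", and runs on synthetic data. All function names and parameters are illustrative assumptions.

```python
import numpy as np


def mean_log_diff(counts, is_resub):
    """Mean log(1 + citations) of resubmissions minus that of first-intents."""
    logc = np.log1p(counts)
    return logc[is_resub].mean() - logc[~is_resub].mean()


def permute_within_strata(is_resub, strata, rng):
    """Shuffle submission-history labels separately inside each stratum.

    A stratum stands for one journal-by-year cell, so the number of
    resubmissions per journal and year is preserved by construction.
    """
    permuted = is_resub.copy()
    for s in np.unique(strata):
        idx = np.flatnonzero(strata == s)
        permuted[idx] = rng.permutation(is_resub[idx])
    return permuted


def stratified_permutation_test(counts, is_resub, strata, n_perm=1000, seed=0):
    """Return (observed statistic, effect size, two-sided p-value)."""
    rng = np.random.default_rng(seed)
    observed = mean_log_diff(counts, is_resub)
    null = np.array([
        mean_log_diff(counts, permute_within_strata(is_resub, strata, rng))
        for _ in range(n_perm)
    ])
    center = np.median(null)
    effect = observed - center  # observed minus center of the null
    extreme = np.sum(np.abs(null - center) >= abs(effect))
    p_value = (1 + extreme) / (n_perm + 1)
    return observed, effect, p_value


def make_synthetic_data(n=3000, n_strata=30, boost=1.5, seed=42):
    """Synthetic, heavy-tailed counts with a built-in resubmission boost."""
    rng = np.random.default_rng(seed)
    strata = rng.integers(0, n_strata, size=n)      # fake journal-year cells
    is_resub = rng.random(n) < 0.25                 # fake submission history
    base = rng.uniform(2, 20, size=n_strata)[strata]
    mu = base * np.where(is_resub, boost, 1.0)
    counts = rng.negative_binomial(1.0, 1.0 / (1.0 + mu))  # mean equals mu
    return counts, is_resub, strata


if __name__ == "__main__":
    counts, is_resub, strata = make_synthetic_data()
    obs, effect, p = stratified_permutation_test(counts, is_resub, strata)
    print(f"observed={obs:.3f}  effect={effect:.3f}  p={p:.4f}")
```

Swapping `mean_log_diff` for a rank-based statistic, or for a vector of per-bin histogram fractions, would follow the same pattern: only the statistic changes, while the within-stratum shuffling stays as it is.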