News this week from the world of science: impact factor doesn't really match up with research quality. A study by Brown and Ramaswamy of the University of Iowa in Acta Crystallographica D analyzed crystallographic structures that had been deposited in the PDB and came up with a generalized metric to describe several features of a structure's quality. They normalized it to the average quality of the PDB, and then set about to see what extraneous factors were the best predictors of quality.
There were several interesting results here. The first was that the quality of crystal structures has not generally improved over time, despite advances in technique and equipment. The authors attribute this to the increasing use of crystallography as merely part of an experiment, so that the focus is less on the quality of the structure as a whole and more on making the structure good enough to answer a specific question. One can also imagine that the democratization of crystallography has contributed to this effect. Early structures solved with more primitive techniques only led to a solution if the researcher was qualified and careful. As the tools have become better, more widely available and black-boxed, more people can do crystallography, but these tools have encouraged less-qualified individuals to model structures, and perhaps also encouraged sloppiness among the better-trained researchers. There is also the vacuum-cleaner effect to consider. As the tools for modeling structures have become more robust, individual crystallographers are expected to produce more structures. As a result, the time spent on each individual structure has declined, perhaps more so than the improvements in technology have justified.
Another important result is that structural genomics consortia generally produce slightly better structures than the PDB average. This is certainly reassuring to the government entities that have dumped millions of dollars into these initiatives, but it also bears thinking about why it is so. Certainly one reason for the improved quality is formalized screening and the highly trained people that are doing the structural genomics work. Moreover, these efforts are really focused on getting a good structure, without any direct emphasis on experimental utility. It stands to reason that if your goal is to get a good structure, then it is more likely that those structures you personally consider a success and therefore deposit in the PDB will be good structures. This assumes a certain competence on the part of the researchers, but this is a property that structural genomics researchers manifestly possess. An additional consideration, however, is that these consortia are explicitly oriented towards solving the structures of well-behaved proteins. The high degree of automation used by the consortia is not generally compatible with poorly-behaved proteins or any need to squeeze a model out of troublesome data. A willingness to leave troubled proteins alone is probably part of the success of the consortia in this regard.
The most troubling result from the paper came when structures were analyzed based on the journal of primary reference. Brown and Ramaswamy discovered that crystal structures first reported in high-impact journals like Nature, Cell, and Science were more often of below-average quality than structures reported in lower-impact journals like Biochemistry, Proteins, and Eur. J. Biochem. Because the metrics used by Brown and Ramaswamy are not intrinsically sensitive to novelty, this cannot be blamed on the simple "newness" of structures reported to these high-impact journals. The authors attribute their problematic finding primarily to the fact that the structures reported in these journals tend to be (as above) part of a paper, not the whole thing. Moreover, the reviewers for these papers may more often be something other than expert crystallographers, and even when they are experts, the extreme space constraints may keep the relevant structure factors and raw data out of the paper.

That the reviewers are important might also be supported by the data. Consider the figure above (my own creation from Table 5 of the paper). It plots the aggregate quality score for each journal against the number of structures reported in it; remember that a lower aggregate score is better. There's clearly no linear correlation between quality and the number of structures reported. But it is interesting to see that, with two exceptions, the 'bad' journals (positive quality score) each contain between 100 and 1000 structures. Inside this range the quality varies significantly, but outside of it the quality is almost always good. Maybe this means nothing, but it may also mean that journals that have a significant stock in trade in protein crystal structures also have experienced editors and reviewers who know how to properly vet them. It's interesting to note that the only journal containing more than a thousand structures that has a positive score is Proceedings of the National Academy of Sciences, a journal which has a strange and inconsistent review policy. Similarly, journals where crystal structures are a rare event may have editors who react to them with caution and seek out expert reviewers. By contrast, in the middle range, no trend is discernible except a roughly inverse relationship between impact factor and structure quality.
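For anyone who wants to check the banding argument against Table 5 themselves, here is a minimal sketch of the tally I have in mind. The function names and the sample rows are my own placeholders, not values from the paper, and treating the 100 and 1000 boundaries as inclusive is an assumption.

```python
from collections import defaultdict


def count_band(n_structures):
    """Assign a journal to a band by how many structures it reported."""
    if n_structures < 100:
        return "under 100"
    if n_structures <= 1000:  # assumption: both boundaries count as mid-range
        return "100 to 1000"
    return "over 1000"


def summarize_by_band(journals):
    """Tally journals per band.

    journals: iterable of (name, n_structures, aggregate_score) tuples.
    Returns {band: (n_journals, n_with_positive_score)}.
    """
    totals = defaultdict(lambda: [0, 0])
    for _name, n_structures, score in journals:
        band = count_band(n_structures)
        totals[band][0] += 1
        if score > 0:  # positive score means worse than the PDB average
            totals[band][1] += 1
    return {band: tuple(counts) for band, counts in totals.items()}


# Placeholder rows for illustration only; these are NOT the Table 5 values.
example = [
    ("Journal A", 40, -0.3),
    ("Journal B", 250, 0.2),
    ("Journal C", 600, -0.1),
    ("Journal D", 5000, -0.2),
]

print(summarize_by_band(example))
```

With the real table substituted for the placeholder rows, the fraction of positive-score journals in each band would show whether the middle range really stands out or whether the pattern is just a product of where I drew the lines.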
The more disturbing implication, and one that the authors do not deeply address, is that an external property of novel structures causes them to be published with lower quality: namely, their very novelty makes quality more of an afterthought. This is not just the idea of rushing papers to publication causing trouble. Rather, I mean to say that the editors and reviewers of these papers are flat-out willing to accept lower quality of data in exchange for novelty and impact. In the recent high-profile pentaretraction by Geoffrey Chang, for example, the reported features of the structures alone should have raised serious questions about publishing them, even without considering that they were contradicted by biochemical data. That these red flags never led anyone to catch the elementary error in Chang's own software indicates that the reviewers and editors were as sloppy as Chang himself. And why? Because these structures were novel and potentially revolutionary. Good copy outweighed bad modeling.
Crystal structures have a useful feature in that their quality can to some degree be assessed quantitatively, without needing to ask subjective questions (say, whether a structure is consistent with some mechanistic model). That is, this kind of large-scale analysis of research quality is possible. The quality of structures in these journals isn't disastrous, but it is cause for concern. And it raises serious questions about other research published in these journals, data for which the quality is less quantifiable and objective. I won't say that Science and Nature are not to be trusted, but in light of this large-scale trend and recent data-falsification woes in other areas, it would be naive in the extreme to approach reports in these journals without a healthy skepticism.
Brown E.N., Ramaswamy S. "Quality of protein crystal structures." Acta Cryst. D 63 (2007), pp. 941-950.
