February 22, 2008

The Evolution of Protein Folds

Now online for next week's edition of PNAS is a commentary by Alan R. Davidson (1) about a paper in this week's edition of PNAS from Matthew Cordes' group (2). Both are worth reading because they speak to a very interesting question: where do new protein folds come from?

The Roessler et al. paper doesn't address this question directly. Their initial intention was to identify the relationship between two distantly homologous proteins: P22 Cro and λ Cro. Though they both belong to the Cro repressor superfamily, these two proteins have just 25% sequence identity and significant dissimilarities in structure, as you can see on the right in the figure I have shamelessly stolen from the paper. P22 is an all α-helical structure that appears to be exclusively monomeric, while λ is an α/β structure that forms a dimer with nanomolar affinity. In an effort to bridge the structural gap between these proteins, Roessler et al. looked at a group of proteins related by transitive homology. As they put it:
"In this approach, two dissimilar sequences, A and C, are indirectly linked if a third 'intermediate' sequence B exists with sufficient similarity to both A and C to imply homology with both proteins. The relationships between A and B and between B and C combine to support distant common ancestry between A and C."
Thus, they identify 3 intermediates, each with about 40% identity to its nearest neighbors, that bridge the sequence gap. Then they ask whether these sequence intermediates are also structural intermediates.
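To make the transitive-homology idea concrete, here is a minimal Python sketch. It is not the method Roessler et al. used (real homology detection relies on proper alignment and statistical scoring). It assumes the sequences are already aligned to equal length, uses plain percent identity as a stand-in for "sufficient similarity", and runs on three made-up toy sequences. The function names, the toy sequences, and the 40% threshold are all illustrative assumptions.

```python
from collections import deque


def percent_identity(a: str, b: str) -> float:
    """Percent identity over aligned columns where neither sequence has a gap."""
    if len(a) != len(b):
        raise ValueError("sequences must be pre-aligned to the same length")
    pairs = [(x, y) for x, y in zip(a, b) if x != "-" and y != "-"]
    if not pairs:
        return 0.0
    return 100.0 * sum(x == y for x, y in pairs) / len(pairs)


def transitive_path(seqs: dict, start: str, end: str, threshold: float):
    """Breadth-first search for the shortest chain of sequences from start to
    end in which every consecutive pair meets the identity threshold.
    Returns the list of names along the chain, or None if no chain exists."""
    previous = {start: None}
    queue = deque([start])
    while queue:
        current = queue.popleft()
        if current == end:
            path = []
            while current is not None:
                path.append(current)
                current = previous[current]
            return path[::-1]
        for name in seqs:
            if name in previous:
                continue
            if percent_identity(seqs[current], seqs[name]) >= threshold:
                previous[name] = current
                queue.append(name)
    return None


# Hypothetical toy sequences, NOT real Cro sequences.
TOY_SEQS = {
    "A": "MKTAYIAKQR",
    "B": "MKTAYLGRES",
    "C": "MPSVWLGRES",
}

if __name__ == "__main__":
    print(percent_identity(TOY_SEQS["A"], TOY_SEQS["C"]))
    print(transitive_path(TOY_SEQS, "A", "C", threshold=40.0))
```

In the toy data, A and C share too little identity to be linked directly, but each is similar enough to B that the search connects them through it, which is the same logic that links P22 Cro to λ Cro through the three intermediates.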

The answer is "yes", and in a somewhat surprising way. It is not the case that each step along the transitive pathway slightly increases β-strand content. Rather, while Xfaso 1 has an all-α structure very similar to P22 and is a monomer, Pfl 6—which is 40% identical (!)—has an α/β structure similar to that of λ Cro and dimerizes with ~1 mM affinity. This dissimilarity allows the authors to present some interesting ideas about the evolution of the Cro family which are summarized in their Figure 4. But what seizes Davidson's imagination is the conjunction of fairly high sequence similarity with structural dissimilarity. What makes this conjunction even more impressive is that the sequence identity is evenly distributed while the structural differences are not. The N-termini of these proteins contain structurally similar helix-turn-helix motifs, so they primarily differ in the structures of the C-termini. Yet amino-acid identity holds up across essentially the whole sequence.
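To put the dimerization affinities mentioned above in perspective, here is a rough Python sketch of a simple monomer-dimer equilibrium (2M ⇌ D, with Kd = [M]²/[D]). The specific numbers are my assumptions, not values from the paper: 1 nM stands in for "nanomolar" affinity, 1 mM for the ~1 mM figure, and 10 µM is an arbitrary total protein concentration chosen for illustration.

```python
import math


def dimer_fraction(total_conc_molar: float, kd_molar: float) -> float:
    """Fraction of protein chains present in dimers for 2M <-> D.

    Uses Kd = [M]^2 / [D] and the mass balance total = [M] + 2[D],
    which gives the quadratic 2[M]^2/Kd + [M] - total = 0.
    """
    if total_conc_molar <= 0 or kd_molar <= 0:
        raise ValueError("concentrations must be positive")
    # Positive root of the quadratic for the free monomer concentration.
    monomer = kd_molar * (math.sqrt(1.0 + 8.0 * total_conc_molar / kd_molar) - 1.0) / 4.0
    return 1.0 - monomer / total_conc_molar


if __name__ == "__main__":
    total = 10e-6  # assumed total protein concentration: 10 µM
    for label, kd in [("assumed 1 nM Kd", 1e-9), ("~1 mM Kd", 1e-3)]:
        print(f"{label}: {dimer_fraction(total, kd):.3f} of chains in dimers")
```

Under these assumptions a nanomolar dimer is almost entirely dimeric, while a millimolar one is almost entirely monomeric at the same concentration, so the two α/β proteins behave very differently in solution despite sharing a fold.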

Why is this such a surprise? Well, there are a variety of reasons, which Davidson outlines pretty well. It boils down to this—for a given sequence, it is generally possible to mutate a significant percentage of the residues without disrupting the fold. That is, the sequence overdetermines the structure. Consequently, proteins that have homologous or significantly identical sequences (and 40% identity would probably fall in this range) are expected to possess very similar structures. This poses a problem for protein evolution because it is expected that the initial pool of folds was rather small. If protein folds are highly resistant to disruption or alteration by mutations, it's difficult to imagine how the present enormous diversity of folds arose.

This impression is actually somewhat mistaken. It's typical to perform X→Ala mutations in these studies, and while this can occasionally produce significant cavities in a structure, it probably significantly underestimates the potential effects of a mutation at any given spot. For buried residues, size increases and the introduction of unbalanced charges (X→Trp, Asp, Lys, etc.) are mutations likely to drive the formation of new structure. For solvent-exposed residues, the introduction of bulky nonpolar side chains (X→Phe, Leu, Ile, etc.) would also be more likely to result in a novel fold than the typical approach. I have a feeling that these kinds of mutations are significant in this context, but I cannot check this because both 3bd1 and 2pij are still on hold and cannot yet be retrieved from the PDB. I may elaborate on this point when the coordinates are released to the public. For the time being I should point out that though the quantity of identity is similar between the two termini, the quality is not: identical residues in the N-terminus almost all appear together, while in the C-terminus they are spread out. However, it is worth noting that the groupings of identical residues that can be found in the C-terminal region tend to occur within the structural features that changed.

Davidson bears out this point when he refers to some experiments that have shown that a few mutations in key spots could change a protein's fold significantly. He seems to be unaware, however, of natural instances in which highly similar sequences produce dissimilar folds. As I have mentioned before, the upper limit on sequence identity producing dissimilar folds is known. Structural studies that Brian Volkman's group published in 2002 (3) demonstrated that the maximum sequence identity that allows for the adoption of a completely different structure is 100%. That is, given reasonable changes in solution conditions a single peptide sequence can produce two entirely different folds.

The last time I blogged on lymphotactin I discussed the implications of Brian's findings for the protein folding and protein structure prediction crowds. In the context of sequence similarity, however, the lymphotactin story also has implications for evolution. To a certain extent it suggests that we have been somewhat blinded by Anfinsen's dogma, in particular the assumption of the unchallenged minimum. The lymphotactin result indicates that context can be extremely important: a sequence that stably folds into one structure in one set of conditions will not necessarily maintain that structure under different conditions. In an elementary sense, we know this already, since we are aware that high temperatures and high concentrations of cosolutes such as guanidine and urea tend to unfold proteins. The important idea is that conversions between Anfinsen-like folds (i.e. folds that conditionally dominate the energy landscape) can occur within the range of conditions that can be achieved physiologically. Because the relationship between sequence and structure is not truly one-to-one, fold diversity may be much easier to achieve than we have suspected on the basis of existing structural studies.

In the end, the results of Roessler et al. provide a powerful counterpoint to conventional expectations about the relationship between sequence identity and structural homology. It appears that Cro proteins group into two kinds of structures, underscoring the well-known stability of protein folds to mutation. However, the structural discontinuity, not hinted at by sequence comparison alone, reinforces the point that fold diversity may be significantly easier to achieve than alanine-scanning mutagenesis experiments have led us to believe. Davidson nonetheless still has a significant point. In this case, it appears that transit to the new structure was relatively short, and Cro is functionally a dimer in both forms: Xfaso 1 Cro forms a dimer in the crystal, and all Cro repressors are expected to dimerize on DNA. But what happens in the transit to a completely novel structure? Can function be maintained as the protein navigates the molten-globule-strewn sequence space between stable folds, and if so, how? These are questions that we will have to answer as we develop a greater understanding of the evolutionary history of biomolecules and de novo protein design.

1. Davidson, A.R. (2008). A folding space odyssey. Proceedings of the National Academy of Sciences, 105(8), 2759-2760. DOI: 10.1073/pnas.0800030105
2. Roessler, C.G., Hall, B.M., Anderson, W.J., Ingram, W.M., Roberts, S.A., Montfort, W.R., Cordes, M.H. (2008). Transitive homology-guided structural studies lead to discovery of Cro proteins with 40% sequence identity but different folds. Proceedings of the National Academy of Sciences, 105(7), 2343-2348. DOI: 10.1073/pnas.0711589105
3. Kuloglu, E.S. (2002). Structural Rearrangement of Human Lymphotactin, a C Chemokine, under Physiological Solution Conditions. Journal of Biological Chemistry, 277(20), 17863-17870. DOI: 10.1074/jbc.M200402200
