Discount thoughts: What do we learn from the Protein Ensemble Method?

Anonymous left a comment on my post on Brüschweiler's work, referencing a couple of papers by Amarda Shehu, Cecilia Clementi, and Lydia Kavraki, which are cited at the bottom of this post. The most fascinating thing about these papers is the remarkable fidelity with which their Protein Ensemble Method (PEM) reproduces NMR-derived order parameters, three-bond J couplings, and residual dipolar couplings (RDCs). The authors demonstrate excellent correlations for ubiquitin, eglin c, Fyn SH3, Fnf10, and CI-2, and while all of these are relatively small proteins, this is still a major accomplishment. Nonetheless, it is striking how little we learn from the exercise.

Keep in mind that one of the key goals of a structural biology research program is to get a veridical ensemble, i.e. an ensemble of structures that closely resembles those actually sampled by a protein under equilibrium conditions. We can learn important information from other kinds of ensembles, but the one that contains the information we are really after is the veridical ensemble.

The limitation here is intrinsic to the technique, so the technique bears some explanation. PEM uses an algorithm derived from robotics to move pieces of the protein. Initially, the approach was designed to map the ensemble of structures available to a loop, and I want to stress that with regard to that task I have no complaints. For mapping out the range of likely conformations of loop regions this seems like an excellent approach, and the second figure of the 2006 paper seems to put this usage on fairly solid footing. The overall idea is that positioning the ends of a loop next to their anchor points is similar to getting a robotic arm with some number of degrees of freedom to adopt a particular pose. The authors' algorithm solves this inverse kinematics problem with a coarse-grained view of the backbone. Once a backbone solution is found it is frozen, the side chains are added back, and their conformations are sampled randomly. The conformations thus generated are then subjected to energy refinement using a conventional force field. For a loop with no surroundings, this is all well and good.
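To make the robotic-arm analogy concrete, here is a toy sketch of loop closure as an inverse kinematics problem. It is not the authors' algorithm: I am assuming a planar chain of rigid links and using cyclic coordinate descent, a generic robotics technique, purely for illustration. The names `forward` and `ccd_close`, the link lengths, and the target point are all my own inventions.

```python
import math

def forward(lengths, angles):
    """Return the joint positions of a planar chain.

    angles[i] is the rotation of link i relative to the previous link.
    """
    x = y = heading = 0.0
    points = [(x, y)]
    for length, angle in zip(lengths, angles):
        heading += angle
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points

def ccd_close(lengths, angles, target, tol=1e-6, max_sweeps=500):
    """Cyclic coordinate descent: adjust one joint at a time so that the
    free end of the chain (the "loop end") approaches the target (the
    "anchor point"). Link lengths never change."""
    angles = list(angles)
    for _ in range(max_sweeps):
        for j in reversed(range(len(angles))):
            points = forward(lengths, angles)
            tip, pivot = points[-1], points[j]
            to_tip = math.atan2(tip[1] - pivot[1], tip[0] - pivot[0])
            to_target = math.atan2(target[1] - pivot[1], target[0] - pivot[0])
            # Rotate everything downstream of joint j so the tip lines up
            # with the target as seen from that joint.
            angles[j] += to_target - to_tip
        if math.dist(forward(lengths, angles)[-1], target) < tol:
            break
    return angles

# Hypothetical four-link "loop" whose free end must reach an anchor at (2, 1).
lengths = [1.0, 1.0, 1.0, 1.0]
anchor = (2.0, 1.0)
closed = ccd_close(lengths, [0.3, 0.3, 0.3, 0.3], anchor)
print("residual distance to anchor:", math.dist(forward(lengths, closed)[-1], anchor))
```

Different starting angles generally give different closed chains, which is the sense in which a method like this can enumerate many loop conformations that all satisfy the same end constraints.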

The problem arises when the whole protein is subjected to the technique. This is done by using a rolling window of residues: a fragment is chosen, an ensemble is defined for it while the rest of the protein is held rigid, and then the window moves to the next overlapping fragment. The various structures determined in this phase are all stored; the dynamic properties of a given residue are derived from a weighted average of all snapshots of all fragments that include that residue, with the exception that the first and last few residues of any fragment are excluded because they are subject to artificial restraints.
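The bookkeeping described above can be sketched in a few lines. This is only my reading of the procedure, not the authors' code: the window length, step, number of trimmed end residues, and the form of the snapshot weights are all assumptions, and `rolling_windows` and `pooled_average` are hypothetical names.

```python
def rolling_windows(n_residues, window, step):
    """Return half-open (start, end) residue ranges for overlapping fragments."""
    starts = list(range(0, max(n_residues - window, 0) + 1, step))
    if starts[-1] + window < n_residues:
        # Add a final window flush with the last residue.
        starts.append(n_residues - window)
    return [(s, min(s + window, n_residues)) for s in starts]

def pooled_average(n_residues, fragment_ensembles, trim):
    """Weighted per-residue average over all snapshots of all fragments.

    fragment_ensembles: list of ((start, end), snapshots), where snapshots
    is a list of (weight, {residue_index: value}) pairs. The first and last
    `trim` residues of each fragment are ignored. Residues never covered by
    a usable window position come back as None.
    """
    totals = [0.0] * n_residues
    weights = [0.0] * n_residues
    for (start, end), snapshots in fragment_ensembles:
        for weight, values in snapshots:
            for res in range(start + trim, end - trim):
                totals[res] += weight * values[res]
                weights[res] += weight
    return [t / w if w > 0 else None for t, w in zip(totals, weights)]

# Hypothetical 20-residue protein, 8-residue windows moved 4 residues at a time.
windows = rolling_windows(20, window=8, step=4)
print(windows)
```

Note what the sketch never does: it never combines a snapshot from one window with a snapshot from another. Each residue's average is built from structures in which everything outside the current window sits in its reference position.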

The ensemble of structures derived is therefore not veridical. Because of the fragment-replacement approach, only a single part of the protein is ever actually departing from the equilibrium or minimum-energy structure, and it is unlikely that real motions are distributed this way. Moreover, because the endpoints of all snapshots cannot be simultaneously resolved, it is not possible to assemble whole-protein conformational ensembles from the individual fragment ensembles. So no individual snapshot is likely to reflect a significantly populated member of the ensemble, and there is no way to collate the snapshots such that the energetics of the real ensemble are accurately sampled. We thus end up with an ensemble of structures that does not reflect the set of structures actually sampled by the protein at equilibrium.

As a result, the ensemble that is produced can give us only limited information about the protein. For instance, this might be a reasonably reliable way to predict what sorts of deformations are possible or likely in a binding interaction. Also, PEM probably does a good job of reporting at least the lower limit of the range of the structural ensemble. However, because it does not allow for significant compensating deformations outside of the modeled region, the conformations obtained probably do not cover the entire solution ensemble even for a particular fragment.

A clear implication of this work is the idea that the data are dominated by local fluctuations. That is, dynamics information derived from NMR relaxation experiments, quantitative J-coupling analysis, and RDCs primarily reflects short-range motions that do not involve major excursions from the overall structure. If this were not the case, it is unlikely that an intrinsically short-range method such as PEM could reproduce the data so well. This is not exactly a surprise, however, and the nature of PEM for the most part prevents us from learning how local motions in one region of the protein affect local motions in a distal region.
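To see why a short-range method can do so well on this kind of data, consider how an order parameter is computed from an ensemble. The sketch below uses the standard equilibrium expression S² = (3/2) Σᵢⱼ ⟨uᵢuⱼ⟩² − 1/2 for a unit bond vector u expressed in a common molecular frame. I do not know exactly how the PEM papers evaluate it, so treat the function name `order_parameter`, the optional weights, and the example vectors as my assumptions.

```python
import math

def order_parameter(vectors, weights=None):
    """Generalized order parameter S^2 of one bond vector over an ensemble.

    vectors: one (x, y, z) bond vector per snapshot, all in the same
    molecular frame (i.e. after superposition). weights: optional
    per-snapshot weights; equal weights are used if omitted.
    """
    if weights is None:
        weights = [1.0] * len(vectors)
    total = sum(weights)
    units = []
    for v in vectors:
        norm = math.sqrt(sum(c * c for c in v))
        units.append([c / norm for c in v])
    s = 0.0
    for i in range(3):
        for j in range(3):
            mean = sum(w * u[i] * u[j] for w, u in zip(weights, units)) / total
            s += mean * mean
    return 1.5 * s - 0.5

# A bond vector that never moves versus one that wobbles slightly about z.
rigid = [(0.0, 0.0, 1.0)] * 4
wobbling = [(0.2, 0.0, 1.0), (-0.2, 0.0, 1.0), (0.0, 0.2, 1.0), (0.0, -0.2, 1.0)]
print(order_parameter(rigid), order_parameter(wobbling))
```

The only input is the orientation distribution of a single bond vector. Any ensemble that gets that local distribution right will reproduce S², whether or not its snapshots resemble whole-protein conformations that are actually populated.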

In a larger sense, however, this work reinforces the idea that the ideal approach to constructing a veridical ensemble will involve some combination of coarse-grained and all-atom approaches. The key problem here is not the computational method but the windowing. If the inverse kinematics approach used here can be extended to treat the whole protein, or at least multiple regions of the protein, simultaneously, then I think the situation improves. The question is whether this kind of algorithm will be any more efficient than MD if the whole system is in motion; I suspect at least some part of the computational savings (beyond what comes automatically with the coarse-graining during step one) arises from having a rigid context for the fragment motions. However, this approach is also likely to be more amenable to parallelism than standard MD simulations, and because of the coarse-graining it has the ability to sample structures accessible on a timescale longer than MD can treat.

The authors of these studies imply that their future focus will be on extending this approach to larger structures; I would urge them instead to prioritize developing a way to employ PEM or a similar method without relying on fragment replacement.

Shehu, A., Kavraki, L.E., Clementi, C. (2007). On the Characterization of Protein Native State Ensembles. Biophysical Journal, 92(5), 1503-1511. DOI: 10.1529/biophysj.106.094409

Shehu, A., Clementi, C., Kavraki, L.E. (2006). Modeling protein conformational ensembles: From missing loops to equilibrium fluctuations. Proteins: Structure, Function, and Bioinformatics, 65(1), 164-179. DOI: 10.1002/prot.21060

January 25, 2008