Wednesday, April 29, 2009

The Evolution of Protein Folding: Is a Crisis Brewing for Darwin?

Historically speaking, there is a distinction to bear in mind between puzzles that prove a challenge to a scientific theory and puzzles that turn into a crisis. The Michelson-Morley experiment in the late 19th century proved to be a crisis for classical physics. So did black-body radiation. The former led to Einstein's special theory of relativity, the latter to quantum mechanics. Both involved radically new ways of visualizing space and time that could not be avoided if, in the case of Einstein, symmetry was to be reached between classical mechanics and Maxwell's electrodynamics, and, in the case of Planck, sense was going to be made of all the observational data on radiation. On the other hand, up to the end of the 19th century, Newtonian physics had weathered many puzzles that required only some refinement of the theory.

Darwin's theory has also faced its share of puzzles (and continues to). Before the advent of genetics in the early 20th century, for example, natural selection was looking like something far worse than a puzzle for evolution. Then population genetics grew as a field and the work of specialists such as Simpson, Dobzhansky (to name just a few) established firmer grounds for natural selection.

Still, a crisis is what many skeptics of evolution thirst for, and as often happens what you'd like to see can blind you to what is actually there (or not there). Proponents of Intelligent Design think it's the complexity of the bacterial flagellum that cannot be explained in terms of genetic variation and natural selection.

I was struck by a comment made a while back related to proteins. It all started with Francis Beckwith's post at What's Wrong with the World on the incompatibility between Aquinas and Intelligent Design.

WWWTW blog is self-consciously modeled on Chesterton's classic essay collection of the same name (and in fact I have a first edition American, 1910, soon approaching its 100th birthday and in very good condition). And while it is encouraging that Aquinas and Intelligent Design don't fit, it remains odd to me that the hostility many academics of Catholic, mainline Protestant and Orthodox traditions have for evolution is subtler but not fundamentally different from that of, well, fundamentalists and the more overt intelligent design proponents. Which is to say: an always negative tendency to attack scientists for what they don't know yet. For all the adherence to Aquinas and his arguments from secondary causes, it seems many can't resist falling into the God of the Gaps reasoning implied by the natural theology of Protestant William Paley. (Whatever happened to checking in with Cardinal Newman?)

For example, apropos of a quip by Lydia McGrew dismissing the use of computer models for evolution ("Just amazing what you can do when "seeing" computer programs "evolve" rather than dealing with actual biological entities. If that counts as "scientists have shown" I have several bridges to sell them."), fellow What's Wrong With the World blogger (and, I'm green with envy to say, instrument-rated private pilot) Zippy followed up:
This is an important point. The computer models that computational biologists use bear (or at least bore, a few years back when I was studying this at the graduate level, and still bear every time I do the due diligence) very little resemblance to what is actually going on in physical reality. I've mentioned this before, but here it is again: as far as we know random polypeptide chains of any significant length don't fold into stable native states under physiological conditions at all, let alone fold into nontoxic stable native states, let alone fold into stable native states which perform a useful function which can provide fodder for natural selection, let alone do all that and result in wholly new kinds of proteins, cell types, tissues, organs, or species. And all-atom computer models of hundred-residue chains don't even exist: they are well beyond the compute power available to present day researchers. Computerized protein structure predictions are based on lookup-table statistical analysis of homologes (I know, I had to do some in order to pass a bioinformatics course), not on any kind of at all even remotely workable model of what is actually taking place at the molecular level.

The victory party is still very, very, very premature; but if the neo-Darwinists don't keep holding it, someone might get the idea that they've been doing nothing but blowing smoke for a century or two for reasons that don't have much to do with a dispassionate search for the truth. And we can't have that.

By this reasoning, evolution is apparently worse than an empty suit, prematurely being celebrated by scientists doing nothing. The assertion here seems to be that no actual progress is being made on what amounts to a major problem for evolutionary biology.

Is a crisis in the offing? As we'll see, the answer is no. But it is a challenge, and a fascinating one that, to this layman's eye, looks bound to lead to more fruitful discoveries.

So, let's start with the computer models. Mark Pallen, professor of Microbial Genetics at University of Birmingham, and author of the Rough Guide to Evolution, tells me, "Computer models are obviously simpler than reality and one could not establish from first principles by computer modeling the evolutionary pathways that led to the first proteins, nor model every possible structure in sequence space."

"But," he adds, "this is a bit like saying you can never understand the architecture of a church without an atomic resolution model of all the materials and components that make it up. Or that because we cannot model every atom in the atmosphere, we have no understanding of the weather and cannot make useful weather forecasts. While we may not be able to predict the folded structure of a protein from its sequence, let alone of every 100 amino acid protein in protein sequence space, that does not mean we cannot perform experiments or make observations that inform our understanding of early protein evolution."

According to Nick Matzke, a researcher at the Huelsenbeck Lab, Center for Evolutionary Genomics at U.C. Berkeley, "the processes that we think produce new genes/proteins etc. are not equivalent to random-assembly-all-at-once-from-scratch... We have duplication, modification, selection, rearrangement, etc. "

"Even the very first polypeptides were pretty certainly not assembled all-at-once-from-scratch from a pool of 20+ kinds of amino acids in even proportions, in D- and L-form, as creationists and various beknighted physicists blithely assume. Probably the first time a proto-tRNA grabbed an amino acid and made a short chain, the chain was composed of glycine and few common hydrophobic amino acids and was quite short. Cavalier-Smith (2001) suggests that the original function may have just been a hydrophobic tail for association with a membrane. All of the improbability statistics are irrelevant in this sort of scenario, chirality isn't an issue, etc. "

This is in line with the current research, for example, of Professor Andrei N. Lupas, director of the Department of Protein Evolution at the Max-Planck-Institute for Developmental Biology in Tübingen.

According to Prof. Lupas, "The problem arises from the fact that random polypeptide chains indeed essentially do not fold (I would estimate the proportion to about 1:10^20 for polypeptides in the range between 70 to 120 residues). Clearly abiotic systems cannot produce the starting material for a random exploration of folding space (never mind the problem of passing on the information on anything useful you encountered) and it beggars belief that biotic systems could emerge that produce 99.99999999999999999% trash for an initially barely selectable benefit. "

But this is hardly a reason to toss out the principles of evolutionary biology. According to Prof. Lupas: "The solution obviously is to propose that an initial RNA world used peptides for other purposes, in which folding was not an issue, but that it selected for peptides that could become structured upon encountering an RNA scaffold (there is ample evidence that there is a natural affinity between peptides and nucleic acids and that random peptides have a tendency to bind into the grooves, becoming structured through the exclusion of water). The issue then becomes to explain how a set of (non-folding) peptides could yield (folding) polypeptides under natural selection.

"In my department at the MPI in Tübingen, we explore the hypothesis that folded proteins indeed arose from this preselected pool of peptides, through amplification, fusion and recombination. By being written into one chain, these peptides preselected for the ability to form secondary structures would have found that in many cases they could now exclude water between each other, without the need for an RNA scaffold. Folding would thus be an emergent property resulting from the increased length and complexity of peptides. If this is true, then we think we should be able to reconstruct this vocabulary of peptides in the same way in which ancient languages such as indo-European have been reconstructed through the comparison of modern languages."

Two of Lupas' recent papers are here:
On the evolution of protein folds: are similar motifs in different protein folds the result of convergence, insertion, or relics of an ancient peptide world?
Lupas AN, Ponting CP, Russell RB. J Struct Biol. 2001 May-Jun;134(2-3):191-203.

More than the sum of their parts: on the evolution of proteins from peptides.
Söding J, Lupas AN. Bioessays. 2003 Sep;25(9):837-46.

Professor Lupas also contributed a chapter to Computational Structural Biology, published last September, which is devoted to the evolution of protein folds. Here's a snippet worth quoting at length from the end of the chapter:
Proteins may have originated by the repetition of short peptides, a process that efficiently yields fibrous proteins such as coiled coils and β-helices.39,40 Repetitive sequences appear to have a higher chance of folding and also more favorable structural properties than nonrepetitive sequences.41,42 The problem of passing on the sequence information, however, remains unsolved. Also, domains seen today do not have fibrous elements at their core; there is a discontinuity in fold complexity between fibers and all other folded domains and fibers are structural, not catalytic elements, whereas the primary role of proteins is catalysis.

We favor a scenario for the origin of proteins by fusion and recombination from an ancestral set of peptides, which emerged in the context of RNA-dependent replication and catalysis (the “RNA world”).15 These peptides, originally short chains of abiotic origin, would have been selected as co-factors of ribozymes, broadening their catalytic spectrum and improving their stability and folding efficiency. As the abiotic pool became depleted, ribozyme-based organisms developed an evolutionary incentive to ligate peptides catalytically, and later also to establish a primitive code so as to increase the yield of useful peptides. The need for improved specificity provided the evolutionary pressure for the emergence of peptides capable of assuming secondary structure on an RNA scaffold. The assembly of longer polypeptide chains from these pre-optimized peptides led to folding as an emergent property, when peptides found that they could now exclude water between themselves (“hydrophobic collapse”) in the absence of an RNA scaffold. The dominant role of recurrent supersecondary structures in the architecture of modern folds43 may be the result of this process.

Whatever the mechanism, it appears to have ceased a long time ago, since the basic complement of proteins in living beings has not been enriched by new folds for hundreds of millions of years and has probably been essentially stable since the time of the last common ancestor. Why is that? Did nature find most islands of stability available to the 20 natural alpha-amino acids in one burst around 3.8–3.5 billion years ago? Or is it that, once a set of folded and functional proteins was in place, no new exemplars could emerge across the complexity boundary imposed by the twin constraints of structure and function, without being eliminated immediately by established competitors? The issues resemble the questions surrounding animal bodyplans. These also emerged in a comparatively short time (the “Cambrian explosion”) and only a very limited number became established. Even though new opportunities arose periodically through large-scale extinction events, none led to the emergence of new body-plans; rather, the openings were filled by survivors with the same or similar body plans as the extinct species.
From the other side of the world, Ian Musgrave, professor at the University of Adelaide in Australia, writes, "as others have already said, proteins probably didn't arise from random assembly of 100+ amino acids in one go in the first place. " But they didn't need to. He cites, among others, these two papers:

Keefe AD, Szostak JW. Functional proteins from a random-sequence library. Nature. 2001 Apr 5;410(6829):715-8.

Hayashi Y, Sakata H, Makino Y, Urabe I, Yomo T. Can an arbitrary sequence evolve towards acquiring a biological function? J Mol Evol. 2003 Feb;56(2):162-8. (Musgrave: "The answer is yes.")

Keefe and Szostak are optimistic about their progress:
Our isolation of new functional proteins shows that it should be possible to obtain an unbiased view of the inherent diversity of all possible protein structures, and to determine whether biological proteins represent only a small subset of this diversity. Comparing the sequences of our newly evolved ATP-binding proteins with biological ATP-binding proteins has not revealed any significant similarity; structural data will also be required to reveal whether these proteins, especially the Zn2+ metalloprotein, are similar to those of any biological proteins.

In conclusion, we suggest that functional proteins are sufficiently common in protein sequence space (roughly 1 in 10^11) that they may be discovered by entirely stochastic means, such as presumably operated when proteins were first used by living organisms. However, this frequency is still low enough to emphasize the magnitude of the problem faced by those attempting de novo protein design.

According to Musgrave, "a modest fraction [of random polypeptides] (somewhere between 1 in 10^8 and 1 in 10^12) have some sort of selectable function."
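To get a feel for what frequencies like these imply, here is a rough back-of-the-envelope sketch in Python. It is my own illustration, not a calculation from either paper: it assumes every sequence in a random library is an independent draw with the same chance of being functional, and the function names and the 95% target are hypothetical choices made for the example.

```python
import math


def p_at_least_one(freq: float, library_size: float) -> float:
    """Chance that a random library holds at least one functional sequence.

    Assumes independent draws, each functional with probability `freq`.
    Computes 1 - (1 - freq)**library_size in a numerically stable way.
    """
    return -math.expm1(library_size * math.log1p(-freq))


def library_size_for(freq: float, target: float = 0.95) -> float:
    """Library size needed to reach `target` probability of at least one hit."""
    return math.log1p(-target) / math.log1p(-freq)


if __name__ == "__main__":
    # Frequencies quoted above: 1 in 10^8, 1 in 10^11, 1 in 10^12.
    for exponent in (8, 11, 12):
        freq = 10.0 ** -exponent
        size = library_size_for(freq)
        print(f"1 in 10^{exponent}: about {size:.2e} random sequences "
              f"for a 95% chance of at least one functional one")
```

Under these simplifying assumptions, the library size needed scales roughly with the inverse of the frequency, which is why a frequency of 1 in 10^11 is within reach of a laboratory selection experiment while 1 in 10^20 is not.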

These are just a few scientists with whom I raised the question. There are many more making the evolution of protein folding the center of their attention. Far from being a black box embarrassment to evolutionary biology, the evolution of protein folding turns out to be a challenge worthwhile to quite a few specialists.

So where does that leave the assertion of crisis at the state of protein evolution? To me it seems no different than the discredited irreducible complexity arguments of the ID movement. Because protein folding cannot be fully explained now by the principles of evolutionary biology (i.e., descent with modification by the mechanisms of genetic variation and natural selection), the thinking goes, it must therefore call into question the entire theory.

As I mentioned earlier, I understand why this kind of argument is irresistible to fundamentalist evangelicals. But it still surprises me that academics with a clear tradition of appreciation for Aquinas and secondary causes flirt with it.


Mike Flynn said...

Protein folds are platonic forms, no? Augustine says (De Trin. iii, 8): "Of all the things which are generated in a corporeal and visible fashion, certain seeds lie hidden in the corporeal things of this world." The which Aquinas dealt with in Art 2: Whether there are any seminal virtues in corporeal matter?

The real danger lies in reducing Darwin's metaphysics to a mere tautology. When everything is explained, nothing is explained. The mechanisms proposed, however, are not so much Darwinian natural selection [i.e., the Malthusian struggle for resources by the top organism] as they are chemical and physical reactions. (After which, as Popper put it, "survivors survive.")

Scott Carson said...

Great post, John. You may be interested to know that Van Fraassen's recent book discusses scientific modeling in some detail. There is ample philosophical support for the view you're defending here!

John Farrell said...

Thanks, Scott! Interesting Mike should bring up Augustine. Steve Matheson just linked to this piece by McGrath on the very subject.

Lab Rat said...

I've just been revising protein folding! Although we're not looking at Deep Time questions about how the folds developed and the shapes evolved, just how they get from nascent chains to the fully folded form normally in vivo. Which doesn't involve a huge amount of computer modelling, just lots of crosslinking studies, protein crystallography, and everyone determinedly ignoring quantum physics problems as hard as they can. :)

This was a great post to read though, especially after spending the last few weeks keeping right away from blogs in the mistaken belief that it would help me to revise.

zippy said...

Glad I gave you fodder for some interesting research, John.

For the moment I'll limit my comment to this: always negative tendency to attack scientists for what they don't know yet.

The issue isn't merely what scientists don't know, nor are scientists the 'target' of the criticism. I'm an irrepressible geek myself and love research, scientific and technological progress, new knowledge and new theories, etc. Science is an endless well of fascination and wonder.

The target of the criticism is the popular mythology that scientists do know. Nobody knows at the level of physical explanation - at all - how prokaryote-world gave rise to tigers and octopi, let alone where prokaryote-world came from in the first place. The popular perception, and what is taught in primary school, is that this is comprehensively explained (with but a few details left to work out) by random genetic mutation and natural selection.

Which is, and ever had been, total bullshit.

It isn't the ignorance which is objectionable, and it isn't the day to day work of scientists which is objectionable, etc. It is the cultural lie which has, for a very long time now, treated biological origins as a scientifically settled question.

Of course my view on these things probably doesn't fit any of the usual clusters well. I do think a lot of people have religious/metaphysical or anti-religious/positivist motives in the Darwin debates, almost all of which are wrongheaded. I agree with the Aristotelian/Thomist view that the debate is over efficient causes and is basically irrelevant to religion -- that the theistically-motivated side of it is, in most cases, especially wrongheaded. And I understand supporters of scientific objectivity wanting to put up a moat to keep out the fundamentalists.

But those are all sociological issues primarily, and none of it puts me in a position to look at BS and call it a diamond. We should just admit that we don't really know how prokaryote-world became tigers and octopi, certainly not in the sense we are perceived to know it, though we do know lots of interesting things and there are lots of interesting avenues of research.

Mike Flynn said...

The reason why there is a tendency to think that the scientists "know" these things is that, given any existing configuration, one can always concoct a plausible-sounding Darwinian story to account for it. This narrative of what =could= have happened gives the illusion of having actually explained what =did= happen.

John Farrell said...

We should just admit that we don't really know how prokaryote-world became tigers and octopi, certainly not in the sense we are perceived to know it, though we do know lots of interesting things and there are lots of interesting avenues of research.

Well, yes, we do know lots of interesting things, most of which meet what we expect to find if the Darwinian theory is correct. But that doesn't mean it explains how life originated or finalizes the Darwinian theory for all time.

I really don't know any scientists who claim that we do know everything. Although plenty love to claim their favored mechanism. [Larry Moran, for example, loves to rake over the coals any fellow biologist who thinks natural selection is even slightly more important than variation.] Or who doesn't get the distinction between the mechanisms of evolution and the history of life on earth, which throws in all sorts of other variables that can't possibly be perfectly known.

I appreciate (and agree) that this is more sociological. And we can agree that it's the culture more responsible for telling us what 'we know' when we damn well don't. But my concern is that it's too convenient for groups like YECs and the IDers, obviously unsatisfied and impatient with blaming the culture, to turn their ire on science.

zippy said...

...most of which meet what we expect to find if the Darwinian theory is correct.

That depends on what you mean by "the Darwinian theory". If what you mean is what is taught in high schools - that prokaryote-world became tigers and octopi primarily through the mechanism of random mutation and natural selection - then no, most of what we know is actually contrary to that storyline.

John Farrell said...

It would be a tall order for you or me to say we know what of evolution is taught in high schools, Zippy, public or private, given the way that textbook publishing works, with highly customized books and materials tailored precisely to win adoptions at the state and local level.

That aside, what specifically contradicts the assertion that cephalopods and felines descended from prokaryotes through variation and natural selection?

zippy said...

I know I don't need to tell you this John, but "variation" is non-trivially different from "random mutation". Variation could have any cause at all; it is virtually synonymous with change. If we speak non-specifically enough we can beg almost any question.

John Farrell said...

I know I don't need to tell you this John, but "variation" is non-trivially different from "random mutation"...
Ah. For some biologists, sure. Random mutation being restricted purely to those changes that occur within the life of a single member of a species.

Is that how you're saying the high school textbooks define it? For example, the book that Ken Miller co-authored, which is pretty widely used?

Mike Flynn said...


In statistics, random variation is due to many small causes, none of which is dominant. Sometimes called "common causes" because they are commonly present or common to all units. It cannot be assigned to any one particular cause. Random variation has no net secular trend, but will "regress toward the mean." (E.g., children of tall parents tend to be shorter than the parents.) Collectively, the data will form a statistical distribution: the Bell Curve (if the causes are additive), a Lognormal curve (if they are multiplicative), etc.

Assignable variation is that which can be "assigned to" a particular individual cause, called a "special cause." Special causes result in secular trends in the data: a spike beyond the normal range, a shift in mean value, a trend, a cycle, etc.

The important thing is that "randomness" is not a cause. It is a description of the combined effects of many causes which cannot in practice be separated.
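A minimal simulation sketch of the additive versus multiplicative point, using only the Python standard library. The number of causes, the size of each small effect, and all function names are arbitrary assumptions chosen for illustration, not anything taken from a real data set.

```python
import math
import random
import statistics


def skewness(values):
    """Population skewness: roughly 0 for a symmetric distribution."""
    mean = statistics.fmean(values)
    spread = statistics.pstdev(values)
    return sum(((v - mean) / spread) ** 3 for v in values) / len(values)


def simulate(n_units=5000, n_causes=50, seed=1):
    """Give each unit many small independent effects, none of them dominant.

    Returns two lists: the effects combined by addition, and the same
    effects combined by multiplication.
    """
    rng = random.Random(seed)
    additive, multiplicative = [], []
    for _ in range(n_units):
        effects = [rng.uniform(-0.1, 0.1) for _ in range(n_causes)]
        additive.append(sum(effects))
        multiplicative.append(math.prod(1 + e for e in effects))
    return additive, multiplicative


if __name__ == "__main__":
    additive, multiplicative = simulate()
    print("skewness, additive causes:        ", round(skewness(additive), 2))
    print("skewness, multiplicative causes:  ", round(skewness(multiplicative), 2))
    print("skewness, log of multiplicative:  ",
          round(skewness([math.log(m) for m in multiplicative]), 2))
```

In this sketch the additive totals come out roughly symmetric, the multiplicative totals are skewed to the right, and taking logarithms of the multiplicative totals restores the symmetry, which is the signature of a lognormal shape. No single simulated cause accounts for the spread in either case.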

John Farrell said...

Interesting paper, here, Mike, that I thought you would enjoy, by John Hall, who has a PhD in mathematical statistics. Steve Matheson had a link to this on one of his reading groups.

Here's the last paragraph:
"The error that many, including philosophers like Stamos and Rosenberg, make is in drawing their conclusions from the nature of these processes, their stochasticity, and hence their unpredictability. These conclusions reflect only local purposes. An accurate understanding can only be gained by studying the entire set of possible outcomes and the system of which the process is a part. Purposes which will be achieved by the system no matter which outcome occurs, are readily attainable. This does occur in systems of our own construction and can even be seen in mundane activities like sports. Such counter-examples refute the claims of those who are blind to the purposes of chance."