October 10, 2007

I haved a cake, but I eated it

A pair of articles in this week's Nature weaved an interesting picture of the evolution of language. Biologists will find the key discovery of both papers relatively unsurprising: the rate of a word's evolution is inversely related to the frequency of its use. This is precisely analogous to the interpretive principle of evolutionary studies, in that the best-preserved amino acid residues across time and speciation are assumed to be essential to a protein's function. It appears that a similar principle is at work in the evolution of language: the most used (essential) parts of a language also seem to be highly resistant to change.

The more general and esoteric paper is by Pagel, Atkinson, and Meade, studying the divergence of modern languages from the Indo-European root language. This is roughly equivalent to a study of divergent evolution or speciation. They used a Bayesian Markov Chain Monte Carlo algorithm (words that will likely strike fear into the hearts of all current Kern Lab members) to create a model of word evolution that closely matched observations. Between different Indo-European languages and within the languages of Spanish, English, Russian, and Greek they found that the rate at which the word for a given meaning/use gets replaced was dominated by how frequently that word is used, with a slight secondary effect from what part of speech it was. Conjunctions and prepositions seem to be most susceptible to change, with numbers least likely to evolve over time.

A more entertainingly written study by Erez Lieberman et al. examines the regularization of verbs in English over time, from the Old English of Beowulf to the modern era. This is more like studying convergent evolution. Grabbing 177 irregular verbs from the old language, they related the percentage that had regularized to their frequency of use, and discovered that the rate of regularization scaled inversely with the square root of the frequency of usage, as seen in their figure here. Again, frequency and evolution rate are inversely related.
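To get a feel for what an inverse square-root relationship means in practice, here is a minimal sketch. It takes the scaling at face value (regularization rate proportional to one over the square root of usage frequency) and only computes relative rates. The function name and the example ratios are made up for illustration and are not taken from the paper.

```python
import math


def relative_regularization_rate(frequency_ratio: float) -> float:
    """How many times faster a verb regularizes than a reference verb.

    frequency_ratio is (verb frequency) / (reference verb frequency).
    Assumes rate is proportional to 1 / sqrt(frequency); this is an
    illustrative reading of the scaling, not the authors' fitted model.
    """
    if frequency_ratio <= 0:
        raise ValueError("frequency_ratio must be positive")
    return 1.0 / math.sqrt(frequency_ratio)


if __name__ == "__main__":
    # Hypothetical ratios: equally common, 4x rarer, 100x rarer.
    for ratio in (1.0, 0.25, 0.01):
        speedup = relative_regularization_rate(ratio)
        print(f"used {ratio:g}x as often -> regularizes {speedup:g}x as fast")
```

Under that assumption, a verb used a hundred times less often than another regularizes about ten times as fast, which is why the rarely used irregulars are the first to go.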


What gives rise to this apparently universal effect? The authors of the studies suggest a few possibilities, the first being that familiarity brings stability. We're less likely to screw up words we use constantly, and probably prone to making up words to replace the ones we forget. Usage frequency will dictate how often a "new" word is needed, and whether the new word will be tolerated. A similar principle may also be at work in verb regularization. Remembering the irregular form of a verb requires a certain amount of effort for each verb, while a generalized rule requires only the effort to remember the single rule for all words. Unless a word is frequently encountered, the neural potential containing its irregular forms will not be reinforced, and the word will slide into regularity.

Moreover, because it is difficult to remember the proper irregular form, the (incorrect) regular form will be easily recognized (and therefore tolerated) in speech. So while my little title, composed of top-tier words, probably struck you as instantly wrong, it's likely you passed over the technically incorrect "weaved" of the first sentence (should have been "wove") without realizing it. In fact, regularized forms (such as "weaved") of many of the lower-frequency irregular verbs are already in wide use.

It will be interesting to see what effect the use of the internet has on this. The web is known for creating words with ease — "leet" and "pwned" are some common examples — and yet these words seem to mostly be transients, disappearing from usage almost as soon as they are adopted. And while the sheer volume of text and rapidity of typing on the internet would seem to serve the error-prone processes that produce new words, the easy availability of references and the ubiquity of pricks willing to point them out would seem to work against the evolution of new language. Now that we are past the alarmist stage where the internet is accused of destroying English, it will be interesting to see where it leads the evolution of the language.

1 comment:

Matt said...

I can has comment?