Did you know that “the” is the most common word in the English language, accounting for about 7% of the words in a typical corpus (as a large collected body of text is known)?
Did you also know that the next most common word, “of”, occurs in about 3.5% of cases, while “and”, the third-placed, turns up about 2.8% of the time?
An American linguist by the unlikely name of George Kingsley Zipf (1902-1950), noticing that this pattern continued deep into the corpus, came up with the delightfully named Zipf’s law. It states that in a given body of natural language utterances, the frequency of any word is inversely proportional to its rank in the frequency table.
In other words, the most common word will be about twice as common as the second-placed word, three times as common as the third-placed, and so on.
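If you’d like to see the arithmetic, here is a minimal Python sketch of the idealised law. The function name zipf_share and the 7% starting figure (borrowed from the numbers quoted above) are assumptions made purely for illustration.

```python
def zipf_share(rank, top_share=0.07):
    """Share of the corpus predicted for the word at `rank` under an
    idealised Zipf's law: the top word's share divided by the rank.
    The 7% default is an assumption taken from the figure quoted for "the".
    """
    return top_share / rank


# Compare the three most common English words with the idealised prediction.
for rank, word in enumerate(["the", "of", "and"], start=1):
    print(f"{rank}. {word}: {zipf_share(rank):.1%}")
```

The idealised figure for third place comes out nearer 2.3% than the observed 2.8%, which is why the law is best read with an “about” in front of every number.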
Zipf’s law doesn’t just hold for English. It appears to hold for every language it has been tested on, including Esperanto. If you come up with a plausible, universally accepted explanation for this, you’ll likely earn yourself an exalted position in the ranks of linguists. No one has done it yet.
Zipf gave it a shot. Based on the principle of least effort, which was discovered in 1894 by Italian philosopher Guglielmo Ferrero, Zipf reckoned that humans, wanting the greatest outcome for the least effort, repeat useful behaviours more often than those that are less useful. It follows almost by necessity, then, that we’ll find ways to have the minimum number of words suffice to convey the maximum amount of information for our day-to-day needs.
The principle of least effort, which my two sons perfected as teenagers*, is hugely important for anyone designing a library. It says - and evidence bears this out - that someone looking for information will tend to use the most convenient, least demanding search method. As a result, they’ll stop looking not when they’ve found all the information that might be useful, but as soon as they’ve found what they consider a minimally acceptable amount.
It’s also why you, when seeking an answer to a specialised question, will settle for what the office pub quiz champion tells you rather than go to the trouble of meeting a real expert who happens to work in another building.
This insight led to Mooers’ law, which says that an information retrieval system will tend not to be used whenever it is more painful for a user to have information than to not have it.
So any rose-tinted-glasses ideas you have about hard-working researchers crawling over broken glass to find that final precious nugget of information that’ll take their paper from merely OK to pants-shredding wondrous - well, you can forget them.
Which is a little sad, isn’t it? One of the things I bang on about in my writing training courses is that it’s counter-productive to start writing until you’ve done all the research you know to do. Like, all of it. I suspect that, sound as that advice is, it’s also based more on optimism about human behaviour than it is on harsh reality.
In fact, don’t tell anyone I said this, but I’m certain I disregard my own advice frequently. (And given the principle of least effort, I’m confident you won’t tell anyone.)
If you were to illustrate Zipf’s law with a graph, you’d see what is known as a power-law curve. Word frequency drops alarmingly sharply to start with before settling into a long, almost flat tail that hugs the horizontal axis (once you get past the most common words, the language settles into an extremely long list of words, each of which occurs only rarely). These kinds of relationships show up all the time: the volume of a sound as it moves away from its source, the size of forest patches around the globe, the diameter of dust devils, and the size of craters on the moon.
A power-law curve.
Now you may be thinking that Zipf’s explanation is both elegant and compelling. And while you may well be correct about its elegance, many linguists will say you are certainly wrong about its being compelling.
Some time after Zipf floated his explanation, another one was put forward. If you were to create a word generating machine that operated a bit like monkeys at typewriters, a couple of things would happen as a matter of mathematical certainty.
One is that it would produce many more short words than long ones.
The second is that as words got longer, the number of possible arrangements of letters within each would increase exponentially. As a result, there would be many more long words available to you than short ones, even though each of those long words would occur less frequently than the shorter ones.
That would give you the same result, near as dammit, as Zipf observed.
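The typing-monkey argument is easy to try at home. What follows is a rough sketch rather than the original analysis: it assumes a hypothetical four-letter alphabet plus a space bar, with every key equally likely, and the function name monkey_words is made up for the occasion.

```python
import random
from collections import Counter


def monkey_words(n_chars=200_000, letters="abcd", seed=0):
    """Simulate a monkey hitting keys at random and return the 'words' typed.

    Assumptions (illustrative only): a tiny alphabet of four letters plus a
    space, every key equally likely, and a fixed seed so runs are repeatable.
    """
    rng = random.Random(seed)
    keys = letters + " "
    text = "".join(rng.choice(keys) for _ in range(n_chars))
    # split() breaks on runs of spaces, so empty "words" are discarded.
    return text.split()


# Count how often each distinct word was typed, most frequent first.
counts = Counter(monkey_words())
ranked = counts.most_common()

# Sample a few ranks either side of the points where word length changes.
for rank in (1, 4, 5, 20, 21, 84):
    word, freq = ranked[rank - 1]
    print(f"rank {rank:>3}: {word!r} typed {freq} times")
```

With equally likely keys the counts fall in steps rather than along a smooth curve: the four one-letter words lead, then the sixteen two-letter words, and so on, which traces roughly the same overall shape as the curve Zipf observed.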
So while Zipf was spot on in his observation, he may have been way off in his explanation.
As for me, I’m not taking sides on this issue. Instead, I’m invoking the principle of maximum fence-sittingness, a theory I developed just now and which I propose to develop further just as soon as the principle of least effort wears off.
Bits and specious
* My sons, you’ll be pleased to know, have since grown to be wonderful men who I not only love madly but am extraordinarily proud of. I hope they will keep this in mind when I reach my dotage and need caring for.
Not everybody, it seems, was entirely enamoured of Zipf’s approach to linguistics. In 1965, psychologist George Miller described him as “the kind of man who would take roses apart to count their petals”.
Every so often you encounter a new writer who just knocks you on your arse with the elegance and beauty of their writing. One such for me is New Yorker Lily Seibert, whose article on the long-term effects of Covid on her physical and mental wellbeing is both courageous and masterful. You can read it here.
If you see only one film this year, make it the wonderfully silly, happy and joyous Red, White and Brass. It tells the unlikely but true story of a Tongan church group who came up with the preposterous scheme of securing seats at the Tonga-France match at the 2011 Rugby World Cup by putting together a brass band as the pre-match entertainment - even though most of them had never played a musical instrument in their lives. It’s now officially among my top ten favourite movies of all time and I can’t recommend it highly enough.
Quote of the week
I like the word “indolence”. It makes my laziness seem classy.
Bernard Williams