The Professor and the Madman

Post by pokoma » Fri Jan 03, 2014 9:34 pm

I just finished reading The Professor and the Madman by Simon Winchester about the birth of the OED. Interesting that it began with the ground-breaking concept "How does one look something up?" Also interesting is the delicious anachronism "How much faster (instead of a generation of researchers reading and making notes by hand) could the project have been accomplished with today's computers?"

Re: The Professor and the Madman

Post by Erik_Kowal » Sat Jan 04, 2014 12:17 am

The opportunities offered today by computers to the OED's compilers, compared with what was available in the mid-19th century, are not merely a function of their ability to store, analyse and organize large quantities of data, or the increase in the rate at which they can handle it compared with paper-based methods. It is also a question of how much more comprehensive the OED's mission can be.

The connectivity of the OED's computers with the internet makes it possible to collect data from an indefinitely large number of contributors: examples of the possible mechanisms are online crowdsourcing and the linguistic analysis of scanned-in books, journals and newspapers (a phenomenon which is itself only possible because of IT) made available through initiatives like Google Books, Project Gutenberg etc.

Besides the capability that computers offer for pulling in data from a much wider range of written sources, voice-to-text transcription technology also makes it possible (in principle, at least) to mine spoken English for novel locutions and occurrences of a given expression. These sources might range from relatively formal speech (e.g. scripted radio programmes) through semi-formal speech (e.g. interviews and debates) to casual and informal speech (e.g. speech recorded in a bar or a supermarket, or at a sports game -- or, thanks to the NSA's comprehensive surveillance programmes, phone calls [just kidding, almost]).

As these technologies also become more available or commonplace in the less developed countries of the world, and most countries find themselves connected to each other by the internet, the scope for extending the data-gathering activities to encompass the numerous regional varieties of English (such as Australian English -- hi there, WoZ) also expands.

All this would, or will, greatly expand the size of the available corpus from which relevant data can be extracted.

Re: The Professor and the Madman

Post by Wizard of Oz » Sat Jan 04, 2014 2:15 pm

.. Erik you raise many interesting points but I find it interesting that you talk about the numerous regional varieties of English .. the reason I find this interesting is my great problem with the might is right approach to dictionary entries and definitions .. I am beginning to increasingly believe that there can no longer simply be an OED .. time and again I see the lists of "new " words to be entered in the OED and marvel at how they increasingly are coming from the US regional variety of english .. is that the way english should be defined ?? .. the BIG market approach ?? .. I believe that it is timely for the OED to be clearly defined in terms of where the corpus is derived from .. just as I have a copy of The Australian Oxford English Dictionary there should be for instance, The US OED and The British OED .. I believe this now should be the norm as increasingly there is NOT a case for one regional variety of english to be considered the "correct" version .. it is going down the same path as Received Pronunciation where a false standard was applied by a small powerful group ..

.. yes I do realise that there is the small entry before some definitions of Mainly US or Mainly British but that automatically excludes Australians because at times we have the same ideas as the US, at other times the UK and very often we have our own ideas ..

.. quite naturally the Americans see no problem with the current methodologies as they favour their culture over others .. oh well ...........

WoZ proudly Aus

Re: The Professor and the Madman

Post by Erik_Kowal » Fri Jan 10, 2014 1:18 pm

According to Wikipedia's article on Australia, that country's estimated population is currently (2014) around 23.3 million, and for 81% of its population English was the only language spoken at home in 2011. According to Wikipedia's article on the English language, the world's total number of native speakers of English is about 360 million, and the number of speakers of English as a second language is about 380 million.

Wikipedia has a table summarizing the statistics for speakers of English located in the world's main English-speaking countries.

From all of this, it is clear that while Australia's share of the world's native speakers of English is small in percentage terms (at about 4%), in absolute terms it is still quite sizeable.

I partly agree with your implied criticism, WoZ, that the current line-up of publications offered by the Oxford University Press (OUP), which produces the OED, has quite a way to go in reflecting the regional dialects and variants of English spoken in those countries that contribute relatively small proportions of native speakers of the language. However, such a criticism is less valid in relation to Australian English, considering the scholarship that has gone into its Australian Oxford Dictionary (see the description below). The OUP's other dictionaries for regional varieties of English include the Canadian Oxford Dictionary, the New Oxford American Dictionary (plus a number of other American English-related publications), the South African Pocket Oxford Dictionary, and a set of dictionaries of older Scots English. (The full listing of the OUP's English dictionaries can be seen here.)

When the OED existed only as a print dictionary, space limitations gave its publisher an obvious (if rather convenient) excuse for overlooking or giving less attention to certain regional varieties of English, based on what it was physically practical to publish. This limitation does not exist for the OED's electronic version.

That being the case, today it might be reasonable to ask to what extent it makes sense to publish separate versions of the dictionary for the regional varieties of English versus having a single version of the dictionary that encompasses all the varieties of English.

In practice, both are possible.

The enormous flexibility that the electronic format offers in terms of tagging and sorting the individual entries presumably makes a single electronic version of the OED easier and cheaper to keep up-to-date and self-consistent than maintaining separate electronic versions for the English spoken in different regions of the world (quite apart from the online OED's ease of access and general user convenience).

Besides a global electronic dictionary of the English language, print-format dictionaries covering the English spoken in particular regions (as already exist, for instance, in the form of The Australian Oxford Dictionary / AOD) may also make commercial sense for the OUP in several of those markets; the flexibility of the electronic-format OED should also make it easy for dictionary compilers to extract (and add) region-specific terms.

Wikipedia's article on the AOD describes it as follows:
The AOD combines elements of the previous Oxford publication, The Australian National Dictionary (sometimes abbreviated as AND), which was a comprehensive, historically-based record of 10,000 words and phrases representing Australia's contribution to English. However, The Australian National Dictionary was not a full dictionary, and could not be used as one in the normal sense. The AOD borrowed its scholarship both from the AND and from The Oxford English Dictionary, and competed with the Macquarie Dictionary when it was released in 1999.

Like the Macquarie, the AOD combines elements of a normal dictionary with those of an encyclopedic volume. It is a joint effort of Oxford University and the Australian National University.

The AOD's current editor is Dr Bruce Moore. Its content is largely sourced from the databases of Australian English at the Australian National Dictionary Centre and The Oxford English Dictionary. It also draws on the latest research into International English.

The second edition contains more than 110,000 headwords and more than 10,000 encyclopedic entries.

Unfortunately, it is not evident from this description how much of the additional work done for the publication of the AOD has been fed back into the electronic version of the OED, which the OUP explicitly describes as "the definitive record of the English language".

No doubt the publication strategy of the Oxford University Press for its region-specific dictionaries will ultimately be based on a combination of the cost and logistics of region-specific data collection/updating; the available scholarship resources and opportunities for collaboration with local universities or other relevant educational institutions; market demand and potential; and the fear of losing overall market share to competing dictionary publishers.

It would certainly be interesting to know the OUP's long-term plans in this area, but I would assume that for commercial reasons they will mostly be keeping those to themselves.

Finally, in your posting above you commented, "time and again I see the lists of "new " words to be entered in the OED and marvel at how they increasingly are coming from the US regional variety of english .. is that the way english should be defined ?? .. the BIG market approach ??"

The compilers of the OED have set out their criteria for the incorporation of new terms into the dictionary here and here.

I will add only two points: firstly, those lists of new words you have seen are likely to have been cherry-picked by newspapers, TV and other media for their entertainment value, and may therefore present a skewed picture of the totality of new words being included; and secondly, it seems logical that the USA, with about two thirds of the world's native speakers of English and a prolific entertainment industry that actively disseminates its output worldwide, is highly likely to contribute substantially more terms satisfying those criteria than Australia, which has a much smaller number of native speakers and an entertainment industry with less global penetration than that of the USA.

However, it would be interesting to have a quantitative analysis of the respective contributions to the OED's new entries from the various English-speaking regions of the world, in order to assess empirically the correspondence between the proportion of each region's contributions and its proportion of the world's native speakers of English.
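
For what it's worth, the comparison itself would be simple once the counts were available. Here is a rough Python sketch of one way it might be done, dividing each region's share of new entries by its share of native speakers. The function name, the region labels and all of the numbers are invented placeholders for illustration only; they are not OED or census figures, and I am assuming that every new entry could be attributed to exactly one region.

```python
def representation_ratios(new_entries, native_speakers):
    """For each region, return (share of new entries) / (share of native speakers).

    A ratio above 1 means the region contributes more new entries than its
    share of native speakers alone would predict; below 1 means fewer.
    """
    total_entries = sum(new_entries.values())
    total_speakers = sum(native_speakers.values())
    ratios = {}
    for region, entries in new_entries.items():
        entry_share = entries / total_entries
        speaker_share = native_speakers[region] / total_speakers
        ratios[region] = entry_share / speaker_share
    return ratios


# Placeholder inputs: invented numbers, NOT real data.
new_entries = {"Region A": 60, "Region B": 30, "Region C": 10}        # new entries per region
native_speakers = {"Region A": 200, "Region B": 150, "Region C": 50}  # millions of speakers

ratios = representation_ratios(new_entries, native_speakers)
for region, ratio in sorted(ratios.items()):
    print(f"{region}: {ratio:.2f}")
```

The hard part would be the inputs rather than the arithmetic: someone would have to decide which region each new entry belongs to, and which speaker counts to trust.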
