While the corpus of Old Bailey transcripts is obviously an invaluable resource for criminologists and historians of the English criminal justice system, it occurs to me that because it records spoken rather than written language -- and does so for a period that spans nearly 250 years -- it must stand unique as an evolving record of the differences between written and contemporaneous spoken English.
From 1674 through 1913, court reporters wrote detailed accounts of virtually every trial held at the Central Criminal Court, known as the Old Bailey, where all major criminal cases for Greater London were heard.
The corpus includes 121 million words describing 197,000 trials over 239 years. According to researchers, it represents the largest existing body of transcribed trial evidence for historical crime; it is, they say, the most detailed recording of real speech in printed form anywhere in the world.
Scientists have now carried out a computational analysis of those words showing how the British justice system created new practices for controlling violence. The study, “The Civilizing Process in London’s Old Bailey,” in The Proceedings of the National Academy of Sciences, is a collaboration between two computer scientists, Simon DeDeo of Indiana University and Sara Klingenstein of the Santa Fe Institute, and a historian, Tim Hitchcock of the University of Sussex in England.
This study demonstrates “an important new way to do historical research,” said Brett Bobley, director of digital humanities at the National Endowment for the Humanities. Historians may study collections of individual items — books, old letters or newspapers — but they can’t read an entire library; computers, he said, can do just that.
The Old Bailey archive was digitized a decade ago into a free and searchable database (http://www.oldbaileyonline.org) in which every defendant’s name is tagged by gender, crime, location, the victim’s name and address, verdict, and any punishment.
To find patterns, the scientists looked at when and how often certain words occurred.
“Say you walk into a trial in 1750 and pick out one word,” Dr. DeDeo said. “How much can you learn about what the trial is about? If you hear the word ‘kick,’ you might associate it with violence, but you could not be certain.
“But by 1850, if you hear the word ‘kick,’ you would know a lot about what the bureaucracy was going to do,” he continued. “With the passage of time, each word carries more information based on accumulating trial data. And this is what we can quantify.”
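One way to make "how much a word tells you" concrete is to measure, in bits, how far hearing the word shifts your belief about the kind of trial you are in. The article does not describe the study's actual calculation, so the sketch below is only an illustration: it uses the Kullback-Leibler divergence between the trial-type distribution given the word and the overall trial-type distribution, and every count in it is invented for the example rather than taken from the Old Bailey data.

```python
import math


def bits_gained(word_counts, all_counts):
    """Bits of information a word gives about trial type.

    Computed as the KL divergence between P(type | word) and P(type).
    word_counts: trials of each type in which the word occurs.
    all_counts:  all trials of each type (word_counts must be a subset).
    """
    total_word = sum(word_counts.values())
    total_all = sum(all_counts.values())
    bits = 0.0
    for trial_type, n_all in all_counts.items():
        prior = n_all / total_all
        posterior = word_counts.get(trial_type, 0) / total_word
        if posterior > 0:  # a zero-probability term contributes nothing
            bits += posterior * math.log2(posterior / prior)
    return bits


# Invented toy counts (NOT real Old Bailey figures).
all_trials = {"violent": 500, "nonviolent": 500}
kick_1750 = {"violent": 60, "nonviolent": 40}  # "kick" is weakly tied to violence
kick_1850 = {"violent": 95, "nonviolent": 5}   # "kick" is strongly tied to violence

print(f"1750: {bits_gained(kick_1750, all_trials):.3f} bits")
print(f"1850: {bits_gained(kick_1850, all_trials):.3f} bits")
```

With these made-up numbers the word is worth far more bits in the later period, which is the kind of shift Dr. DeDeo describes.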
To simplify their task, the researchers turned to the 1911 edition of Roget’s Thesaurus, which sorts 26,000 distinct English words into 1,040 numbered categories called synonym sets. For example, words involving love and affection are in the high 800s, money and wealth in the low 800s. “Kick,” as in striking a blow, is No. 276, while killing is No. 361.
“The beauty of this,” Dr. DeDeo said, “is that for every word we have a number that equates with a meaning” that can be modeled mathematically.
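Replacing each word with its category number can be sketched in a few lines. This is a minimal illustration, not the researchers' code. Only the two category numbers (276 and 361) come from the article; the mapping of inflected forms such as "kicked" and "killed" to those categories, and the sample sentence, are assumptions made for the example.

```python
import re
from collections import Counter

# Hypothetical fragment of a word -> synonym-set lookup table.
# 276 (striking a blow) and 361 (killing) are the numbers quoted above;
# which inflected forms belong to each set is assumed here.
CATEGORY_OF = {
    "kick": 276,
    "kicked": 276,
    "kill": 361,
    "killed": 361,
    "killing": 361,
}


def category_counts(text):
    """Count how often each known synonym-set number occurs in a text."""
    words = re.findall(r"[a-z]+", text.lower())
    return Counter(CATEGORY_OF[w] for w in words if w in CATEGORY_OF)


# Made-up testimony, not a quotation from the transcripts.
sample = "He kicked me twice, and I thought he meant to kill me."
counts = category_counts(sample)
print(counts)
```

Once every trial is reduced to a table of category counts like this, trials and decades can be compared numerically instead of word by word.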
One key finding is the gradual criminalization of violence.
In the early 1700s, violence was considered routine. A trial about theft, Dr. DeDeo said, might include testimony “in which people gouge out each other’s eyes, are covered in blood and get killed.” But by the 1820s, the justice system was focused more on containing violence — a development reflected not just in language but also in the professionalization of the justice system. “The changes occurred under the radar,” said Dr. Hitchcock, the British historian.
One such change revealed by the study is that trial records became medicalized. In the mid-19th century, doctors start showing up in large numbers to give evidence and evaluate causes of death.
Over time, the transcripts have more superlatives and intensifiers — words like “very,” “so much,” “most” — in reference to acts of violence. Exaggeration is normal in a courtroom, but violence brings out more hyperbole; if someone steals your wallet, you are upset, but if someone beats you up, you are likely to use stronger language.
The Old Bailey transcripts ended in 1913 as publication costs grew prohibitive and newspapers took over the role of covering trials.
The NYT article does not address the potential use of the corpus for investigating the differences between written and spoken English. But such a comprehensive trove must hold decades' worth of illuminating linguistic data mining and analysis -- moreover, it presumably encompasses the full range of human behaviour, occupation, industry and socioeconomic status, which could allow a very subtle picture of the history of the spoken language to be painted.