One of the reasons for this is that such corpuses contain many more examples of morphological and syntactical forms than they do of any individual words: to date they are all far smaller than the OED. The great advantage of corpuses is that they are carefully chosen according to stated principles to reflect particular categories of texts produced over the relevant period. Conclusions based on them are transparently grounded in evidence that can be fully described. The OED - a vast treasure-house of linguistic data - is based on evidence that varied according to what was available to the lexicographers and what they and their readers chose to record (see e.g. our pages on Outline material and elsewhere). Despite its tempting super-abundance it is dangerous to draw large-scale (or indeed small-scale) conclusions from this evidence, which may be partial or eccentric in ways difficult to second-guess. (For some comparisons between corpuses and the OED see Hoffmann 2004 and cf. Brewer 2006.)
The OED3 lexicographers likewise show themselves fully aware of the problem of depending too much on OED for information on the lexical development of the language. In 2002 Philip Durkin published an analysis of a small sample of the revised alphabet range in OED3 (M-mamzer) 'to illustrate how revision work on all areas of the text...is transforming the [OED's] record of the vocabulary of English' (Durkin 2002: 66). Noting (p. 67) that non-literary texts had been especially fruitful sources for revising OED1's record - not surprising given the early lexicographers' special attention, for a variety of reasons, to literary sources (see our page on Literature and the nation under OED1 intellectual climate) - he began and ended his article by referring to OED's role in charting the development of the English language, and the reliance by scholars on the OED for this purpose:
Attempts to characterize the development of the vocabulary of English in various historical periods have, understandably, often taken as their basis the documentation provided by the Oxford English Dictionary. (p. 65)
A major aim of OED3 is to make the dictionary's methodology more transparent at all levels...it is to be hoped that this, together with the revised data, will provide a powerful tool for future studies of the development of English lexis. (p. 76)
His conclusions contain two warnings:
- 'the overall rate of change [sc. between OED2 and OED3, as the latter revises the former] is sufficient to demonstrate that considerable caution should be exercised when using OED2 dates for sixteenth-century items for statistical purposes', since 'approximately a third of OED2 words and senses are being antedated during the course of work on OED3'
- 'any dictionary dates should be treated with a certain amount of caution' (p. 70; cf. pp. 75-6).
Durkin's lists of date-changes for words treated in OED2 and OED3 make fascinating reading but present a complex picture. While many words identified in OED2 as first used in the sixteenth century are being antedated to the fifteenth century, at the same time many words identified in OED2 as first used in the seventeenth century (or later) are being shifted back to the sixteenth century. It is too early to draw firm conclusions from this evidence: we shall have to see how the pattern of documentation develops as the third edition progresses. (See previous page for analysis of OED3's quotation distribution in 1500-1599 between September 2003 and December 2005 - i.e. subsequent to Durkin's article.)
Durkin also lists (pp. 67-8) the sources which the OED3 revisers have found most fruitful for antedating OED2's records of Early Modern English vocabulary: