This post is inspired by three things: a recent IHR paper given by Rosanna Sornicola, a paper given at the International Medieval Congress in 2011 by Peter Stokes of KCL and some of the comments on a previous post of mine about charters. It aims to ask a deceptively simple question: what do we mean by the full text of a charter?
To start with Rosanna's paper, it was entitled "What the legal documents of the early middle ages can tell us about language: the case of 9th- and 10th-century charters from Southern Italy", and was pretty much as it said in the title. She's a professor of linguistics interested in the development of the Romance vernaculars out of Latin. It's a question that's been debated for more than a century, but the answers that are being suggested now are far more complicated than a simple change between two languages. Most of the models now are of the coexistence of Latin and the vernaculars, diglossia, with the locus of change not the language per se but the social groups who used a particular register of language. There was no unitary route between Latin and the vernaculars, but many different routes.
Rosanna was exploring one particular context for such change: southern Italy in the ninth and tenth centuries. Less attention has been paid to linguistic change there than in France or Spain, but it presents an interesting contrast. Unlike in other areas, there wasn't the kind of cultural break seen with the Merovingians in France or the Lombards in northern Italy, when an essentially illiterate ruling class arrived. Naples and Amalfi, in particular, had a rich and relatively autonomous cultural life. They were also much less influenced by Carolingian cultural reforms, which have sometimes been claimed to be key to developments elsewhere.
Instead, Rosanna was arguing for the persistence of late antique forms of Latin in the south, but this is a late antique Latin that is already substantially changed from ideas of "classical Latin". The proliferation of the accusative in prepositional phrases, for example, such as "una cum alias terras meas", is already visible in Pompeii graffiti and Ravenna papyri, as are plurals such as "campora" (fields).
Rosanna went on to discuss various other syntactic forms visible in the charter corpus: I think many of the examples may have been more striking to those whose Latin is better than mine to start with. But there was one particular quotation in her handout I want to give. It's from a charter from Gaeta in 918 (CodCajet 1, XXIV, 43), where someone states:
"mea boluntatem bendidisse et bendidit bobis" (By my own free uill I have zold and zell this to gou).
My translation isn't accurate, of course, but that's the whole point. How do you translate something that's lurking uneasily between Latin and something else like that? And what on earth can you do with free text spelt like that? For Rosanna's purposes it's ideal. For anyone who's trying to track down all documents about sales, it's a massive problem.
Which is where we backtrack six months to Peter Stokes talking about the Anglo-Saxon Cluster and the problem of integrating different ideas of what a charter is. There's already been a slightly bad-tempered post about this paper from Jon Jarrett, who I think for once got distracted from the key point, which is that a lot of the difficulty of integrating four projects all talking about the same documents is that the charters can be conceptualised in very different ways.
What is a charter in terms of these projects' focus?
1) In ESawyer it's a document, with the main point being creating an index to help locate it and discussions of it.
2) In ASChart it's a text (a string of words) with a date. (It's worth noting here that this is specifically said to be a pilot project and to focus on marking up texts with XML, so it was not intended to be a replacement for, or equivalent of, Sean Miller's useful database.)
3) In PASE a charter is a source, a set of factoids (X did Y). In fact it's the old game of gutting sources for snippets of information.
4) Finally in Langscape a charter is a unique document (every manuscript is a different version, there's no critical edition).
All this is reflected in very different attitudes to what form any "full text" included in the project takes. ESawyer includes for many records (but not all) the text of charters, taken from several different editions. ASChart, as already mentioned, includes (non-searchable) full text with certain sections (such as dispositive words) marked up. PASE doesn't include the full text of charters, but does, in theory, include all the main data points from them. Finally, Langscape includes three different versions of each text: semi-diplomatic, edited (i.e. broken up into lexical units for analysis) and glossed (provided with a headword and translation).
So when we talk about a database including the full text of a charter, we're potentially thinking about very different things, with varying amounts of editorial intervention. First of all, there's the question of whether you're editing the material from scratch (which is very time-consuming), or relying on existing editions, which may not be consistent (especially with large corpuses). Secondly there's the possibility of using XML mark-up to highlight particular sections. Finally there's the possibility of full-text search.
What Rosanna's paper strongly suggested to me is that full-text search is something of a red herring in most cases. Short of the kind of extreme editing that Langscape includes, I can't see how you can often find things reliably in texts where the spelling is so erratic. This is going way beyond problems of Latin stemming (which have been researched for at least 25 years). Full-text search is only really likely to work effectively where you've got fairly standardised Latin AND consistent editorial practices. Or possibly for individual words/phrases which are sufficiently distinctive and not spelled in too many alternative ways: you might be able to find most examples of "friskingas" (suckling pigs) in a database of charters, for example, if you sit down and check half a dozen similar words. But I don't see that you're going to get very far trying to pick out sales, for example. And I was recently staring at a transcription of a St Gall charter for some while in bemusement before I worked out that "drado" meant someone was going to hand over (trado) some property.
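To make the spelling problem concrete, here is a minimal Python sketch of the sort of "folding" a search tool would need before it could match "bendidit" to "vendidit" or "drado" to "trado". The letter pairs in FOLD are assumptions chosen only to fit the examples quoted above, not a tested set of philological rules, and the function names are hypothetical. Folding this crude will also throw up plenty of false matches.

```python
import re

# Illustrative folding rules only (an assumption, not a vetted rule set):
# letters that scribes interchanged are collapsed onto one stand-in letter.
FOLD = str.maketrans({"v": "b", "u": "b", "d": "t"})


def fold(word: str) -> str:
    """Reduce a word to a crude spelling-insensitive key."""
    return word.lower().translate(FOLD)


def find_variants(text: str, stem: str) -> list[str]:
    """Return the words in text whose folded form starts with the folded stem."""
    key = fold(stem)
    words = re.findall(r"[A-Za-z]+", text)
    return [w for w in words if fold(w).startswith(key)]


gaeta = "mea boluntatem bendidisse et bendidit bobis"
print(find_variants(gaeta, "vendid"))  # ['bendidisse', 'bendidit']
print(fold("drado") == fold("trado"))  # True
```

Even in this toy form the trade-off is visible: every rule added to catch one more variant spelling also merges words that have nothing to do with each other, so the results still need checking by eye.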
Similarly, ASChart is, to my mind, an interesting exercise in showing that XML mark-up of a charter in terms of its diplomatic doesn't really get you a whole heap further in its study (which may be the reason it didn't get beyond the pilot project stage). It's possible to use it to pull out a list of invocations, for example, but you get something that isn't easily scalable to large collections, because so many invocations are marginally distinctive. There's not a substantial difference, for example, between starting a charter "In nomine Domini nostri Iesu Christi mundi saluatoris" and "In nomine Domini nostri Iesu Christi saluatoris mundi", but I can't see how you can easily find an algorithm that would automatically conflate phrases that are "similar" in this way.
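For what it's worth, the word-order case on its own can be handled crudely by comparing phrases as unordered sets of words. The sketch below does that in Python; the function names are hypothetical and the third, shorter invocation is included only for comparison. It does nothing about variant spellings, abbreviations or extra epithets, which is where the real difficulty lies.

```python
def word_set(phrase: str) -> frozenset[str]:
    """Lower-case a phrase and treat it as an unordered set of words."""
    return frozenset(phrase.lower().split())


def jaccard(a: str, b: str) -> float:
    """Share of distinct words the two phrases have in common (1.0 = same words)."""
    set_a, set_b = word_set(a), word_set(b)
    return len(set_a & set_b) / len(set_a | set_b)


inv_1 = "In nomine Domini nostri Iesu Christi mundi saluatoris"
inv_2 = "In nomine Domini nostri Iesu Christi saluatoris mundi"
inv_3 = "In nomine Domini nostri Iesu Christi"  # shorter form, for comparison only

print(jaccard(inv_1, inv_2))  # 1.0  (same words, different order)
print(jaccard(inv_1, inv_3))  # 0.75 (six of eight distinct words shared)
```

Deciding what score counts as "similar enough" is still an editorial judgement, and any threshold would have to be checked against the collection in question.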
What, in theory, might be more helpful is using XML mark-up combined with full-text search, so that you search only in the dispositive words, say, for "vendo" or variants thereof. But I'm not yet convinced that with the kind of variability you have in early medieval charters, you would really end up saving enough of the users' time to justify all the work of tagging this data in the first place. I'd be interested to hear from people who work more on diplomatic on this point – what do you think XML might do for you?
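As a rough sketch of what that combination might look like, the Python below searches only inside a marked-up dispositive section. The element names (charter, invocation, dispositive, endorsement) are assumptions made up for this example and are not ASChart's actual schema. The two records are toy data pieced together from phrases quoted in this post, not real charters, and the pattern for sale words is deliberately simplistic.

```python
import re
import xml.etree.ElementTree as ET

# Toy records using a hypothetical schema; not real charters or real mark-up.
SAMPLE = """
<charters>
  <charter id="toy-1">
    <invocation>In nomine Domini nostri Iesu Christi</invocation>
    <dispositive>bendidisse et bendidit bobis</dispositive>
  </charter>
  <charter id="toy-2">
    <dispositive>drado</dispositive>
    <endorsement>vendidit</endorsement>
  </charter>
</charters>
"""

# Simplistic pattern: "vendi-" with v, b or u as the first letter.
SALE_WORD = re.compile(r"\b[vbu]endi", re.IGNORECASE)


def charters_with_sale_words(xml_text: str) -> list[str]:
    """Return ids of charters whose dispositive section contains a sale word."""
    root = ET.fromstring(xml_text.strip())
    hits = []
    for charter in root.iter("charter"):
        for section in charter.iter("dispositive"):
            if SALE_WORD.search(section.text or ""):
                hits.append(charter.get("id"))
                break  # one hit per charter is enough
    return hits


print(charters_with_sale_words(SAMPLE))  # ['toy-1']
```

The second toy record has a sale word outside its dispositive section, so it is skipped. That is the benefit the tagging buys, and it has to be weighed against the cost of marking up every charter by hand.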
When discussing the Making of Charlemagne's Europe project, which I'm now working on, I said that we're not providing the full text of the charters. It's more accurate to say that we won't be systematically providing the full text of them – we'll link to the full text online, where it's freely available, and provide references to printed sources otherwise (much as PASE does). The hope is that this gives users most of what they need, without the additional expense of either licensing full text from previous editions (it's interesting to note that some publishers are now republishing nineteenth-century cartularies) or having to spend large amounts of time scanning/OCRing material. But it's fair to say that I'm starting to realise how much more there is to the "full text" of a charter than at first meets the eye.
