As a follow-up to my first post on social network analysis, I'm now gradually reading some of the many books and articles on historians' use of network analysis that readers of my blog suggested. And having read a couple of chapters of Giovanni Ruffini, Social Networks in Byzantine Egypt, I'm coming to realise that one of the most difficult issues for those of us working with documentary sources is deciding what counts as a connection between two people and what links should therefore be included in the network.

The majority of the late antique/medieval network analysis studies that I've looked at work by hand-crafting links. Someone sits down, works their way through their sources and picks out by eye every link between two people (or two places). Often, they also categorise the link. For example, Elizabeth Clark, when studying conflicts between Jerome and Rufinus, divided links into seven different types: "marriage/kinship; religious mentorship; hospitality; travelling companionship; financial patronage, money, and gifts; literature written to, for, or against members of the network; and carriers of literature and information correspondence."

(Elizabeth A. Clark, "Elite networks and heresy accusations: towards a social description of the Origenist controversy", Semeia (56) 1991, 79-117 at p. 95).

Similarly, Judith Bennett categorised links when looking at the connections between families recorded in the Brigstock manorial court records:

The content of these transactions has been divided into six qualitative categories that collectively encompass all possible transactions. These categories are based upon whether the network subject interacted with another person by (1) receiving assistance, (2) giving assistance, (3) acting jointly, (4) receiving land, (5) giving land, or (6) engaging in a dispute.

(Judith M. Bennett, "The tie that binds: peasant marriages and families in late medieval England", Journal of Interdisciplinary History 15 (1984), 111-129, at p. 115).

And for networks of places, Johannes Preiser-Kapeller combined existing geographical datasets on late antique land and sea routes with details of church and state administrative networks he's compiled from documentary sources ("Networks of border zones: multiplex relations of power, religion and economy in South-Eastern Europe, 1250-1453 AD", in Revive the past: proceedings of the 39th conference on computer applications and quantitative methods in archaeology, Beijing, 12-16 April 2011, edited by Mingquan Zhou, Iza Romanowska, Zhongke Wu, Pengfei Xu and Philip Verhagen (Amsterdam, Pallas Publications, 2012), 381-393).

Such approaches create very reliable networks, but they're hard to scale up. Clark looks at 26 people; Judith Bennett has 31 people and 1,965 appearances in extant records from 1287-1348. Preiser-Kapeller has around 270 nodes and 680 links in total. Rosé's study of Odo of Cluny, which I discussed in the previous post, had 860 links. For charters, such hand-crafted networks would probably only allow the exploration of small archives or individual villages.

What is more, researchers often want to carry out social network analysis as an offshoot of more general prosopographical work, such as creating a charter database. But it's hard to analyse links until you've first created a prosopography, because it's only when you've been through all the charters that you have a decent idea of whether two people of the same name are actually the same person. (There's a further issue here about whether you may end up with circular reasoning between prosopography and network analysis, but I'll leave that for now). So in theory, you'd need to go through all the charters first to identify people and then have to go back to assess whether or not they are linked in a meaningful way, doubling your work.

As a result, some researchers have started trying to see if there are ways of automatically creating networks from existing databases or files, developing methods for analysing charters that (in theory) can be scaled up relatively easily. In the rest of the post I want to look at the relatively few projects I'm aware of attempting to do this and outline how we might approach the problem with the Making of Charlemagne's Europe dataset.

The three projects I'm looking at are by Giovanni Ruffini, working on the village of Aphrodito in Egypt (see reference above); Joan Vilaseca, who's been experimenting with creating graphs from the early medieval sources he's collected at Cathalaunia.org; and a controversial article by Romain Boulet, Bertrand Jouve, Fabrice Rossi, and Nathalie Villa, "Batch kernel SOM and related Laplacian methods for social network analysis", Neurocomputing 71 (2008), 1257-1273.

Ruffini is explicit about how he's creating his networks and the problems that may result from this (pp. 29-31). He's taking documents and creating "affiliation networks": all those who appear in the same document are regarded as connected to one another. As he points out, the immediate problem is that this method can introduce distortions if you have one or two documents with very large numbers of names. For example, one of the texts in his corpus is part of the Aphrodito fiscal register and has 455 names in it, while the average text names only eleven (p. 203). If such a disproportionately large text is included, analysis of connectivity is badly distorted, with all the people appearing in the fiscal register appearing at the top of connectivity lists.
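To make the distortion concrete, here is a minimal Python sketch of an affiliation network built along the lines Ruffini describes, where every pair of names in the same document gets a link. The corpus, the names and the helper functions `affiliation_edges` and `degree` are all invented for illustration; this is not Ruffini's data or code.

```python
from collections import Counter
from itertools import combinations

def affiliation_edges(documents):
    """Count, for every pair of names, how many documents they share."""
    edges = Counter()
    for names in documents.values():
        # Every pair of distinct names in one document becomes a link.
        for a, b in combinations(sorted(set(names)), 2):
            edges[(a, b)] += 1
    return edges

def degree(edges):
    """Number of distinct people each person is linked to."""
    deg = Counter()
    for a, b in edges:
        deg[a] += 1
        deg[b] += 1
    return deg

# Hypothetical corpus: two short texts and one long register.
documents = {
    "lease": ["person_a", "person_b", "person_c"],
    "sale": ["person_a", "person_b", "person_d"],
    "register": [f"taxpayer_{i}" for i in range(50)],
}

edges = affiliation_edges(documents)
deg = degree(edges)
print(len(edges))          # 1230 links, 1225 of them from the register alone
print(deg["person_a"])     # 3
print(deg["taxpayer_0"])   # 49
```

Even in this toy case, everyone in the long register outranks person_a, who appears in two separate transactions, simply because the register is long.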

The same effect can be seen in Joan Vilaseca's graphs. If you look at his first attempts at graphing documents from Catalonia between 898-914, they're dominated by the famous judgement of Valfogona in 913.

But Joan's graphs also show an additional problem. His first graphs also give great prominence to Charles the Simple and Louis the Stammerer, because they appear so often in dating clauses. When he starts looking for measures of centrality in his next post he initially finds the most connected people to be St Peter, the Virgin Mary and Judas Iscariot (who appear frequently in sanction clauses).

This brings us to the key question: what does it mean to be in the same charter as another person? The problem is that people are named in charters for so many different reasons: they may be saints, donors, witnesses, relatives to be commemorated, scribes or even the count whose pagus you are in. People may also appear as the objects of transactions: some of our early decisions on the Charlemagne project were deciding how we would treat the unfree (and possibly the free) who were being transferred between one party and another. Such unfree have an obvious connection to the donor and the recipient. But do they have any meaningful relationship to the witnesses or the scribe? At least with witnesses, there's a reasonable chance in most cases that they all physically met at some point, but I don't know of any evidence that the unfree would necessarily have been present when their ownership was transferred by a charter.

So simple affiliation networks, even when you eliminate disproportionately large documents and people mentioned only in dating or sanction clauses, can still be inaccurate representations of actual relationships. One possible response to this problem is to include as links only types of relationships that are themselves spelled out in the charters. Joan has some graphs showing only family and neighbourhood relationships, for example. Ruffini (p. 21) suggests the possibility of using data-sets where a link is defined as existing only when there is a clear connection between two parties in a document e.g. between a lessor and a lessee. But as he points out, we would then have much smaller data-sets. And for early medieval charters, in particular, focusing on the main parties to a transaction only would simply demonstrate that most transactions were about people donating or selling land to churches and monasteries, which is not exactly new information.

Are there any other ways to cut out "irrelevant" connections while keeping those we think are likely to show meaning? Another approach that Joan tries uses affiliation networks, but then removes links where two people occur together in only one document. For his interest in identifying key members of Catalan society, focusing on the most important links may well make sense. But this approach potentially distorts the evidence on one question of wider interest: how significant are weak ties in charter-derived networks? Weak ties, where two people interact only occasionally, may paradoxically be more important for spreading information or practices. Given we have only a small subset of interactions preserved via charter data, significant weak ties may be lost if we start removing data from affiliation networks in this way.
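As a rough illustration of what this kind of pruning does, the Python sketch below weights each link by the number of shared documents and then drops links seen only once. The documents, the names and the functions `weighted_edges` and `drop_weak_links` are hypothetical; this is not Joan's actual code or data.

```python
from collections import Counter
from itertools import combinations

def weighted_edges(documents):
    """Weight each pair of names by the number of documents they share."""
    edges = Counter()
    for names in documents:
        for pair in combinations(sorted(set(names)), 2):
            edges[pair] += 1
    return edges

def drop_weak_links(edges, min_shared=2):
    """Keep only pairs that co-occur in at least `min_shared` documents."""
    return {pair: n for pair, n in edges.items() if n >= min_shared}

# Hypothetical charters, each reduced to a list of names.
documents = [
    ["Adalbert", "Bernard", "Cixilo"],
    ["Adalbert", "Bernard"],
    ["Cixilo", "Dodo"],
]

edges = weighted_edges(documents)
strong = drop_weak_links(edges)
lost = sorted(set(edges) - set(strong))

print(strong)  # {('Adalbert', 'Bernard'): 2}
print(lost)    # [('Adalbert', 'Cixilo'), ('Bernard', 'Cixilo'), ('Cixilo', 'Dodo')]
```

In this toy case Dodo and Cixilo disappear from the network altogether, although Cixilo was the only bridge between Dodo and the other two. That is exactly the kind of weak tie that may matter.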

Implicitly, at least, an alternative method for selecting links within what's broadly an affiliation network is given by Boulet, Jouve, Rossi and Villa. As they explain in their study of thirteenth- and fourteenth-century notarial acts, they constructed a graph in the following manner (pp. 1264-1265):

First, nobles and notaries are removed from the analyzed graph because they are named in almost every contracts: they are obvious central individuals in the social relationships and could mask other important tendencies in the organization of the peasant society. Then, two persons are linked together if:

- they appear in a same contract,
- they appear in two different contracts which differ from less than 15 years and on which they are related to the same lord or to the same notary.

The three main lords of the area (Calstelnau Ratier II, III and Aymeric de Gourdon) are not taken into account for this last rule because almost all the peasants are related to one of these lords. The links are weighted by the number of contracts satisfying one of the specified conditions.
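One way of reading these rules is sketched below in Python. It is only my interpretation, not the authors' implementation: the data model (each contract as a year, a notary and a mapping from each peasant to his lord), the placeholder names and the function `link_weights` are all assumptions, and in particular I have guessed that the weight counts one for each contract under the first rule and one for each pair of contracts under the second.

```python
from collections import Counter
from itertools import combinations

# Placeholder standing in for the three main lords, who are ignored in rule 2.
EXCLUDED_LORDS = {"main_lord"}
MAX_GAP_YEARS = 15

def link_weights(contracts):
    """Weight links between peasants; nobles and notaries are never nodes."""
    weights = Counter()
    # Rule 1: both people appear in the same contract.
    for c in contracts:
        for a, b in combinations(sorted(c["peasants"]), 2):
            weights[(a, b)] += 1
    # Rule 2: two different contracts less than 15 years apart in which
    # the two people are related to the same lord or the same notary.
    for c1, c2 in combinations(contracts, 2):
        if abs(c1["year"] - c2["year"]) >= MAX_GAP_YEARS:
            continue
        same_notary = c1["notary"] == c2["notary"]
        for a, lord_a in c1["peasants"].items():
            for b, lord_b in c2["peasants"].items():
                if a == b:
                    continue  # the same person in both contracts is not a link
                same_lord = (lord_a is not None and lord_a == lord_b
                             and lord_a not in EXCLUDED_LORDS)
                if same_notary or same_lord:
                    weights[tuple(sorted((a, b)))] += 1
    return weights

# Hypothetical contracts: peasants map to the lord named with them.
contracts = [
    {"year": 1300, "notary": "n1", "peasants": {"P1": "lord_x", "P2": "main_lord"}},
    {"year": 1310, "notary": "n2", "peasants": {"P3": "lord_x", "P4": "main_lord"}},
    {"year": 1330, "notary": "n1", "peasants": {"P5": "lord_x"}},
]

weights = link_weights(contracts)
print(dict(weights))
# {('P1', 'P2'): 1, ('P3', 'P4'): 1, ('P1', 'P3'): 1}
```

P1 and P3 never appear together but are linked through their shared lord within ten years; P2 and P4 are not, because their shared lord is one of the excluded ones; and P5 is linked to nobody, because the third contract falls outside the 15-year window.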

Though it's not clear why people are regarded as linked if they use the same notary, the other criteria seem to be ways of trying to filter out distortions that potentially arise from notarial practices. If men are routinely described in terms of their affiliation to a lord e.g. "A the man of B", then an affiliation network will derive from a sale between "A the man of B" and "C the man of D" not only the justified links A to B, C to D and A to C, but also links that in practice are unlikely to exist or at least are not proven to do so, i.e. A to D, C to B and B to D.
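The arithmetic of that example can be checked with a few lines of Python. This is purely illustrative: the four letters are the hypothetical parties from the sentence above, not real data.

```python
from itertools import combinations

# Everyone named in the sale between "A the man of B" and "C the man of D".
parties = ["A", "B", "C", "D"]

# A plain affiliation network links every pair in the document.
clique = {frozenset(pair) for pair in combinations(parties, 2)}

# Links the text actually supports: each man to his lord, and buyer to seller.
justified = {frozenset("AB"), frozenset("CD"), frozenset("AC")}

unproven = sorted("".join(sorted(pair)) for pair in clique - justified)
print(len(clique))  # 6
print(unproven)     # ['AD', 'BC', 'BD']
```

So half of the six links that a simple affiliation network draws from this one document are unsupported.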

So how might we balance distortions from applying the affiliation network model to charter data against loss of data or an unfeasibly high workload if we don't use this method? The model for the Making of Charlemagne's Europe database allows inputting of relationship factoids, which will catch explicit references to people as the relatives or neighbours of others. Graphs using such data will be relatively easy to construct.

We are also, however, recording "agent roles", used to identify what role a person or an institution plays within an individual charter or transaction (e.g. witness, scribe, object of transaction, granter). At the minimum, any social network analysis application added to the system should probably allow a user to choose which of these roles they want included within the graphs to be created. There should also be some threshold (either chosen by us or user-defined) for excluding documents that contain "too many" different agents. We're still not going to get the precision graphs that hand-crafting links will give, but we can hopefully still get something that will tell us something useful about how people interact.
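A minimal sketch of what such a filter might look like follows, in Python. It assumes each charter can be reduced to a list of (name, role) pairs. The role labels, the names, the default threshold of 20 and the function `build_network` are all hypothetical, and none of this reflects the actual Making of Charlemagne's Europe data model.

```python
from collections import Counter
from itertools import combinations

def build_network(charters, include_roles, max_agents=20):
    """Link agents sharing a charter, limited to the chosen roles.

    Charters naming more than `max_agents` distinct agents are skipped.
    """
    edges = Counter()
    for charter in charters:
        agents = charter["agents"]  # list of (name, role) pairs
        if len({name for name, _ in agents}) > max_agents:
            continue  # "too many" agents: leave this document out
        kept = sorted({name for name, role in agents if role in include_roles})
        for pair in combinations(kept, 2):
            edges[pair] += 1
    return edges

# Hypothetical charters; role labels are illustrative only.
charters = [
    {"id": "c1", "agents": [
        ("Hildegard", "granter"),
        ("monastery_x", "recipient"),
        ("Wolfram", "witness"),
        ("Rado", "scribe"),
        ("Liuba", "object of transaction"),
    ]},
    {"id": "c2", "agents": [(f"tenant_{i}", "witness") for i in range(30)]},
]

edges = build_network(charters, include_roles={"granter", "recipient", "witness"})
print(sorted(edges))
# [('Hildegard', 'Wolfram'), ('Hildegard', 'monastery_x'), ('Wolfram', 'monastery_x')]
```

Here the scribe and the person being transferred are left out by the role filter, and the 30-name list is dropped by the size threshold. A user interested in the unfree could instead pass a different set of roles.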