
Data visualization of Facebook relationships by Kencf0618

Network analysis is one of those areas that keeps cropping up as a possibility for medieval researchers. (There have been some interesting discussions and examples previously at A Corner of Tenth Century Europe and Cathalaunia, which I'll discuss more in a later post.)

Since one of the hopes of the Making of Charlemagne's Europe project I'm working for is that the data collected can be used for exploring social networks, I thought it would be useful to find out a bit more about what has been done already. So this is my first attempt to get a feeling for what's been done with medieval data and what it might be possible to do.

I should note at this point that I'm drawing very heavily on the work of Johannes Preiser-Kapeller, especially his paper: "Visualising Communities: Möglichkeiten der Netzwerkanalyse und der relationalen Soziologie für die Erfassung und Analyse mittelalterlicher Gemeinschaften". I found out about many of the projects I discuss from this paper, so I am grateful to him for providing such a primer. My focus is slightly different to his, however, as what I'm particularly interested in is the type of research questions that social network analysis might be used to answer, more than the details of particular projects.

Defining networks
One immediate problem in knowing where to look comes because the key mathematical tools and visualization techniques can be applied to very different kinds of data. The underlying concepts come mainly from graph theory. Wikipedia defines that as: "the study of graphs, which are mathematical structures used to model pairwise relations between objects from a certain collection. A "graph" in this context is a collection of "vertices" or "nodes" and a collection of edges that connect pairs of vertices. A graph may be undirected, meaning that there is no distinction between the two vertices associated with each edge, or its edges may be directed from one vertex to another."

What that means is that you can use the same basic techniques to study anything from a road network via the structure of novels, to how infections spread through a population. But it also means that the type of network and how you can analyse it depends crucially on several factors. These include how you define a node and edge, whether all edges are the same (or whether you're counting the connections between some pairs as somehow different/more important than others) and whether it's a directed or undirected graph.
The size of the network is also crucial, and that differs vastly between disciplines: it's when you see a physicist commenting that "At best power law forms for small networks (and small to me means under a million nodes in this context) give a reasonable description or summary of fat tailed distributions" that you know that not all networks are the same kind of thing. One of the things that interests me when looking at projects is the extent to which data visualization is important in itself or whether the emphasis is on mathematical analysis of the underlying data.
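To make those choices concrete, here is a minimal sketch in Python of how the same structure can hold either kind of graph. The class name, the method names and the two toy examples (charter witnesses, letters sent) are my own illustrations rather than anything taken from the projects discussed below.

```python
from collections import defaultdict


class Graph:
    """A tiny graph: nodes joined by edges that may be directed and weighted."""

    def __init__(self, directed=False):
        self.directed = directed
        # adjacency[a][b] holds the weight of the edge from a to b
        self.adjacency = defaultdict(dict)

    def add_edge(self, a, b, weight=1.0):
        self.adjacency[a][b] = weight
        if self.directed:
            self.adjacency[b]  # register b as a node without a return edge
        else:
            self.adjacency[b][a] = weight

    def neighbours(self, node):
        """Nodes that can be reached from `node` in one step."""
        return sorted(self.adjacency[node])


# Undirected, unweighted: "A and B witnessed a charter together".
witnesses = Graph(directed=False)
witnesses.add_edge("A", "B")

# Directed and weighted: "A sent three letters to B".
letters = Graph(directed=True)
letters.add_edge("A", "B", weight=3)

print(witnesses.neighbours("B"))  # ['A']
print(letters.neighbours("B"))    # []
```

The same pair of people therefore looks connected in both directions in one model and in only one direction in the other, which is exactly the kind of decision that has to be made before any analysis starts.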

Data quality

There are, inevitably, particular issues with data quality for medieval networks. The obvious one is whether the information you have is typical or whether the reasons for its survival bias our evidence excessively from the start. (The answer is almost certainly yes, but medievalists wouldn't know how to cope if they had properly representative sources, so let's move on rapidly).

Another big issue is identifying individual nodes. You can in theory have anything as nodes: an individual, a "family", a manuscript, a place, a type of archaeological artefact, a gene, a unit of language. (I'm not going to look at either linguistic or genetic network analysis in what follows, but there are projects doing both of those). The problem with medieval data is that there's almost always some uncertainty about identification: are two people the same or not? What do you do about unidentifiable places? How do you decide whether two people belong to the same family?

Then there's the question of how you define a connection between two nodes. What makes two people connected to one another? The data you extract from the sources obviously depends on decisions made about this, but for a lot of medieval networks there's the added complication that not all connections are made at the same time. If you have a modern social network where A connects with B and (simultaneously) B connects with C you can make certain deductions about the network from data about whether or not A and C are connected. If you have limited medieval data where A connects with B and 20 years later B connects with C, can you model that as one network, or do you have to take time-slices across the network (which may often reduce your available data set from small to pathetic)?
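The time-slice problem can be sketched in a few lines of Python. The dated ties below are invented placeholders, and the function names are mine; the point is only to show how a path that exists in the pooled network can vanish once you slice by date.

```python
def time_slice(ties, start, end):
    """Return the ties dated within the years start..end inclusive."""
    return [(a, b, year) for a, b, year in ties if start <= year <= end]


def reachable(ties, source, target):
    """True if target can be reached from source along undirected ties."""
    neighbours = {}
    for a, b, _year in ties:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)
    seen, frontier = {source}, [source]
    while frontier:
        node = frontier.pop()
        for other in neighbours.get(node, ()):
            if other not in seen:
                seen.add(other)
                frontier.append(other)
    return target in seen


# Hypothetical dated ties: (person, person, year attested).
ties = [("A", "B", 800), ("B", "C", 820), ("C", "D", 822)]

whole = time_slice(ties, 800, 830)
early = time_slice(ties, 800, 810)
late = time_slice(ties, 811, 830)

print(len(whole), len(early), len(late))  # 3 1 2
print(reachable(whole, "A", "C"))         # True
print(reachable(early, "A", "C"))         # False
print(reachable(late, "A", "C"))          # False
```

Pooled together, A appears to be two steps from C; in neither slice is that true, and each slice has fewer ties to work with.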

Varieties of projects
Of the medieval history projects I've come across so far (I suspect there's a whole slew of others in fields such as archaeology), most seem to fall into three categories. There are studies on networks of traders, such as by Mike Burkhardt on the Hanse. There are probably other similar examples: I've not yet had a chance to investigate whether the important work by Avner Greif on traders in the Maghreb also uses network analysis or not. But these kinds of studies are unlikely to be relevant to any early medieval project, because they will almost certainly rely on relatively large-scale sets of data from a short chronological range (account-books, registers of traders etc). Such data sets simply don't exist for the periods I'm interested in.

The other two types of medieval network studies I've noticed are ones which are looking at intellectual networks or the spread of ideas (with some possible overlap with spread of objects more generally) and ones using network analysis to study how a society operates (social network analysis in its most specific sense). For both of these, I'm aware of some early medieval studies and others that are potentially applicable to early medieval-style data. I'll cover intellectual networks in this post (including a discussion of a recent IHR seminar) and then move on to social history uses of network analysis in the next post.

Intellectual networks/spread of ideas: example projects

1) Ego-networks
There are several forms that network analysis of intellectual networks can take. One obvious one is as a more quantitative version of what's been done for many years (if not centuries): the study of "ego-networks", the intellectual contacts that a particular individual has.

This is the basis for the study by Isabelle Rosé of Odo of Cluny (Rosé, Isabelle. "Reconstitution, représentation graphique et analyse des réseaux de pouvoir au haut Moyen Âge: Approche des pratiques sociales de l’aristocratie à partir de l’exemple d’Odon de Cluny († 942)", Redes. Revista hispana para el análisis de redes sociales 21, no. 1 (2011)).

Rosé's study isn't strictly of just an ego-network, since she also tries to analyse the connections that Odo's contacts had with each other in which Odo wasn't involved, but the centre is clearly Odo. Rosé uses a mix of different sources (narrative and charters) to construct snapshots of Odo's connections over time: she ends up with a PowerPoint slideshow showing the network for every year (available from here). She wanted to include a spatial dimension to the networks (showing where connections were formed), but couldn't find a way of doing that.

Rosé's account includes some useful detail about her methodology. The data she collected in Excel consisted of 2 people's names, a type of connection and a direction for it, a source and start dates and end dates for the connection. She also codes individual nodes based on the person's social function (monk, layman, king etc) and the aristocratic group they belong to (Bosonids etc); this is reflected in their colour and shape on her network diagrams.
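As a rough illustration of what one row of such a spreadsheet amounts to, here is a sketch in Python. The field names, the example values and the `active_in` helper are all my own assumptions; Rosé worked in Excel, and I don't know exactly how her columns were labelled.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Person:
    name: str
    social_function: str   # e.g. "monk", "layman", "king"
    group: Optional[str]   # aristocratic group, e.g. "Bosonids"; None if unknown


@dataclass
class Tie:
    person_a: str
    person_b: str
    tie_type: str          # e.g. "kinship", "fidelity"
    directed: bool         # True if the tie runs from person_a to person_b
    evidence: str          # the source attesting the tie
    start_year: int
    end_year: int

    def active_in(self, year: int) -> bool:
        """True if the tie counts as existing in the given year."""
        return self.start_year <= year <= self.end_year


# Placeholder rows, not real data.
people = [Person("X", "monk", None), Person("Y", "layman", "Bosonids")]
ties = [Tie("X", "Y", "fidelity", True, "charter no. 1", 905, 920)]

# A snapshot for a single year is then just a filter over the rows.
snapshot_910 = [t for t in ties if t.active_in(910)]
snapshot_930 = [t for t in ties if t.active_in(930)]
print(len(snapshot_910), len(snapshot_930))  # 1 0
```

Writing it out like this makes it obvious how much rides on the start and end dates: every yearly snapshot is determined by them.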

There are a lot of questions raised immediately about how such decisions are made (the period of time allocated to a particular connection, how she decides who counts as belonging to one of the groups); all the kind of nitty-gritty that has to be sorted out for any particular project.

What does Rosé's use of network analysis allow that a conventional analysis of how Odo's social networks helped him couldn't do? Firstly, the data collection method encourages a systematic search for all connections, including ones that an unstructured reading of the sources might miss. Secondly, the visualization of networks (especially as they change over time) gives an easy way of spotting patterns, allowing periodization of Odo's career, for example. Thirdly, it's possible to compare different sorts of tie, e.g. she shows that the kinship networks (whether actual or the fictive kinship of godparenthood) consist of a number of unconnected segments, but when you include both ties of kinship and ties of fidelity, you do get a single network. Finally, Rosé uses a few formal network metrics to rank people by their centrality to the network (their importance to it) and their role as cut-points (people whose removal from the network would mean that there were disconnected segments of it).
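For readers who want to see what those last two points mean in practice, here is a small Python sketch on an invented four-person network. I've used degree (the number of ties a person has) as the centrality measure because it is the simplest one; that is my assumption, not necessarily the measure Rosé used, and the brute-force cut-point search is only sensible for small data sets.

```python
def build(edges):
    """Turn a list of undirected ties into a node -> neighbours mapping."""
    graph = {}
    for a, b in edges:
        graph.setdefault(a, set()).add(b)
        graph.setdefault(b, set()).add(a)
    return graph


def components(graph, removed=frozenset()):
    """Connected segments of the graph, ignoring any nodes in `removed`."""
    seen, result = set(removed), []
    for start in sorted(graph):
        if start in seen:
            continue
        component, frontier = set(), [start]
        seen.add(start)
        while frontier:
            node = frontier.pop()
            component.add(node)
            for other in graph[node]:
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
        result.append(component)
    return result


def cut_points(graph):
    """People whose removal leaves more disconnected segments than before."""
    baseline = len(components(graph))
    return sorted(
        node for node in graph
        if len(components(graph, removed={node})) > baseline
    )


def degree_ranking(graph):
    """(number of ties, person) pairs, best connected first."""
    return sorted(((len(n), p) for p, n in graph.items()), reverse=True)


# Invented ties, split by type.
kinship = [("A", "B"), ("C", "D")]
fidelity = [("B", "C")]

print(len(components(build(kinship))))             # 2
print(len(components(build(kinship + fidelity))))  # 1
print(cut_points(build(kinship + fidelity)))       # ['B', 'C']
```

Kinship alone gives two unconnected segments; adding the single tie of fidelity joins them, and the two people on either end of that tie become the cut-points.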

Apart from this restricted use of metrics, Rosé is mostly doing visualization and I suspect that many of her conclusions are confirmations of things that a conventional analysis of Odo's social network without such complex data collection would have come up with anyhow: who Odo's key connections were, the importance of the fact that right from the start Odo had connections to the Robertines and also the Guilhemides. But one of her most interesting comments was that analysis showed a move away from kings as central to social networks, which she connected to a move to "feudalism". If we could find comparable data sets (and there are obvious problems in doing so), it'd be interesting to see whether kings outside France become non-central to reforming abbots in the same way.

2) Scale-free networks
There are a couple of articles I want to highlight which talk about scale-free medieval networks and which I want to discuss more for some of the difficulties they raise than the answers they're coming up with. One is work that hasn't yet been published, but has been publicised: analysis of the spread of heresy by Andrew Roach of Glasgow and Paul Ormerod. The other is Sindbæk, S.M. 2007. 'The Small World of the Vikings. Networks in Early Medieval Communication and Exchange', Norwegian Archaeological Review 40, 59-74, online.

But first, a very rough explanation of scale-free networks, which means introducing one or two basic mathematical/statistical ideas. The first is the degree of a node, the number of connections it has. The second is the distribution of these degrees, i.e. what percentage of nodes have degree 1, degree 2, etc. Scale-free networks are ones where the degree distribution follows a power law: roughly speaking, you have a few very well-connected nodes and then a long tail of a lot of poorly-connected nodes.
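Put slightly more formally, a power law means that the share of nodes with degree k falls off roughly in proportion to k raised to some negative power. Working out a degree distribution is simple enough to sketch in Python; the tiny network here is invented, and is of course far too small to test for a power law of any kind.

```python
from collections import Counter


def degree_distribution(edges):
    """Map each degree k to the fraction of nodes with exactly k ties."""
    degree = Counter()
    for a, b in edges:
        degree[a] += 1
        degree[b] += 1
    counts = Counter(degree.values())
    total = len(degree)
    return {k: counts[k] / total for k in sorted(counts)}


# Hypothetical network: one hub tied to five others, plus one extra tie.
edges = [("hub", n) for n in "abcde"] + [("a", "b")]
distribution = degree_distribution(edges)

for k, share in distribution.items():
    print(k, round(share, 2))
```

Here half of the six nodes have a single tie, a third have two, and only the hub has five: a very crude version of the "few well-connected nodes, long tail of poorly-connected ones" shape.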

The crunch here is "roughly speaking": there are all kinds of issues about whether any particular example really does represent the power law distributions that supposedly lie behind it. It's a reminder that if we as historians do start doing more of this kind of work, we're probably going to need some good mathematicians/statisticians behind us pointing out possible issues.

Without seeing the data, it's impossible to tell whether Roach and Ormerod are accurate about medieval heresy spreading through such types of networks. But Søren Sindbæk's paper on Viking trade suggests that the interest here isn't strictly whether we're talking about scale-free distributions or not. It's a more general question about how the very localized societies within which the vast majority of medieval people lived could nevertheless allow the relatively rapid long-range spread of everything from unusual theological ideas to silver dirhams.

Søren's main point is that there are two possible ways that such small-world networks can evolve: either you can have a few random links between two otherwise largely separate networks (the weak-ties model) or you can have a few very well-connected nodes amid the otherwise very localised societies (the "scale-free" model). Which of these two ideal types of network you have considerably affects the robustness of the network. In a scale-free network, if one or two crucial hubs get destroyed by attackers, the whole network falls apart, but random attacks aren't likely to have much effect; the weak-ties model is more vulnerable to a random attack (if a random link that ties two networks together happens to get severed). Søren tries to see which type of network best fits two very limited sets of data (one based on the Vita Anskarii and one on archaeological data). The answer, not surprisingly, is "scale-free" networks.
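The hub half of that robustness argument can be illustrated with a toy example in Python. The network below (two linked hubs, each with four purely local contacts) is invented and has nothing to do with Søren's data; rather than sampling random attacks, I simply average over every possible single-node removal.

```python
def largest_component(graph, removed):
    """Size of the largest connected piece once `removed` nodes are deleted."""
    seen, best = set(removed), 0
    for start in graph:
        if start in seen:
            continue
        size, frontier = 0, [start]
        seen.add(start)
        while frontier:
            node = frontier.pop()
            size += 1
            for other in graph[node]:
                if other not in seen:
                    seen.add(other)
                    frontier.append(other)
        best = max(best, size)
    return best


# Hypothetical hub-and-spoke network: 2 hubs and 8 local sites.
edges = [("H1", "H2")]
edges += [("H1", f"a{i}") for i in range(4)]
edges += [("H2", f"b{i}") for i in range(4)]

graph = {}
for a, b in edges:
    graph.setdefault(a, set()).add(b)
    graph.setdefault(b, set()).add(a)

# Targeted attack: take out one hub.
after_hub_attack = largest_component(graph, {"H1"})

# "Random" attack: the average result over every possible single removal.
sizes = [largest_component(graph, {node}) for node in graph]
average_after_random = sum(sizes) / len(sizes)

print(after_hub_attack)      # 5
print(average_after_random)  # 8.2
```

Losing one hub halves the connected network, whereas losing an average site barely dents it, since eight of the ten possible removals just trim a single spoke.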

I say the answer isn't surprising because the medieval world is full of hierarchies of people and places, and some of the defining characteristics of those at the top of such hierarchies are that they move around more or they have connections to a lot more places. I found Søren's paper mainly revealing in giving a feel of the numerical bounds for where simple visualization is a useful tool: a plot of 116 edges (see Fig. 3) is already getting complex to visualise; one with 491 edges (see Fig. 4) is almost impossible to take in by eye.

As for Roach and Ormerod, the fact that heresy was mainly spread through a small number of widespread travellers isn't exactly news. We'll have to wait and see whether they can provide something that gives a new dimension of analysis.

3) Six degrees of not-Alcuin
Finally for this post, I want to discuss an IHR seminar I heard back in May: Clare Woods from Duke University talking about "Ninth century networks: books, gifts, scholarly exchange". Clare's coming to intellectual history from a slightly different angle from Isabelle Rosé: she has been editing a collection of sermons by Hrabanus Maurus for Archbishop Haistulf of Mainz, and thinking about how to represent the relationship between manuscript witnesses visually (rather than just rely on verbal descriptions or stemma diagrams).

The point here is that manuscript stemmata can be thought of as directed networks between manuscripts, whose place of production can be located (more or less accurately). (There are also projects endeavouring to generate manuscript stemmata automatically, but I'm not discussing those at the moment.) Clare is also using data from book dedications, known manuscript movements, and the evidence of medieval library catalogues.

Also in contrast to Rosé, Clare was interested in the possibility of getting beyond the spider's web idea of intellectual history, i.e. that Hrabanus (or Odo) sits at the centre and everyone else revolves around him. This is a particular issue for Carolingian intellectual history because of Alcuin. We have far more letters of Alcuin preserved than of any other Carolingian author (Hincmar probably comes second, but his letters still haven't been edited properly), so if you use Rosé's techniques you're liable to end up overrating Alcuin's significance vastly.

Clare's main focus was on simple tools for visualizing this information, ideally in both its spatial and temporal dimensions. As I said above, Rosé was using Excel, PowerPoint and NetDraw and was finding problems in showing locations. Clare was using Google Maps for the spatial element, but thought she'd need JavaScript (which she doesn't know) to show changes over time. I have seen projects which use Google Maps and a timeline, such as the MGH Constitutiones timemap (click on Karte to follow how Charles IV, the fourteenth-century Holy Roman Emperor, moved around his kingdom). I don't know how that is made to work.

I'd be interested to know from more informed readers of the blog if there are such tools available that non-experts can use to produce geo-coded networks of this kind. Gephi seems to be popular free software for network analysis, and I've seen a reference to a plug-in for this which allows entering geo-coded data. The Guardian datablog recommends Google Fusion Tables.

But whatever software you have, there are the normal issues of data quality. There's a particular problem with data coming from a very long timescale: in questions David Ganz wondered whether the evidence was getting contaminated by twelfth-century copies (I wasn't quite sure whether that's just because there are so many manuscripts of all sorts from later). How do we know whether manuscript movements do reflect actual intellectual contacts, rather than just random accidents of them getting moved/displaced etc? Clare also discussed the problem of how you map a manuscript which came from "northern Italy". Her response was to choose an arbitrary point in the region and use that: at the level of approximation and small number of data points she's using, it's not a major distortion.

The data sets for early medieval texts are always going to be tiny: having more than 100 manuscripts of one text from the whole of the Middle Ages is exceptional. (The largest transmission I know of is for Alcuin's De virtutibus et vitiis, of which we have around 140 copies.) But Clare's project does potentially offer the possibility of combining her data with other geo-referenced social networks to get an alternative and wider picture of intellectual connections in the Carolingian world. Combining data sets is likely to lead to even more quality issues, but it does offer the possibility of building up new concepts of the Carolingian world module by module.