On 23 Things we were asked (last week!) to think about the use of tags (in the sense of informal user-created labels on internet objects). One of the problems with discussing tagging is that the same technology is used for tagging very different kinds of objects, in collections that vary widely in size, and that tags can be added both by the creator of a particular object and by other users. Trying to generalise about these is tricky, so I want to look at a few particular cases.

Let’s start with my blog, which has around 340 posts. If you’re looking for a specific topic, such as iGoogle or naked monks, then it’s best to search the blog, because there’s only one blog post which refers to each (although there’s now also this post as well). I use tags only as broader-based groupings – for example, to group all my posts on a particular conference or on themes such as ‘US politics’, where I may not use the specific phrase in the post. It’s partly for that reason that I have a ‘medieval’ tag, but not a ‘Carolingian’ one, because almost all my discussions of the Carolingian empire will use the term and so can be found via searches.

Even though I keep my tagging so simple and I’m an experienced librarian, my tagging of entries is still not of particularly high quality and suffers from biases. I’m inconsistent about the use of the tag ‘religion’, as contrasted with tags for specific religions, such as Christianity. I have a tag for ‘homosexuality’, but not ‘heterosexuality’, even though I discuss both. And I periodically discover that there are useful themes that I haven’t tagged, and either have to go back and update the tags, or decide that it’s too big a task (as with tagging things as ‘Carolingian’).

My tagging doesn’t need to be very good, because I’m tagging objects that already have lots of searchable text. In contrast, tagging non-textual objects, such as images or bookmarks, is a lot harder, and the quality of tagging is very variable. Take something as simple and definite as a place name. I found a couple of pictures on Flickr from the tiny Sussex village where I grew up. One is tagged ‘bonfire, bonfire night, fireworks, madehurst’. The other is tagged ‘d40, 18-55mm f/3.5-56G, lenstagged, unmodified, 20081001, madehurst, church, madehurst church, west sussex, england, uk, 200810, 3008x2000’. If I wanted to find photos from West Sussex, I’d only find the first one by knowing that Madehurst was in West Sussex (and very few people have ever heard of Madehurst). And I suspect there are pictures on Flickr from Madehurst that haven’t been tagged or captioned with a place, and that therefore I can’t find at all.
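To make the place-name problem concrete, here is a minimal Python sketch of exact-match tag searching over those two photos. The tags are copied from the photos as quoted above; the photo names and the `find_by_tag` helper are made up for illustration, and this only models matching on tags, not whatever Flickr's own search actually does.

```python
# Tags as quoted above; the photo names are hypothetical labels.
photos = {
    "bonfire_photo": {"bonfire", "bonfire night", "fireworks", "madehurst"},
    "church_photo": {
        "d40", "18-55mm f/3.5-56G", "lenstagged", "unmodified", "20081001",
        "madehurst", "church", "madehurst church", "west sussex", "england",
        "uk", "200810", "3008x2000",
    },
}


def find_by_tag(collection, tag):
    """Return the sorted names of items that carry exactly this tag."""
    return sorted(name for name, tags in collection.items() if tag in tags)


# Searching by county misses the bonfire photo, which was never tagged with it.
print(find_by_tag(photos, "west sussex"))  # ['church_photo']

# Only someone who already knows the village name finds both.
print(find_by_tag(photos, "madehurst"))    # ['bonfire_photo', 'church_photo']
```

A photo with no place tag at all would not show up in either search, which is the case I can't test for.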

Why, then, is there such enthusiasm by some internet gurus for tags? One of the articles we were pointed to for this week was Clay Shirky’s Ontology is Overrated: Categories, Links, and Tags. Shirky has two main points. One is that methods of formal subject categorisation don’t work for something as big and varied as the internet. I’m working with Library of Congress Subject Headings myself at the moment in my job, and I know their many weaknesses. But it’s perfectly possible to admit that formal classification schemes often don’t work effectively while still pointing out that informal tagging has even more problems with inconsistency and inadequacy.

Shirky’s false step seems to me to be assuming that you can somehow generate adequate forms of categorisation by aggregating poor forms of categorisation. I take this to be a variant of the ‘wisdom of crowds’ approach: that averaging the views of people can sometimes give a better answer than any individual one, as, for example, when guessing the weight of a cake. Unfortunately, aggregating answers only really works when people have similar levels of knowledge. If you need to ‘ask the audience’ in Who Wants to Be a Millionaire? you’ll almost certainly get the right answer for one of the early questions. For the million-pound question, the audience is unlikely to be much help.

In the same way, Shirky is wrong to claim that ‘As long as at least one other person tags something the way you would, you’ll find it’. Eventually, maybe. But if there are 241 Delicious bookmarks for Edward II, how do you plod through them to find the ones about the king, as opposed to the play or the "mutant calypso/reggae/African style English dance band"? Or Edward Wells II?

Shirky starts with a contrast between Yahoo’s attempts at categorisation and Google’s lack of hierarchy. But Google doesn’t actually make much use of tags: it uses hyperlinks and clusters of interest. A site is ‘about’ something not just because of terminology within it, but because lots of other sites point to it. When you start looking at the useful forms of recommendations in large systems, they don’t predominantly work on tags; they work on such clusters of interest. Amazon’s and LibraryThing’s recommendations are based on the fact that the people who buy or own one book also buy or own similar books. Delicious seems to work best when you can find a person whose interests mean they bookmark the kind of sites you’re interested in, even if they tag them slightly differently. Flickr’s ability to create pools of pictures can link together specific themes more effectively than tags.
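The ‘people who own one book also own similar books’ idea can be sketched in a few lines of Python. This is only a toy co-occurrence count under my own assumptions, not a description of how Amazon or LibraryThing actually build their recommendations; the loan records, the titles and the `also_borrowed` helper are all invented for illustration.

```python
from collections import Counter

# Hypothetical loan records: each set is what one reader borrowed.
loans = [
    {"Einhard", "Carolingian Empire", "Charlemagne"},
    {"Carolingian Empire", "Charlemagne"},
    {"Charlemagne", "Edward II"},
    {"Edward II", "Marlowe Plays"},
]


def also_borrowed(loans, title, top_n=3):
    """Rank other titles by how many readers borrowed them alongside `title`."""
    counts = Counter()
    for basket in loans:
        if title in basket:
            counts.update(basket - {title})
    # Most frequently co-borrowed first; ties broken alphabetically.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    return [other for other, _ in ranked[:top_n]]


print(also_borrowed(loans, "Charlemagne"))
# ['Carolingian Empire', 'Edward II', 'Einhard']
```

Nobody in this sketch has labelled anything: the ranking comes entirely from what readers chose to borrow together.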

The message I’d draw is that people are often poor at labelling things, but they’re a lot better at knowing what they like or find useful. Should librarians be using tags? They may have a limited role in blogs or on social media sites, but I’m not convinced they’re the right way forward for library catalogues. Why should we make users do the work of tagging, when we can provide far more useful information for them automatically via a ‘people who borrowed this also borrowed that’ button? (At the University of Huddersfield Library, they’ve got even more whizzy tricks than this, thanks to Dave Pattern.) Tagging for yourself may make sense if your needs are simple: using other people’s tags is often a waste of time.