Wednesday, 18 May 2011
The other pub in the same village is now owned by a group of villagers, and seems to be thriving. It's odd how I will pass these things for months before it occurs to me to check whether they are recorded properly on the map. Having checked, I've added the open pub (which was missing), and changed the tagging on this one to show it as disused.
This leads on to the controversy about how best to tag features that are no longer in use. I've decided to mark this one as "amenity=pub, disused=yes", which is one of the common approaches. But that's not the only option, and there are good arguments against the approach I used.
For those who don't follow this stuff, the main problem is that it's as though I'm saying "this is a pub - oh no it's not". Anyone who is very thirsty might stop listening after the first half of the sentence. If I'd said "this WAS a pub" it would be OK. Similarly, somebody who is using the raw data to draw pubs on a map, or to provide directions to the nearest pub, is normally going to search for things described as a "pub". They will find more than 30,000 in the UK, including this one (unless they listen carefully).
What they probably want to find is pubs that are still in business. So they want to ignore the 70 or so that are already marked as "disused", and a few more that are marked as "closed". They will have to eliminate these explicitly. There are also more than 100 features described as a pub where the name is set to something like "Royal Oak (closed)". There are a few dozen more where there is a note attached (in free text) to the same effect. There are also a variety of less standard ways of indicating the same thing - all against features that are basically marked as being a "pub". The more of these that they handle explicitly, the more accurate their data will be. Any they miss can mislead their users.
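As a rough sketch of what that explicit filtering might look like, here is a small Python function that takes the tags of one feature as a plain dictionary and decides whether it looks like a pub that is still open. Only "amenity=pub" and "disused=yes" come from the tagging discussed here. The "closed=yes" form, the pattern used to spot "(closed)" in a name, the check on a "note" tag, and the function name itself are my assumptions for illustration, and the sample features are invented.

```python
import re

# Hypothetical pattern for names such as "Royal Oak (closed)".
CLOSED_IN_NAME = re.compile(r"\(\s*closed\s*\)", re.IGNORECASE)


def is_active_pub(tags):
    """Return True if the tags describe a pub with no sign of closure."""
    if tags.get("amenity") != "pub":
        return False
    # The well-established marker for disused features.
    if tags.get("disused") == "yes":
        return False
    # Assumption: "marked as closed" is taken to mean closed=yes.
    if tags.get("closed") == "yes":
        return False
    # Closure recorded in the name rather than in a tag.
    if CLOSED_IN_NAME.search(tags.get("name", "")):
        return False
    # Crude heuristic for free-text notes; it can misfire on other wording.
    if "closed" in tags.get("note", "").lower():
        return False
    return True


# Invented sample features, one for each variant described above.
features = [
    {"amenity": "pub", "name": "Village Inn"},
    {"amenity": "pub", "name": "Old Bell", "disused": "yes"},
    {"amenity": "pub", "name": "Royal Oak (closed)"},
    {"amenity": "pub", "name": "Red Lion", "note": "Closed since 2010"},
    {"building": "disused_pub", "name": "Plough"},
]
active = [f["name"] for f in features if is_active_pub(f)]
```

Each extra check removes one more way of being misled, but the free-text test is the weakest: a note such as "closed on Mondays" would wrongly exclude a pub that is still trading.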
The main alternative is to describe these things, not as a "pub", but as a "former pub", "disused pub" or even "dead pub". There are several dozen examples of each of these in the OSM data for the UK. The general approach is fairly common, but the actual values that are used tend to vary quite a bit. This approach has the advantage that none of these will match a simple search just for "pub". So the default behaviour of any software that uses the data is going to be what we would expect most people to intend. On the other hand, if they are mainly interested in pubs that are closed, or all pubs whether they are closed or not, then this data is not going to be a lot of help.
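To make the trade-off concrete, here is a minimal Python sketch of how a data user might have to recognise both styles at once. It assumes the alternative descriptions are stored under the "building" key with underscores ("former_pub", "disused_pub", "dead_pub"); the real values vary, so this set is illustrative and not a complete list. The function name and return values are also hypothetical.

```python
# Assumed spellings of the alternative values; real data varies more than this.
CLOSED_PUB_VALUES = {"former_pub", "disused_pub", "dead_pub"}


def pub_status(tags):
    """Classify a feature as 'active', 'closed', or None if it is not a pub."""
    # Alternative style: the feature is never described as a plain "pub".
    if tags.get("building") in CLOSED_PUB_VALUES:
        return "closed"
    # Common style: a pub, optionally qualified as disused.
    if tags.get("amenity") == "pub":
        return "closed" if tags.get("disused") == "yes" else "active"
    return None


def matches_simple_search(tags):
    """A naive consumer that only looks for things described as a pub."""
    return tags.get("amenity") == "pub"
```

With the alternative style, a feature tagged "building=disused_pub" is skipped by matches_simple_search without any extra work, which is the advantage described above. Anyone who wants closed pubs as well needs something like pub_status, and has to know every value in use.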
So given the choice, why did I choose to mark this as "amenity=pub, disused=yes", rather than "building=disused_pub" or something similar?
Partly it's because there is a well-established scheme for tagging pubs, and another well-established scheme for tagging things that are disused. Sticking to these keeps the data fairly clean.
I'm also a bit suspicious of advocating ways of tagging that make assumptions about how the data is going to be processed. Who is to say that it is most important to make life easy for people who want to identify active pubs? It's the obvious case, but what about people who are interested in pub history, pub architecture, or the number of closed pubs? Or people who are checking data quality against some external directory? Or (perhaps more likely) people giving directions such as "turn left at the Royal Oak"?
I reckon that anyone who seriously wants to extract active pubs from the database is going to find it fairly easy to filter out ones that are disused, as long as the tagging follows some basic principles. And if anyone thinks it is going to be too difficult to ignore features tagged as "disused=yes" then they should expect much bigger problems handling the other variants.
But mainly I've tagged it this way because that's what I see as I am riding past. From a distance I spot a pub, and when I get close I realise that it's disused. Once it has been developed it may look different, but for now, that's what it seems like to me.
As the OSM database gets richer and more detailed, and covers a wider variety of objects, there are a number of areas where contributors need different forms of tagging to describe subtle differences between similar features. They already have access to a number of different idioms that they can use to express their different perceptions.
Some people have a problem with that. They want to drive out subjectivity by defining explicit data structures in great detail. In some areas this is probably the right approach. Consistency can sometimes be more important than other considerations. But in many areas a more subjective and expressive approach can (and in my view, should) be encouraged.
The arguments for avoiding forms such as "amenity=pub, disused=yes" are understandable, but as a contributor I find this form easy to understand and easy to apply in different situations. The form is already widely used. Most importantly, it expresses what I see better than the alternatives. Although the alternatives are also widely used, they lack the level of consistency that some potential users of the data may need.
This isn't a problem just for pubs, of course. Similar issues arise in OSM with abandoned railway stations, canals, and other amenities.
Nor is it a problem unique to OpenStreetMap. In the same village there's a house with a painted sign outside that says something like "Church House, Formerly All Saints Church, Now a Private Residence". I wonder why they went to the expense of putting that up?
Posted by gom1 at 14:45