Sunday, 29 May 2011

Trying to be helpful, but...

On this afternoon's outing I met a group of cyclists riding the Sustrans Thames Valley route from Hampton Court, then planning to get a train back from Henley. They must have covered about 25 miles at the point we met, and they still had about 8 miles to go. They had stopped at a point where the Sustrans map is a bit vague. I know the route well, and it wasn't difficult to point them in the right direction.

We then got to discussing the next couple of stages of their ride.

After crossing some fields, and following some minor lanes, the Sustrans route then follows a woodland path, over a hill. It's a nice ride, but a bit steep and heavy going. It's OK for mountain bikes, I guess, and sometimes I've taken that route, but normally I bypass it and take the flatter road.

After that they would need to get off the Sustrans route to reach Henley station. The two obvious options both involve minor A-roads. Neither is great for cycling, but the one they were planning to take is quite narrow, with a few blind corners. I prefer the other. Although it is busier, there is more space, so I find the traffic less of a problem. An even better option is to take a longer route down a little side lane and along the river. I'm not sure what the official position is on cycling along that part of the riverside path, but people do, and it's a nice ride.

Proud of my local knowledge, I blurted all this out without thinking it through, then realised that I'd probably caused more trouble than I'd saved.

Some of the group were prepared to admit that they were getting tired, others were still up for more of a challenge. A couple were in favour of switching to the easier route, others preferred to stick with their original plan. I'm sure they will have sorted it all out, but having thoughtlessly caused confusion it seemed best to retreat and leave them to it. Next time I'll try to engage my brain before opening my mouth.

Thursday, 26 May 2011

Google correlate

I'm not at all sure what this means, but it's a fun time waster.

The topics that correlate with cycling are obvious enough. And since this activity is so seasonal, it's not a great surprise that skiing crops up when the data is shifted by six months.

But it completely defeats me where this lot comes from on the correlation with openstreetmap.

Wednesday, 18 May 2011

Is this a disused pub, or isn't it?

I regularly ride past this closed pub on my "flat but quiet" 15 mile cycling loop. It strikes me as an attractive building, but a sad sight, and it seems to be on the market for redevelopment. I've not stopped to read the notices, but I assume there is planning permission for a change of use.

The other pub in the same village is now owned by a group of villagers, and seems to be thriving. It's odd how I will pass these things for months before it occurs to me to check whether they are recorded properly on the map. Having checked, I've added the open pub (which was missing), and changed the tagging on this one to show it as disused.

This leads on to the controversy about how best to tag features that are no longer in use. I've decided to mark this one as "amenity=pub, disused=yes", which is one of the common approaches. But that's not the only option, and there are good arguments against the approach I used.

For those who don't follow this stuff, the main problem is that it's as though I'm saying "this is a pub - oh no it's not". Anyone who is very thirsty might stop listening after the first half of the sentence. If I'd said "this WAS a pub" it would be OK. Similarly, somebody who is using the raw data to draw pubs on a map, or provide directions to the nearest pub, is normally going to search for things described as a "pub". They will find more than 30,000 in the UK, including this one (unless they listen carefully).

What they probably want to find is pubs that are still in business. So they want to ignore the 70 or so that are already marked as "disused", and a few more that are marked as "closed". They will have to eliminate these explicitly. There are also more than 100 features described as a pub where the name is set to something like "Royal Oak (closed)". There are a few dozen more where there is a note attached (in free text) to the same effect. There are also a variety of less standard ways of indicating the same thing - all against features that are basically marked as being a "pub". The more of these that they handle explicitly, the more accurate their data will be. Any they miss can mislead their users.
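To make the point concrete, here is a minimal sketch of the kind of filter a data user ends up writing. It assumes each feature is a plain Python dict of tag key/value pairs, and the sample features are invented. The `closed=yes` key, the free-text `note` check and the `former_pub` value are hypothetical stand-ins for the variants described above, not an agreed scheme. Each variant needs its own rule, and anything not covered slips through as an "active" pub.

```python
import re

# Hypothetical sample features (tag key -> value); names are invented.
features = [
    {"amenity": "pub", "name": "Royal Oak"},
    {"amenity": "pub", "name": "Red Lion", "disused": "yes"},
    {"amenity": "pub", "name": "Plough", "closed": "yes"},         # assumed key
    {"amenity": "pub", "name": "Royal Oak (closed)"},
    {"amenity": "pub", "name": "Swan", "note": "Pub closed in 2010"},
    {"amenity": "former_pub", "name": "Crown"},                    # hypothetical value
    {"amenity": "pub", "name": "Bell"},
]

# Crude patterns for status hidden in the name or a free-text note.
CLOSED_NAME = re.compile(r"\((closed|disused)\)", re.IGNORECASE)
CLOSED_NOTE = re.compile(r"\b(closed|disused)\b", re.IGNORECASE)


def is_active_pub(tags):
    """Return True if the feature looks like a pub that is still trading.

    A simple amenity == "pub" test already skips values such as
    "former_pub"; every other check handles one variant explicitly.
    """
    if tags.get("amenity") != "pub":
        return False
    if tags.get("disused") == "yes" or tags.get("closed") == "yes":
        return False
    if CLOSED_NAME.search(tags.get("name", "")):
        return False
    if CLOSED_NOTE.search(tags.get("note", "")):
        return False
    return True


active = [f["name"] for f in features if is_active_pub(f)]
print(active)
```

The note check is deliberately crude (a note saying "not closed" would be wrongly excluded), which is part of the point: the less standard the variant, the harder it is to handle reliably.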

The main alternative is to describe these things, not as a "pub", but as a "former pub", "disused pub" or even "dead pub". There are several dozen examples of each of these in the OSM data for the UK. The general approach is fairly common, but the actual values that are used tend to vary quite a bit. This approach has the advantage that none of these will match a simple search just for "pub". So the default behaviour of any software that uses the data is going to be what we would expect most people to intend. On the other hand, if they are mainly interested in pubs that are closed, or all pubs whether they are closed or not, then this data is not going to be a lot of help.

So given the choice, why did I choose to mark this as "amenity=pub, disused=yes", rather than "building=disused_pub" or something similar?

Partly it's because there is a well-established scheme for tagging pubs, and another well-established scheme for tagging things that are disused. Sticking to these keeps the data fairly clean.

I'm also a bit suspicious of advocating ways of tagging that make assumptions about how the data is going to be processed. Who is to say that it is most important to make life easy for people who want to identify active pubs? It's the obvious case, but what about people who are interested in pub history, pub architecture, or the number of closed pubs? Or in checking data quality against some external directory? Or (perhaps more likely) in giving directions such as "turn left at the Royal Oak"?

I reckon that anyone who seriously wants to extract active pubs from the database is going to find it fairly easy to filter out ones that are disused, as long as the tagging follows some basic principles. And if anyone thinks it is going to be too difficult to ignore features tagged as "disused=yes" then they should expect much bigger problems handling the other variants.

But mainly I've tagged it this way because that's what I see as I am riding past. From a distance I spot a pub, and when I get close I realise that it's disused. Once it has been developed it may look different, but for now, that's what it seems like to me.

As the OSM database gets richer and more detailed, and covers a wider variety of objects, there are a number of areas where contributors need different forms of tagging to describe subtle differences between similar features. They already have access to a number of different idioms that they can use to express their different perceptions.

Some people have a problem with that. They want to drive out subjectivity by defining explicit data structures in great detail. In some areas this is probably the right approach. Consistency can sometimes be more important than other considerations. But in many areas a more subjective and expressive approach can (and in my view, should) be encouraged.

The arguments for avoiding forms such as "amenity=pub, disused=yes" are understandable, but for a contributor this form has the advantage of being easy to understand and to apply in different situations. It is already widely used. Most importantly, it expresses what I see better than the alternatives. Although the alternatives are also widely used, they lack the level of consistency that some potential users of the data may need.

This isn't a problem just for pubs, of course. Similar issues arise in OSM with abandoned railway stations, canals, and other amenities.

Nor is it a problem unique to Open Street Map. In the same village there's a house with a painted sign outside that says something like "Church House, Formerly All Saints Church, Now a Private Residence". I wonder why they went to the expense of putting that up?

Monday, 16 May 2011

"Vote early, vote often"

Once again my local authority is inviting us to vote on how to spend some of the money. In last year's participative budgeting exercise investment in the cycling infrastructure came top of the vote, and this year we have the opportunity to put it top of the list again.

This is basically a repeat of last year's exercise, with some tweaking.

There's more detail on this year's survey here.

I imagine that among the readers of this are some who would be keen to support investment in cycling infrastructure, but find themselves living outside this area. It would, of course, be wrong for anyone who lives elsewhere to try to influence priorities in this area by completing the online voting form (which can be found here).

It would also be deeply irresponsible of me to encourage such cynical behaviour. So I won't, but I can't prevent it happening. More importantly, it doesn't look as though the local authority has put anything in place to prevent outsiders from hijacking this survey. That seems remarkably trusting for a body that is in danger of looking highly cynical. For example, this year's total allocation for cycling has been cut by 20%, and split so that a large part of it is only relevant to a small proportion of the population. The core element is about half what it was last year. Cycling infrastructure is now almost entirely dependent on an annual participative budgeting exercise, rather than any sort of strategic commitment. And as far as I can make out, the participative budget was originally partly funded by slashing the cycling infrastructure budget.

Ultimately it's all a matter of priorities, of course. The way this is set up, cycling infrastructure is competing for resources against highways maintenance, pavement repairs and maintenance, street cleaning and litter removal, improved street lighting, improved parking facilities, winter maintenance, upgrading street furniture, tree planting, and facilities for young people (such as healthy eating/gardening projects and a vehicle safety/maintenance scheme).

Some might think that most of those sound like a pretty basic list of local authority responsibilities, rather than optional extras. But these are tough times, and difficult decisions have to be made. Available funds have to be spread thinly. Not least because they cut council tax by 4% last year, and a further 0.5% this year. Which makes a lot of sense if you think that the most needy people own the biggest houses.

I find all this difficult to understand, but their approach to priorities seems to be popular. The recent local government elections pretty much wiped out the opposition parties on the local council.

"Democracy is the theory that the common people know what they want, and deserve to get it good and hard" H. L. Mencken. Or as they say these days "You're all in this together".

Saturday, 7 May 2011


I've just discovered that there are renders of Open Street Map in Welsh (here), in Scottish Gaelic (here) and in Irish Gaelic (here). I don't speak any of these languages, so none of them is going to be much use to me. Just the same, I'm pleased they are out there.

Monday, 2 May 2011


There are parts of the OSM database where it is important that contributors take a disciplined approach to the way that subjects are tagged. Data that is used to support subsequent processing will often only be usable if contributors stick to a relatively closed set of keys and values, and clearly understand their meaning.

Those responsible for the major renders and routing engines rely on sharing a common understanding with contributors in order to use the data they contribute. Contributors rely on sharing an understanding with the renderer if they are to see the results of their work. Hence both data contributors and those who use the data rely heavily on certain tags being well-documented, with a closed set of options that are applied reasonably consistently.

However, it seems to me that there are also some areas where a more open and expressive approach is appropriate. The database contains a mass of information contributed by different groups with different interests, and it needs to provide some space for them to explore and share ideas. In some areas a more flexible approach should encourage more creative and expressive contributions. On the other hand, there are areas where there is too much diversity, and a greater degree of consistency would allow data to be used more widely. Neither is easy to achieve, and striking a balance between complete anarchy and a highly structured approach is difficult. There is no single answer, but despite the difficulties, the OSM community has (so far) proved remarkably resilient and effective in keeping enough of a balance to maintain forward momentum.

I suspect that by their nature the most committed contributors to OSM have a leaning towards more structured processes, and higher levels of standardisation. Unfortunately these tendencies can encourage a style in the wiki that is sometimes close to impenetrable, and can lead to authoritarian processes for policing the way some of these things are documented.

At the moment I am particularly exercised by the "historic" tag, which I've been trying to use in my rendering experiments. Although this tag is barely used in the major renders, it has been widely applied in the database, so presumably a lot of contributors see it as having value. On the whole, it is used quite consistently. However, the range of values in the database is wider than the closed set that is documented in the wiki. There are also several areas where related topics, such as a change of use, or the current status of an abandoned structure, are recorded inconsistently.
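A quick way to see how far usage has drifted from the documentation is to count the values that fall outside the documented list. The sketch below assumes the `historic=*` values have already been pulled out of an extract into a Python list. Both the "documented" set and the observed values are hypothetical illustrations, not the wiki's actual list or real counts, so anything flagged here says nothing about the real documentation.

```python
from collections import Counter

# Hypothetical, illustrative subset standing in for the documented values.
DOCUMENTED = {"castle", "monument", "memorial", "ruins", "archaeological_site"}

# Hypothetical sample of historic=* values, as if taken from an extract.
observed = ["castle", "memorial", "memorial", "ruins", "milestone",
            "wayside_cross", "milestone", "castle", "boundary_stone"]


def undocumented_values(values, documented):
    """Count values outside the documented set, most common first."""
    counts = Counter(v for v in values if v not in documented)
    return counts.most_common()


print(undocumented_values(observed, DOCUMENTED))
```

A list like this is a starting point for guidance ("most people have recorded this kind of thing in this way") rather than a list of mistakes to be policed.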

Although the "historic" tag is of particular interest to me at the moment, it is just one example of similar issues that occur elsewhere. As the scope of OSM content expands, and the mix of contributors widens, it seems to me that there are likely to be a growing number of areas where consistency might better be maintained by communicating general principles, rather than trying to police usage of a limited number of options.

To my eye the current documentation seems to be heavily reliant on saying "you can only use this tag with these values, and they have to mean this". That's a sensible model for "highway", but it's not so good for "historic". There are discussions on tagging all over the place, but only the most determined will be prepared to plough through it all, and the core documentation on the wiki is a bit thin on "here is a good way to express what you want to say", "most people have resolved this kind of problem in this way", "there are several different ways of doing this, but no agreement yet on which is best", "there has been a proposal on how to define this, but it has not yet been widely adopted". That kind of guidance isn't terribly sensible for the "highway" tag, but I think it sits comfortably with the way that "historic" (and several other tags) are being used at the present time.

I know I'm not the only one who finds some of the current approaches frustrating, but I'm not sure how best to help work towards a solution. Too much of the discussion of contentious issues is at the level of the playground, and some of the platforms where discussion is supposed to take place have become a joke. So what I've done instead is to start with a specific suggestion by proposing some changes to the way the "historic" tag is documented on the wiki - here.

Basically what this tries to suggest is a move towards providing guidelines on how to document historic subjects, rather than (or in addition to) providing a catalogue of values with a limited number of options. It's as much about the style of approach as the specific content.

Maybe I'm trying to address a problem that only exists in my imagination. On the other hand, it may be a widely recognised problem, but this solution is seen as inappropriate. Or maybe there's a better alternative. Or (who knows) perhaps this is a reasonable start that just needs some improving. I would welcome comments. If I am somewhere in the right area, then it's important not only to get feedback from those determined souls who try to police the wiki, but also from the more occasional OSM users who might just want to use it (such as those who sometimes arrive here). If you don't have, and don't want, a wiki account, please feel free to leave a comment below.