Monday, 11 April 2011
The rebuild is going reasonably well. I've got better source data, and discovered and fixed some errors that crept in last time round. I've achieved some worthwhile performance improvements and brought the processing time down from a few days to a lot of hours. At the same time I'm tweaking how the result looks.
As part of the whole process I'm trying to find more effective ways of capturing different tagging variations. For example, to find cycling infrastructure I search for various combinations of "highway=cycleway", "cycleway=something", "bicycle=yes", "route=bicycle", and any of "ncn", "rcn", "lcn", "ncn_ref", "rcn_ref" or "lcn_ref" set to something. I currently find about 44,000 km of UK cycling infrastructure with those combinations. By way of comparison, Sustrans say that their national and regional network is 12,600 miles (i.e. just over 20,000 km) in length. The rest of what I'm finding is made up of local networks, and cycling stuff that doesn't form part of a network.
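To make that search a bit more concrete, here is a minimal Python sketch of the kind of test involved. It is not the code I actually run: the function names are made up for illustration, it assumes a way's tags arrive as a plain dictionary, it treats any single matching tag as enough (rather than particular combinations), and it assumes a value of "no" or an empty value should not count as "something".

```python
# Keys where any real value suggests the way belongs to a cycle network.
NETWORK_KEYS = ("ncn", "rcn", "lcn", "ncn_ref", "rcn_ref", "lcn_ref")


def has_value(tags, key):
    """True if the key is present with a value other than '' or 'no' (an assumption)."""
    return tags.get(key, "") not in ("", "no")


def is_cycling_infrastructure(tags):
    """Hypothetical filter: does this dictionary of OSM tags look cycle-related?"""
    if tags.get("highway") == "cycleway":
        return True
    if has_value(tags, "cycleway"):  # cycleway=something
        return True
    if tags.get("bicycle") == "yes":
        return True
    if tags.get("route") == "bicycle":
        return True
    # ncn, rcn, lcn, ncn_ref, rcn_ref or lcn_ref = something
    return any(has_value(tags, key) for key in NETWORK_KEYS)
```

With this sketch, a residential road tagged "ncn_ref=4" would be picked up, while a plain residential road with no cycle-related tag would not.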
I end up plotting this at 1/2" to the mile - so there isn't scope for a lot of complication in the way I plot the different options. At the moment I'm characterising all of the different types of cycle path as either "on-road" or "off-road". It turns out that I classify 53% as being on-road. The way I've set things up, that effectively includes primary, secondary, tertiary, unclassified, residential, and other roads that are marked with a cycle-related tag of some kind. I'm classifying the other 47% as being off-road. About 3% of the network that I capture fails to give any indication of the type of road or path, and I've assumed all of that is off-road.
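The on-road/off-road split can be sketched along the following lines. Again this is an illustration under stated assumptions, not my actual process: the function name and the exact set of values are hypothetical. In particular, "trunk", "service", "road" and the "_link" variants are my guess at what "other roads" covers, based on the road types that turn up in the breakdown further down.

```python
# Highway values treated as "on-road". The first five come straight from the
# description above; trunk, service and road are assumed to be "other roads".
ON_ROAD_HIGHWAYS = {
    "primary", "secondary", "tertiary", "unclassified", "residential",
    "trunk", "service", "road",
}


def classify(tags):
    """Return 'on-road' or 'off-road' for a cycle-related way (hypothetical helper)."""
    highway = tags.get("highway")
    if highway is None:
        # No indication of the type of road or path: assume off-road.
        return "off-road"
    if highway.endswith("_link"):
        # Treat link roads like the road class they connect to (an assumption).
        highway = highway[: -len("_link")]
    return "on-road" if highway in ON_ROAD_HIGHWAYS else "off-road"
```

Anything not in the on-road set (cycleway, bridleway, track, footway and so on) falls through to off-road, which keeps the plotting down to two styles.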
All of these figures are UK only (more or less - the bounding box overlaps some other countries slightly), and I pulled the data at the beginning of April.
Looking first at the on-road element of the cycle network, only 1% is on roads that mappers haven't classified yet (i.e. highway=road). 8% is on a primary road (plus another 3% on a trunk road). 12% is on a secondary road. The way that lesser roads are classified can be a bit arbitrary, but according to OSM mappers, almost half (46%) of the UK on-road cycle network is on an unclassified road, 20% on a tertiary road, and 8% on a residential road. That leaves 2% on service roads, and a very little bit on different types of link-road.
Elsewhere, six tags account for 93% of the off-road network. The most popular is "cycleway" (almost half of the off-road network). "Bridleway" accounts for 16%, "track" for 11% and "footway" for 10%. The only other common tags are "path" (4%) and "byway" (3%). Among the less common tags we find "pedestrian" and "unsurfaced". "Steps" occurs a couple of hundred times, but doesn't account for much of the network length. Rarer tags include "construction", "proposed", "living_street", "crossing", and some combinations of the above.
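The percentages here are shares of length rather than counts of ways, which is why something like "steps" can occur often and still barely register. A rough sketch of that calculation, assuming each way has already been reduced to a (highway value, length in km) pair; the function name is hypothetical and the sample figures are invented purely to show the arithmetic, they are not my data:

```python
from collections import defaultdict


def length_shares(ways):
    """Percentage of total length per highway value, from (highway, km) pairs."""
    totals = defaultdict(float)
    for highway, length_km in ways:
        totals[highway] += length_km
    grand_total = sum(totals.values())
    if grand_total == 0:
        return {}
    return {tag: 100.0 * km / grand_total for tag, km in totals.items()}


# Invented sample: four ways, lengths in km (not real figures).
sample = [("cycleway", 5.0), ("bridleway", 3.0), ("cycleway", 1.0), ("steps", 1.0)]
shares = length_shares(sample)
```

In this made-up sample "cycleway" comes out at about 60% of the length from two ways, and "steps" at about 10% from one.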
For obvious reasons I can't measure how much of the cycling infrastructure I'm missing. There must be some, but I've not spotted any obvious gaps with my improved process. Trying to measure the "unknown unknowns" is still on my list of things-to-do when I work out how. Of the infrastructure that I've managed to find, something like 96% is tagged in a way that I can understand. The other 4% are the "known unknowns" - I know they are cycling infrastructure but I'm not clever enough to do much else with them.
Leaving the unknowns aside, at first I found that the variety of information about the rest was a bit daunting. There was a stage where I would spot an apparent gap in the network, and discover that it was tagged with a variation that I hadn't thought to include. I started leaning towards the school of thought that somebody needs to get a grip on standardising the tagging scheme, so that we don't clutter things up with too many variations.
Now that I've got myself a bit more organised, I'm much more relaxed about all that. There aren't too many variations in general use. The different tags that are used widely reflect different characteristics of the infrastructure design ("highway=cycleway", or "cycleway=something"), the way it can be used ("bicycle=yes") or the network that connects the bits together ("ncn", "rcn", "lcn"). I'm much more inclined now to the view that the benefit of capturing that rich variety with flexible tagging is far more important than the inconvenience of having to handle some of the more obscure variations.
Posted by gom1 at 22:53