Tuesday, 24 June 2014

More on rivers

I've been looking into a couple of questions about the OSM data on rivers.

The straightforward point is about naming of rivers. The guidance is to use the complete name on the waterway=river line. So "River Tyne" rather than "Tyne". However this guidance isn't followed consistently. About 12% of UK rivers (waterway=river) haven't been tagged with a name at all, roughly half are tagged with a name in the form "River xxxx", and the rest are tagged with a different form of name. A lot of the time that doesn't matter, and in any case some rivers have widely recognised names that don't include the "River..." prefix. To label "Afon Teifi" as "River Afon Teifi" would just be silly.

The bigger challenge is that naming isn't consistent between different segments of the same river. Rivers are long, and normally mapped in pieces. There isn't a widely adopted reference system for rivers, as far as I can see. So (alongside the geometry) names are the best way of associating different pieces of the same river.

Most of the larger UK rivers have segments tagged with a mix of different forms of the same name - including the River Avon (8% of length = "Avon"), River Thames (1% = "Thames"), River Derwent (11% = "Derwent"), River Trent (4% = "Trent") and River Don (6% = "Don"). Tagging with different forms of name gets in the way of other processing, so I try to get round this by standardising on a stripped form of name (i.e. removing "^River " with a regular expression). This doesn't work all the time, but for most purposes it works well enough.
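As a rough illustration of the stripping step, here is a minimal Python sketch. The function names and the sample segments are made up for the example, and it only handles the leading "River " case described above, so a name in some other form passes through unchanged.

```python
import re
from collections import defaultdict

# Matches only a leading "River " (the "^River " pattern mentioned above).
_RIVER_PREFIX = re.compile(r"^River ")


def strip_river_prefix(name):
    """Return the name without a leading 'River ' (unchanged if absent)."""
    return _RIVER_PREFIX.sub("", name)


def length_by_form(segments):
    """Group segment lengths by stripped name, then by the form of name as tagged.

    `segments` is an iterable of (name, length_in_metres) pairs.
    """
    totals = defaultdict(lambda: defaultdict(float))
    for name, length_m in segments:
        totals[strip_river_prefix(name)][name] += length_m
    return {stripped: dict(forms) for stripped, forms in totals.items()}


# Hypothetical segments, not real OSM data.
segments = [("River Tyne", 9000.0), ("Tyne", 1000.0), ("Afon Teifi", 5000.0)]
print(length_by_form(segments))
```

Grouping by the stripped name is what makes it possible to work out figures such as the share of a river's length tagged with each form of its name.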

The next issue is a side-effect of mapping rivers as a mix of areas and lines. The map below shows the point just west of Hexham where the North Tyne and the South Tyne come together. Parts have been mapped as areas (waterway=riverbank) and parts have been mapped as lines (waterway=river). There's a gap between two of the areas that were drawn as riverbanks.

These look particularly odd here, because I've emphasised the problem by removing all but the river. In the default map the issues don't show up to the same extent. Bridges mask two of the three problems, and while the third is visible on the standard map, it tends to get lost amid other details.

But the default map isn't the only way this data is used. I'm interested in exploring the limits of using the same data for detailed maps, so I've been pondering on how best to handle this mix of lines and areas.

Of course the long-term answer is to add more detailed mapping. I'm doing that where I can, which tends to shift the problem further upstream. Meanwhile, in the short-term, varying the width at which I render waterway=river lines provides a partial solution. The issue is that different widths work for different parts of the river system. For example, further north, tributaries of the River Tweed have been mapped in quite a lot of detail, and at the point where the areas meet the lines the rivers tend to be narrower. Here's an example where exactly the same style for waterway=river works well enough when it is mixed with areas tagged waterway=riverbank.

With a suitable width, the waterway=river line can look OK. A width that is too narrow will leave step changes like my first example. And unless I start chopping up overlapping lines and areas, a width for the waterway=river line that is too large will obscure more detailed mapping of riverbanks.

Although there is a tagging scheme for explicitly recording the width of a river (width=n) it is rarely used in the UK, and very rare around here: so it is not a great deal of help to me in practice. Usage is patchy though, so there are places where others may find it more useful (see darker lines below for some idea of where width has been applied to waterway=river). For what it's worth, where it is specified the average width given for a river is just under 5 metres.

In theory it might be possible to estimate a different width for each river segment by analysing adjoining lines and any overlapping areas of riverbank. However, I suspect this would turn out to be too complicated to be of any practical use.

While I wait for more complete mapping, what I think I need is a reasonably sensible default river width. River widths vary, so there are always going to be cases where a single default is either too wide, or too narrow. But because I want to retain mapping of detailed data wherever possible, I would prefer to err on the side of choosing a default width that is more often too narrow, rather than one which is more often too wide.

As a rule of thumb, I reckon that a width equivalent to 9 metres on the ground works quite well around here, particularly with rivers that have been mapped in a fair amount of detail. Where it doesn't work so well there is an incentive to improve the mapping. As far as I can tell this is also about the right default width for rivers in the rest of the UK.

In more than 90% of cases where the width of a river is specified it is less than 10 metres. I'm not doing any special processing for such cases, because they don't occur around here, but there may be a case for handling river width where it is specified.

I don't have any evidence for the right thresholds for width elsewhere in the world, but consider this. Whether by accident or design, in Northern European latitudes, nine metres seems to be roughly the width that a default waterway=river line represents in the standard map. So on the standard map a mix of river lines and areas happens to turn out reasonably well at the point where a river is nine metres wide. I imagine that contributors who are influenced by the standard map will tend to stop drawing rivers as areas once the river width reduces to the equivalent of the default line width on the standard map. They may not realise they are doing this, but if they notice that riverbank areas don't add value to narrower rivers then this would be a sensible point for them to stop adding areas. If this is a global effect, then perhaps there could be some appropriate guidelines for data contributors.
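To make the link between ground width and rendered line width concrete, here is a small Python sketch. It assumes the usual Web Mercator tiling with 256-pixel tiles, where ground resolution shrinks with the cosine of the latitude. The function names are my own, and nothing here states the actual line width used by the standard map style.

```python
import math

# Ground resolution at the equator for zoom 0, assuming Web Mercator
# and 256-pixel tiles (an assumption about the map being rendered).
EQUATOR_M_PER_PX_Z0 = 156543.03392


def metres_per_pixel(lat_deg, zoom):
    """Approximate ground distance covered by one pixel at this latitude and zoom."""
    return EQUATOR_M_PER_PX_Z0 * math.cos(math.radians(lat_deg)) / (2 ** zoom)


def ground_width_to_pixels(width_m, lat_deg, zoom):
    """Line width in pixels needed to represent `width_m` metres on the ground."""
    return width_m / metres_per_pixel(lat_deg, zoom)


def pixels_to_ground_width(width_px, lat_deg, zoom):
    """Ground width in metres that a line `width_px` pixels wide represents."""
    return width_px * metres_per_pixel(lat_deg, zoom)


# A 9 metre river at roughly the latitude of Hexham (about 55 degrees north).
for zoom in (15, 16, 17):
    print(zoom, round(ground_width_to_pixels(9.0, 55.0, zoom), 1))
```

Because of the cosine term, the same pixel width represents a greater distance on the ground nearer the equator, so any threshold derived this way would shift with latitude.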

As supporting evidence, I've sampled about 130,000 points along UK riverbanks from the OSM database. The most common width between banks is 9-10 metres. This is well below the average width, or even the median, because there are many sections of river which are wider. But more than 10% of my sample showed a river width of 7-11 metres. The number of samples falls off quite slowly for wider sections of river, but quite quickly where a river area is narrower. Only 2% of my sample showed a river width less than 5 metres.

I haven't done a similar measurement of mapped width for streams. There are some streams mapped as areas, so it should be possible, but I have doubts about how accurate, and so how useful, the result would be.

The standard definition of the difference between a river and a stream is that a stream "can be jumped across by an active, able-bodied person". Intriguingly, the world record for a running long jump is just short of 9 metres. So if we wanted to be silly, we could argue that a default river width of 9 metres is consistent with one interpretation of the transition point between a river and a stream.

More realistically, several fitness measures suggest that the kind of distance a fit adult can achieve with a standing jump is in the region of 2-2.5 metres. I wouldn't try to jump a stream that wide. But the figure of 2.5 metres suits my purposes as a default width for a stream.

There are about 3,000 sections of stream with a specified width in the OSM data for the British Isles. The average width is just over 1 metre. About 85% of stream segments with a specified width are less than 2 metres, and almost half are less than 1 metre. So my chosen default is higher than the average, and higher than most specified stream widths. I may have set it too high. However, it seems only slightly wider than the standard map rendering of a stream. On larger scale maps I reckon that a stream shown as 2.5 metres wide is about the minimum that renders reasonably clearly, and there is scope to increase the width without streams becoming too dominant across a map of open countryside. It still leaves me with a bit of a big step between an effective minimum river width of 9 metres, and a stream width of 2.5 metres, but the quality of stream tracing looks like being more of an issue.
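Pulling the two defaults together, this is roughly how a width could be chosen for each waterway line. It is only a sketch: the function name and the tag dictionary are hypothetical, the parsing of the width tag is deliberately simple (a plain number, optionally followed by "m"), and honouring the width tag at all goes beyond what I actually do at present.

```python
import re

# Default ground widths in metres, as chosen above.
DEFAULT_WIDTH_M = {"river": 9.0, "stream": 2.5}

# Accepts values such as "4", "4.5" or "4.5 m"; anything else is ignored.
_WIDTH = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*m?\s*$")


def render_width_m(tags):
    """Pick a ground width in metres for a waterway line.

    Uses the width tag where it parses as a positive number, otherwise
    falls back to the default for the waterway type. Returns None for
    waterway types that have no default here.
    """
    match = _WIDTH.match(tags.get("width", ""))
    if match:
        value = float(match.group(1))
        if value > 0:
            return value
    return DEFAULT_WIDTH_M.get(tags.get("waterway"))


print(render_width_m({"waterway": "river"}))
print(render_width_m({"waterway": "stream", "width": "1.5"}))
```

Falling back to the default whenever the tag is missing or unreadable keeps the behaviour predictable where width is rarely used.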

I'm going to leave further tuning of river widths as a problem for another day. However, with an eye on the long-term possibilities, it might help detailed mapping if contributors were encouraged to map rivers that are more than 10 metres wide as areas, and add a width tag to rivers that are less than 10 metres wide.  


Gregory said...

The issue of inconsistent river segments could be solved by a relation. http://wiki.openstreetmap.org/wiki/Relation:waterway

gom1 said...

That's a fair point Gregory. However, at present there are about 4,000 river segments grouped in this way in the UK, and about 17,0000 that aren't. It is still some way off being universally adopted.