Saturday 15 November 2014

Bandstands

Bandstands are structures with a wide appeal, and the way they have been recorded in OSM throws a light on how contributors approach features that fall outside the mainstream. The OSM data on bandstands isn't complete, but the database may already contain one of the most comprehensive lists of existing bandstands in the UK.

The most extensive list of UK bandstands that I have found is the list of Vintage Bandstands, here. This has 334 distinct entries, but I don't think they all still exist.

In 2001 the Urban Parks Forum surveyed local authority parks in the UK. Out of the 438 bandstands they could identify, 203 had already been lost, 186 were in use or under repair, and 49 were abandoned or unused. Since then the Heritage Lottery Fund has been investing in the restoration and rebuilding of bandstands. I can't find more recent data, so for now let's assume that there are still more than 200 bandstands in public parks in the UK.

Bandstand in Sefton Park, from Wikimedia

Across Britain, there are about 142 bandstands that are listed for architectural or historic interest. I have locations for those in England, and roughly three-quarters of them lie inside public parks, and roughly a quarter outside public parks. Around the same proportion of the bandstands in OSM are within a park, so it is probably fair to assume that by concentrating on parks, the survey by the Urban Parks Forum was primarily concerned with about three-quarters of the bandstands in the UK.

So as a starting point I'm going to assume that there are about 275 bandstands left in Britain, of which about 213 are inside public parks, and 62 outside public parks - fewer than on the Vintage Bandstands list, and more than the Urban Parks forum suggests.

I've managed to find 223 bandstands in the OSM data for Britain, which is about 80% of my estimated total. If these features really are bandstands, then that's an impressive result for a feature that I thought would fall well outside the mainstream.

Our ancestors obviously knew how to build structures that would maintain their appeal, and that's part of the attraction of examining data on bandstands. But I also wanted to look closer at the data because the tagging of bandstands in OSM is particularly inconsistent, and I thought we might learn something from that.

A fairly simple search of the OSM database for anything that mentions "bandstand" (and spelling variations) will pick up 247 features. On inspecting the data we find about a dozen of these are false positives: completely different features with the word "bandstand" in the name. Ten appear to be duplicates. If two different features within a few hundred yards of each other both describe a bandstand then they are probably referring to the same structure in the real world. Some of the duplicates occur because contributors have added and tagged both a way and a node. Some may be because one attempt hasn't rendered and a later contributor thought the feature was missing. In a few cases contributors seem to have been uncertain how to tag the feature, so they have added more than one option.
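A search and duplicate check along these lines might be sketched as follows in Python. This is only an illustration, not the query behind the figures above: the feature layout (an "id", a "pos" latitude/longitude pair and a "tags" dict), the function names and the 300 metre threshold standing in for "a few hundred yards" are all assumptions.

```python
import math
import re

# Matches "bandstand", "band stand", "band_stand" and "band-stand", in any case.
BANDSTAND_RE = re.compile(r"band[\s_-]?stand", re.IGNORECASE)


def mentions_bandstand(tags):
    """True if any key or value in a tag dict mentions a bandstand."""
    return any(BANDSTAND_RE.search(k) or BANDSTAND_RE.search(v)
               for k, v in tags.items())


def distance_m(a, b):
    """Great-circle (haversine) distance in metres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))


def likely_duplicates(features, max_m=300):
    """Return pairs of feature ids that lie within max_m metres of each other.

    max_m is an assumed stand-in for "a few hundred yards".
    """
    pairs = []
    for i, f in enumerate(features):
        for g in features[i + 1:]:
            if distance_m(f["pos"], g["pos"]) <= max_m:
                pairs.append((f["id"], g["id"]))
    return pairs
```

A text match like this will also pick up a cafe or theatre with "Bandstand" in its name, so the candidates still need inspecting by hand, and a flagged pair is only a hint that two features may describe the same structure.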

About 50 of the bandstands in the OSM database correspond to the 93 listed bandstands in England, so contributors have added more than half of the bandstands that are listed by English Heritage. Very few have been marked as a structure with listed building protection. If the overall totals are correct, then contributors have located more than 80% of all bandstands, but less than 60% of listed bandstands. I'm not sure what to make of that.

For processing this data we really want to find bandstands based on well-defined attributes. Interestingly, with bandstands we find more variation in the choice of keys than in the choice of values. By far the most common value is "bandstand", with "band_stand" accounting for about one in thirty values. The keys that have been marked as a bandstand include "leisure", "amenity", "building", "historic", "type", "building:use", "tourism", "shelter_type", and "man_made". In other words, contributors choose quite a wide variety of keys with quite a narrow range of values.
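To see which keys carry a bandstand value, a tally like the one below would do. It is a minimal sketch that assumes each feature has already been reduced to a plain dict of tags; the function names are made up for the example, and only the two values mentioned above are recognised.

```python
from collections import Counter

# The two spellings of the value mentioned in the text.
BANDSTAND_VALUES = {"bandstand", "band_stand"}


def bandstand_keys(tags):
    """Keys (other than "name") whose value marks the feature as a bandstand."""
    return sorted(k for k, v in tags.items()
                  if k != "name" and v.lower() in BANDSTAND_VALUES)


def tally_keys(features):
    """Count how often each key is used to mark a bandstand."""
    counts = Counter()
    for tags in features:
        counts.update(bandstand_keys(tags))
    return counts
```

A feature tagged with both "building=bandstand" and "historic=bandstand" is counted once under each key, so the tally adds up to more than the number of features.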

There are 233 features that are recognisable as a bandstand from the data content. This includes 10 that appear to be duplicated, which mucks the numbers up slightly. I've slightly fudged this in the chart - to keep things simple.



The tag "leisure=bandstand" is recommended in the documentation, and accounts for about 24% of all UK bandstands (30% of the ones I found, and almost half of the bandstands that have been coded in a structured way). The "amenity" and "building" tags with a value of bandstand account for another 18% of all bandstands. Less common tags such as "historic", "type", "building:use", "tourism", "shelter_type", and "man_made" account for another 2%. Low usage of "man_made" surprised me, because I thought this would have been seen as more appropriate than "building". Apparently not.

More than a third of bandstands can be identified in the database by the name, but not by other coding. This approach, of course, is risky for automated processing, because the results need to be checked for false positives. However, for some purposes it is still worth looking at how bandstands are named. The form "name=bandstand" is the most common - almost as though the name tag is being used for coding, since this is not capitalised. Less common forms are "name=Band Stand", "name=The Bandstand" and "name=The Band Stand". I suspect contributors have been influenced here by labelling in the standard render. Together, these four values pick up 84% of the named bandstands. The rest mainly use names based on the location - such as "Southsea Bandstand".
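The split between generic and location-based names could be sketched like this. The labels "generic" and "specific" and the function name are invented for the example, and the match ignores case, so it groups "Bandstand" with "bandstand" even though the text above treats the uncapitalised form as a case of its own.

```python
import re

# The four generic forms: "bandstand", "Band Stand", "The Bandstand", "The Band Stand".
GENERIC_RE = re.compile(r"^(the\s+)?band\s?stand$", re.IGNORECASE)
# Any mention of a bandstand anywhere in the name.
MENTION_RE = re.compile(r"band[\s_-]?stand", re.IGNORECASE)


def name_form(name):
    """Classify a name as "generic", "specific", or None (no bandstand mentioned)."""
    name = name.strip()
    if GENERIC_RE.match(name):
        return "generic"
    if MENTION_RE.search(name):
        # Location-based names such as "Southsea Bandstand" land here, but so do
        # cafes and theatres named after a bandstand, so these need checking by hand.
        return "specific"
    return None
```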

The few remaining examples that I found are a mix of more obscure tagging and spelling variations. These are of little interest or value to data users.

There are probably about 50 UK bandstands that don't appear in the database (yet).

We have to be wary of false synonyms. I've looked at features in the database that correspond to the location of listed bandstands, and along with tagging variations in the OSM data these suggest that some contributors consider "gazebo" and "pavilion" as synonyms for "bandstand". Some just label the feature as a "shelter". A "gazebo" can look similar to a bandstand, although a bandstand is generally larger, raised higher above ground level, and clearly intended for a different purpose. The term "pavilion" might be acceptable as a technical description of the architecture, but in general data users will not be able to use it because it is so widely used to mark a sports pavilion. And "shelter" is a very general category that doesn't help somebody who wants to identify bandstands.

Now to draw some conclusions.

Although we know for certain that some are missing, there's a case that the OSM database already contains one of the most complete lists of existing bandstands in the UK. With relatively little effort to plug the gaps and verify existing data, this is information that could be used productively by anyone who wants to do so.

If somebody wants to find bandstands in the database at present they will probably look for values of "leisure", "amenity" or "building" that contain "bandstand". That will uncover almost half of all examples in the database. It starts to get quite complicated for data users to seek out tagging variations and find the next few percent. Searching the "name" tag for variants of "bandstand" will turn up quite a few likely candidates, and might be appropriate in some circumstances, but it would not be reliable enough for systematic processing. Without manual inspection this approach also picks up theatres, cafes, and the like that have been named this way.

The data on bandstands is fairly comprehensive and suggests that contributors can have quite different perspectives on these features. So this is an area where we could (and probably should) encourage more consistency, while tolerating quite a lot of variation in tagging. For the relatively small number of features in the database, bandstands demonstrate an unusually varied use of tags. Whether they realise it or not, different contributors have been marking bandstands according to their function ("leisure=bandstand" or "amenity=bandstand"), according to their form ("building=bandstand", "man_made=bandstand"), or according to their significance ("historic=bandstand"). These are surely all valid approaches, and they are not incompatible with each other. Indeed, quite often they are used together on the same feature. It is quite conceivable that one bandstand will be notable for its historic significance, but no longer in use as an amenity, while another might be in regular use as a leisure facility, but have no historic significance. Some might have changed function ("shelter_type=bandstand"). We don't know how this data might be used in future, so differences such as these ought to be reflected somehow in the tagging. However, the current documentation doesn't give guidance on such subtleties, so the current data probably doesn't record them accurately. All we can really say for now is that features with any of these tags have been recognised and recorded as bandstands.

If the community wants to improve the current data on bandstands then the following might be priorities:

  • locate the fifty or so missing examples
  • add structured tagging to the hundred or so bandstands which can currently only be found by name
  • fix inconsistencies - such as combining duplicates
  • encourage considered use of the existing tagging options to capture and retain information that can be collected in the field
  • add attributes (such as listed building status) that could be of interest or value to data users
  • find a group of bandstand enthusiasts who might be interested in verifying the data, finding innovative uses for it, and taking things further
  • celebrate with an outdoor concert

Tuesday 11 November 2014

Kennels and catteries

Kennels and catteries are commercial businesses where owners can leave their cats or dogs while they are away from home. They are part of a wider category of animal boarding that also includes donkey sanctuaries, and organisations that take in domestic strays, pets whose owners are no longer able to cope, and wildlife such as hedgehogs. These are more likely to be run by charities.

In England and Wales animal boarding establishments (including kennels & catteries) are controlled by the Animal Boarding Establishments Act 1963. They have to be licensed by the local authority. The situation is similar in Scotland, but controlled by a different act.

Because they have to be licensed I thought it would be straightforward to find statistics on how many kennels and catteries there are in the UK. I was mistaken. Published government figures on the business population don't go down to that level of detail, local authorities don't seem to publish any statistics, and I can't find figures from trade bodies. However, the Valuation Office Agency does publish figures for different types of property, including the number of kennels and catteries in Wales and the English regions. These figures are a bit old (2010), but broadly in line with the numbers that come up on a search of Yellow Pages. Unless anyone can come up with a better figure, I think we can be fairly confident that there are just short of 5,000 kennels and catteries in the UK.

Of these I've been able to find just over 200 in the OSM database. Contributors have tagged roughly half of those as a kennel or cattery, and the other half can be identified (with reasonable confidence) as a kennel or cattery by the name.



Mapping these clearly hasn't been a high priority for OSM contributors.

I reckon that the data on kennels and catteries is too incomplete, and the tagging is too inconsistent, for it to be of great practical use for rendering or other forms of data presentation (at present). The point of this post isn't to argue that things should be any different. These establishments are not particularly prominent features in the landscape. Dog and cat owners (in my experience, at least) will either have chosen a preferred animal boarding service already, or they will find one through personal recommendation rather than searching a database. This isn't quite the same for animal rescue, where I could see a need for an application to find the nearest hedgehog sanctuary (for example). But it's hard to see how pressure from data users is going to create a surge of interest in data on animal boarding. One day we may see enthusiasts kick off an "Animal Boarding Mapping Project". But my guess is that these are more likely to be added by non-specialist contributors who are working on mapping a wide range of different features within their local area.

In any case, the community will decide on priorities. My interest in looking at this is not to push for action on kennels and catteries. I'm more interested in seeing what we can learn about how contributors approach less commonplace features.

We find three different models for tagging kennels and catteries.
  • The usual approach is not to label these as a kennel or cattery at all. Examples of kennels and catteries can be picked up fairly easily through the contents of the "name" tag. There may be other, similar techniques that I haven't tried. In other words these features have been mapped, and named, but there is no further detail to indicate that they might be of special interest. In the chart they appear as "Name only".
  • The second most common approach is to use simple "amenity" tagging. This makes use of user-specified values of the "amenity" tag: "amenity=kennels" and "amenity=cattery". These aren't documented, but nevertheless they represent more than half of the examples of kennels and catteries that have appropriate tags attached in the database. In the chart these appear as "Simple".
  • The third approach is more structured, and follows the tagging recommended in the documentation. This is based around "amenity=animal_boarding" with more detail added under "animal_boarding=...". This approach represents almost half of the examples of kennels and catteries that carry specific tagging. Although it is the documented approach it is not yet the most widely used. It appears in the chart as "Structured".
As always, there are a few variants on both of the structured models. By the look of it most of these are a mix of typing mistakes and misunderstandings. They don't make a significant difference to the totals. There is an even smaller number of intriguing examples, though, where a contributor has used tagging based on "pet=". There are only a few of these, and it looks as though this might be an experiment that didn't go further, but it suggests that at least one contributor sees boarding kennels and catteries in terms of how they relate to other facilities for pets, rather than as a type of amenity, or as a service for animals.
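
The three models above could be told apart with a small classifier along these lines. It is a sketch under assumptions: each feature is taken to be a plain dict of tags, the function name is made up, and the pattern used to spot a kennel or cattery by name is a guess at what a name search might look like. The variants and the "pet=" experiments are not handled.

```python
import re

# Assumed name pattern: "kennel", "kennels", "cattery" or "catteries", in any case.
NAME_RE = re.compile(r"kennel|catter(y|ies)", re.IGNORECASE)

# The undocumented "amenity" values mentioned in the text.
SIMPLE_VALUES = {"kennels", "cattery"}


def tagging_model(tags):
    """Return "Structured", "Simple", "Name only", or None for one tag dict."""
    amenity = tags.get("amenity", "")
    if amenity == "animal_boarding":
        return "Structured"
    if amenity in SIMPLE_VALUES:
        return "Simple"
    if NAME_RE.search(tags.get("name", "")):
        return "Name only"
    return None
```

Checking the tags before the name means a feature that is both tagged and named as a kennel is counted under its tagging model rather than as "Name only".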

The blindingly obvious conclusion is that there are serious limitations in the current data. Inconsistent tagging might be a deterrent for users who want to render or process this data, but the real blocker is the very limited coverage.

In the case of kennels and catteries, it seems unlikely, as things stand, that anyone using this data will be interested in rendering or presenting it within an application, simply because there isn't (yet) enough coverage to make this viable.

There is no shortage of similar examples in the database, outside the mainstream,  where coverage is low.

To my mind this raises some interesting questions.

When the community is debating how best to tag features that fall outside the mainstream:

  • Who is using this data, and what are they using it for?
  • Do current approaches meet the needs of data users, can they be improved, and if so, how?
  • Could contributors be encouraged to add more useful data?

Presumably we think that one day coverage of these features will become sufficient to make rendering or application processing viable. If we don't think that, then why are we collecting this data at all?

  • Will the needs of data users change at that point?
  • If needs do change, how will that affect the data?
  • How will we tell when we have reached that stage?
Obviously I wouldn't be raising these questions if I held the view that the needs of data users were the same for all types of feature, and that they were unchanging as the contents of the database evolve.

To me, the key question is "how do we tell when we have reached the stage that rendering or data presentation becomes viable?". But this might touch on some contentious issues. It's probably best to stop at that point, and see what others think.

Saturday 8 November 2014

Bookies

The Gambling Commission publishes statistics on the number of bookmakers in Britain. Earlier this year they recorded 9,021 bookmaker premises, based on returns from operators. The number has been fairly constant in recent years.

A data user will be able to find about 1,280 of these in OSM (depending on how determined they are).




The preferred tag is "shop=bookmaker". That picks up 8% of all bookmakers in Britain. The most common alternative is "shop=betting" and that will pick up about 4% of the total.

There are a number of variants on these (shop=bet, bookmakers, betting_shop, bookies, turf_accountant). Together these pick up about 0.4% of the total.

A few contributors have gone down slightly different routes. Using the "amenity" tag rather than the "shop" tag accounts for another 0.2%, and various bookmaker-related values for "gambling" account for another 0.4%.

There are also some premises which look like a bookmaker (based on the operator or the name), but aren't marked as such. I found another 79 premises that might have added to the collection if they had been tagged with additional data. There are almost certainly more that I missed, but deep searches to find these get increasingly complicated and the results increasingly suspect.

I'm unable to find around 86% of British bookmakers in the OSM data.
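
As a quick check, the shares quoted above can be added up and compared with the overall count. The sketch below uses only figures from this post; the variable names are made up, and because the quoted shares are rounded the two routes are only expected to agree approximately.

```python
premises = 9021        # bookmaker premises recorded by the Gambling Commission
found = 1280           # approximate number that can be found in OSM

# Rounded percentage shares of all British bookmakers, as quoted in the text.
shares = {
    "shop=bookmaker": 8.0,
    "shop=betting": 4.0,
    "other shop values": 0.4,
    "amenity": 0.2,
    "gambling": 0.4,
}
name_only = 79         # premises recognisable only by operator or name

tagged_pct = sum(shares.values())
name_only_pct = 100 * name_only / premises
found_from_shares = tagged_pct + name_only_pct
found_from_count = 100 * found / premises

print(f"found, adding up the shares: {found_from_shares:.1f}%")
print(f"found, from the overall count: {found_from_count:.1f}%")
print(f"missing: {100 - found_from_count:.1f}%")
```

Both routes come out at about 14% found, which is where the figure of around 86% missing comes from.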

Hotels

VisitEngland is the tourist board for England. It publishes statistics on the stock of different types of accommodation, broken down by local authority. These figures group all serviced accommodation together. So hotels, guest houses and B&Bs are all counted together, but non-serviced accommodation, such as camp-sites and self-catering holiday accommodation is counted separately. Importantly, they count establishments rather than businesses.

Comparing their figures with OSM data for hotels, guest-houses, hostels and B&Bs suggests that the map currently contains about 7,500 out of 32,000 of these establishments in England (i.e. 23%).

Roughly 15% are tagged as a hotel, roughly 6% as a guest house, 1% as a hostel, and under 1% as a B&B. There are also a small number of motels and some pubs with accommodation that are conventionally tagged. As usual, there is a smattering of weird spellings, and a few misunderstandings of tagging conventions, but these only account for a couple of dozen entries. They don't make a significant difference to the totals. More than 75% of establishments that provide serviced accommodation are missing from the data (or at least not easily found).




VisitEngland count all serviced accommodation together, and I've not been able to find a reliable-looking and detailed breakdown of different types of serviced accommodation. Smaller establishments seem to account for less than a third of the data, but I would think that they account for a lot more than a third of all establishments. Their trade association suggests there are about 25,000 guest houses and B&Bs in Britain. OSM contributors have located about 14% of that number (note - this figure relates to Britain, others to England). It's possible that some of these have been tagged as a hotel rather than a guest house or B&B - but on the face of it, smaller establishments appear to be the most under-represented.

Looking at the distribution by local authority, it seems that either VisitEngland is particularly sloppy at counting hotels in the Midlands, or OSM contributors are particularly diligent at mapping them. For the rest of us, this might be a good time of year to fill some gaps before visitors and proprietors start gearing up for the 2015 season.

If anyone is planning to map serviced accommodation they will probably want to pick up tourism=hotel, guest_house, hostel, bed_and_breakfast or motel and amenity=pub with accomodation=yes.
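
That selection could be written as a simple filter over tag dicts, as in the sketch below. The function name is made up, and the "accomodation" key is spelled exactly as it appears in this post, which is an assumption about what is in the data; other spellings of that key would need adding by hand.

```python
# Values of "tourism" that indicate serviced accommodation, as listed above.
SERVICED_TOURISM = {"hotel", "guest_house", "hostel", "bed_and_breakfast", "motel"}


def is_serviced_accommodation(tags):
    """True if a tag dict looks like serviced accommodation."""
    if tags.get("tourism") in SERVICED_TOURISM:
        return True
    # Key spelling copied from the post; this is an assumption about the data.
    return tags.get("amenity") == "pub" and tags.get("accomodation") == "yes"
```

A filter like this only finds conventionally tagged features, so the weird spellings and misunderstandings mentioned earlier would slip through.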

Saturday 1 November 2014

QGIS 2.6

I have just downloaded the latest version of QGIS, and I'm using it as an excuse to play around combining different sources of data. Here is a mix of Bing Aerial imagery, hill-shading based on OS OpenData elevations (resampled) and OSM geometry for roads, buildings and water. Why not? It almost works.