Wednesday, 22 July 2015

OSM Retail Survey: Part-4

It is often useful to raise our sights from the data that has been recorded in OSM and consider the data that hasn't. As discussed in earlier parts of this survey, the overall level of retail coverage in England is 27% of retail premises. However, the extent of recording varies between different types of shop.

There are a number of ways to assess OSM coverage of a specific sector at a national level. The approach, broadly, is to count the number of shops of a particular type that are already recorded in OSM, estimate the total number that should be there, and compare the two.
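In code, the comparison comes down to a single ratio. Below is a minimal Python sketch. It assumes the OSM count and the external estimate have already been obtained. The function names are hypothetical, and the example uses the post office figures from this survey.

```python
def coverage(osm_count: int, estimated_total: int) -> float:
    """Return the share of the estimated real-world total already recorded in OSM."""
    if estimated_total <= 0:
        raise ValueError("estimated_total must be positive")
    return osm_count / estimated_total


def estimated_missing(osm_count: int, estimated_total: int) -> int:
    """Return a rough count of outlets still to be mapped (never below zero)."""
    return max(estimated_total - osm_count, 0)


# Post offices: 7,622 found in OSM against 11,696 in the UK.
print(f"{coverage(7_622, 11_696):.0%}")  # 65%
print(estimated_missing(7_622, 11_696))  # 4074
```

The guard in `estimated_missing` is there because an OSM count can exceed a weak estimate. A coverage figure above 100% may be a sign that the estimate, rather than the map, needs revisiting.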

There are various sources of statistics that give basic information on numbers of different types of shop at a national level. Getting useful estimates of numbers at a more local level is a bigger challenge.


  • It isn't difficult to find the number of branches of major retail chains, whether from their own publications, from business reporting or from Wikipedia. The OSM Wiki has a detailed page on major UK retail chains.
  • There are national statistics which can be used for some sectors and types of shop (for example, see UKBA01a Enterprise/local units by 4 Digit SIC and UK Regions; and Retail Hereditaments by Administrative Area issued by the Valuation Office Agency). 
  • For independent specialists, many trade associations publish figures on the size of their sector.
  • Press articles and market research companies will sometimes publish figures on the number of different types of retailer. 
  • When all else fails, searching a directory (such as Yellow Pages) can give some idea of the likely number of outlets.
  • Where retailers need to be licensed (tattoo parlours, for example), I thought it would be easy to obtain figures at local authority level. No doubt this would be possible (through FOIA requests, for example), but so far I haven't found an accessible source of licensing statistics. Local figures may be easier to obtain from the licensing authority itself, and any ideas on where to find national figures would be welcome.

For specific retail locations:


  • Within the OSM community, Robert Whittaker has tools relating to post offices on his Post Hoc pages (http://robert.mathmos.net/osm/postboxes/).
  • Beyond the community, most large retail chains publish a list of branches, and some have given permission for this information to be added to OSM.
  • Trade associations for specialist independent retailers don't normally seem to provide information on the location of individual members, but some may.
  • The NHS provides lists of pharmacies and opticians.

Using this approach, and taking three examples where we might expect levels of recording to be relatively high:

  • There are 11,696 post offices in the UK, and I have found 7,622 (65%) of them in OSM. Around 90% of built-up areas with a population of more than 5,000 have a post office in OSM. The 72 that don't might be good places to find missing post offices. More generally, post offices can be used as an indicator of a wider gap in coverage. They are one of the types of retail outlet that are likely to be added before other retail properties. So a larger settlement with a missing post office is likely to contain other retail properties that need to be added. 
  • There are 11,647 community pharmacies in England. I found 4,225 tagged as a pharmacy (36% of the total), and another 483 tagged as “chemist”. Strictly speaking, “chemist” is for shops that don't dispense prescriptions, but it has been quite widely used as a synonym for “pharmacy”. Taken together, these locate about 40% of community pharmacies, so more than half are missing. I would expect towns the size of Braintree, Grantham, Peterlee, Melton Mowbray, Haverhill, Maghull and Congleton to have a community pharmacy, and Google location searches do show them, yet none seems to be recorded in OSM. Only around half of the pharmacies in England are standalone specialist shops; the rest operate inside another store. Community pharmacies that operate inside large supermarkets seem to be under-recorded in OSM.
  • There are about 2,500 specialist bicycle shops in the UK, and I have found 1,631 (65%) in the OSM database. The largest bicycle retailer, Halfords, has 465 branches across the UK, of which I found 364 (78%). I'm not sure how many of those sell bicycles, but OSM says that 151 of them do (41% of the branches found). That must surely understate the true figure.
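Two details in these figures are easy to get wrong when reproducing them. First, counts from alternative tags have to be added together before dividing. Second, a share of the branches found is not the same as a share of all branches. Below is a small Python sketch of the arithmetic. It uses only the figures quoted above, and the helper `pct` is hypothetical.

```python
def pct(part: int, whole: int) -> str:
    """Format part/whole as a whole-number percentage."""
    return f"{part / whole:.0%}"


# Pharmacies in England: two tag values point at the same kind of outlet,
# so their counts are combined before comparing with the estimate.
COMMUNITY_PHARMACIES = 11_647
tagged_pharmacy = 4_225
tagged_chemist = 483
print(pct(tagged_pharmacy, COMMUNITY_PHARMACIES))                   # 36%
print(pct(tagged_pharmacy + tagged_chemist, COMMUNITY_PHARMACIES))  # 40%

# Halfords: the bicycle-tag share can be measured against two different bases.
HALFORDS_BRANCHES = 465
found = 364
with_bicycle_tag = 151
print(pct(found, HALFORDS_BRANCHES))             # 78%
print(pct(with_bicycle_tag, found))              # 41%
print(pct(with_bicycle_tag, HALFORDS_BRANCHES))  # 32%
```

The 41% describes only the 364 mapped branches. Measured against all 465 branches, the share with bicycle tagging is 32%.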

And some examples where I expected coverage to be relatively low:

  • Figures suggest that there are about 1,400 pound stores in the UK. I've found 586 tagged with “variety_store”, and another 170 or so with alternative tagging. So the tagging is inconsistent, but coverage appears to be over 50%, which is more than I expected. Perhaps my estimate of the total is too low.
  • There are 8,500 charity shops in England, 900 in Scotland and 500 in Wales, so I should find about 9,900 in my data extract. Depending on how strictly I interpret the data, I can find between 1,751 and 1,994 (18-20%). Around 90% are tagged as “shop=charity”, but there is a smattering of others tagged according to their specialisation: “shop=clothes”, “shop=secondhand” or “shop=books”.
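When the count depends on how strictly the tags are interpreted, a range is more honest than a single figure. Below is a minimal Python sketch. It assumes the lower and upper counts have already been extracted, and `coverage_range` is a hypothetical helper. The roughly 170 pound stores with alternative tagging are treated as an upper bound.

```python
def coverage_range(low: int, high: int, estimated_total: int) -> tuple[float, float]:
    """Return (lower, upper) coverage when the OSM count depends on tag interpretation."""
    if not 0 <= low <= high:
        raise ValueError("expected 0 <= low <= high")
    return low / estimated_total, high / estimated_total


# Charity shops: national estimates summed across England, Scotland and Wales.
charity_total = 8_500 + 900 + 500
lower, upper = coverage_range(1_751, 1_994, charity_total)
print(charity_total, f"{lower:.0%}-{upper:.0%}")  # 9900 18%-20%

# Pound stores: 586 tagged variety_store, plus roughly 170 tagged some other way.
lower, upper = coverage_range(586, 586 + 170, 1_400)
print(f"{lower:.0%}-{upper:.0%}")  # 42%-54%
```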



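Primary tag value | OSM count (UK) | OSM count (England) | Estimated actual (UK) | Estimated actual (England) | Approx. coverage
------------------|----------------|---------------------|-----------------------|----------------------------|-----------------
pub               | 34,937         | 31,180              | 48,000                | -                          | 73%
restaurant        | 16,062         | 13,855              | 60,000                | -                          | 27%
fast_food         | 15,762         | 13,794              | -                     | 41,295                     | 33%
cafe              | 13,137         | 11,280              | 16,501                | -                          | 80%
convenience       | 13,108         | 11,212              | 48,303                | -                          | 27%
supermarket       | 8,720          | 7,352               | 6,410                 | -                          | 119%
post_office       | 7,622          | 6,199               | 11,696                | -                          | 65%
hairdresser       | 7,187          | 6,366               | 38,300                | -                          | 19%
fuel              | 6,207          | 5,190               | 8,588                 | -                          | 72%
bank              | 5,946          | 5,089               | 8,961                 | -                          | 66%
pharmacy          | 4,871          | 4,225               | -                     | 11,647                     | 36%
charity           | 1,682          | 1,476               | 9,900                 | 8,500                      | 17%
bicycle           | 1,631          | 1,406               | 2,500                 | -                          | 65%
beauty            | 1,543          | 1,361               | 13,000                | -                          | 12%
bookmaker         | 1,386          | 1,242               | 9,128                 | -                          | 15%
optician          | 1,161          | 1,041               | 7,250                 | -                          | 16%
florist           | 983            | 885                 | 8,000                 | -                          | 12%
alcohol           | 784            | 665                 | 5,575                 | 4,195                      | 14%
variety_store     | 586            | 528                 | 1,400                 | -                          | 42%
deli              | 456            | 391                 | 2,500                 | -                          | 18%
seafood           | 87             | 70                  | 950                   | -                          | 9%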
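For the rows below, a simple rule reproduces the coverage column. Where a UK estimate exists, divide the UK count by it. Otherwise, divide the England count by the England estimate. This rule is inferred from the figures rather than stated in the survey. Treat this Python sketch as an assumption: the `approx_coverage` helper is hypothetical, and it uses only a subset of the rows.

```python
from typing import Optional

# Subset of the table: tag value, OSM count (UK), OSM count (England),
# estimated actual (UK), estimated actual (England); None = no estimate given.
ROWS = [
    ("pub", 34_937, 31_180, 48_000, None),
    ("fast_food", 15_762, 13_794, None, 41_295),
    ("pharmacy", 4_871, 4_225, None, 11_647),
    ("charity", 1_682, 1_476, 9_900, 8_500),
    ("alcohol", 784, 665, 5_575, 4_195),
    ("seafood", 87, 70, 950, None),
]


def approx_coverage(osm_uk: int, osm_eng: int,
                    est_uk: Optional[int], est_eng: Optional[int]) -> Optional[float]:
    """Prefer the UK comparison; fall back to England when only that estimate exists."""
    if est_uk is not None:
        return osm_uk / est_uk
    if est_eng is not None:
        return osm_eng / est_eng
    return None


for tag, *figures in ROWS:
    cov = approx_coverage(*figures)
    text = "n/a" if cov is None else f"{cov:.0%}"
    print(f"{tag:<10} {text}")
```

Where both estimates exist (charity and alcohol), the England-only ratios would be about 17% and 16% respectively. The choice of base therefore matters only at the margin.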



It is interesting to consider in more detail how data users might interpret some specific examples.

Finding a pharmacist (i.e. someone who can dispense prescriptions) could be the basis of a useful application, and there have been various attempts to develop appropriate tagging, but the actual data is quite complex for data users to interpret.

Values of “pharmacy” and “chemist” can appear for “amenity” and “shop”; “dispensing” can be set to “yes”, “no” or sometimes the name of the outlet. And all of these can be combined in different ways, alongside other values of “amenity” and “shop”.

  • “amenity=pharmacy” alongside any value of “shop=*” and either “dispensing=yes” or no value for “dispensing”: this is in line with the various guidelines, and unambiguously indicates that prescriptions will be dispensed. It accounts for almost 90% of cases in the data.
  • “shop=chemist” without any indication of “dispensing”: this is correct tagging for a place where prescriptions will NOT be dispensed, but actual examples suggest that it is widely misused for pharmacies, so in practice it has to be regarded as ambiguous. It represents almost 10% of cases.
  • “amenity=pharmacy” with “dispensing=no”: this is inconsistent tagging and not in line with the guidelines, but it can still be interpreted fairly confidently as a place where prescriptions will NOT be dispensed. It accounts for around 1% of cases.
  • “shop=chemist” without “amenity=pharmacy”, and with “dispensing=no”: this is correct tagging, and unambiguously a place where prescriptions will NOT be dispensed. It accounts for only 0.1% of cases.
  • “shop=pharmacy” or “amenity=chemist”, with or without other values: these are incorrect uses of the tags, but they are rare (less than 0.5%) and often appear alongside a correct tag (e.g. “shop=pharmacy”+“amenity=pharmacy”). Data users can safely ignore the incorrect values without sacrificing a significant amount of relevant data.
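These rules translate fairly directly into a classifier over an object's tags. The Python sketch below is one possible reading of them, not an established OSM convention. The function name and category labels are hypothetical. A “dispensing” value other than “yes” or “no” (such as an outlet name) is assumed not to rule dispensing out.

```python
from collections.abc import Mapping


def classify_pharmacy_tags(tags: Mapping[str, str]) -> str:
    """Assign an OSM object's tags to one of the cases listed above."""
    amenity = tags.get("amenity")
    shop = tags.get("shop")
    dispensing = tags.get("dispensing")  # "yes", "no", absent, or something odd

    # Misused values (shop=pharmacy, amenity=chemist) are never inspected:
    # only amenity=pharmacy and shop=chemist drive the result.
    if amenity == "pharmacy":
        if dispensing == "no":
            return "not_dispensing_inconsistent"  # around 1% of cases
        return "dispensing"  # almost 90%: dispensing=yes, absent, or another value
    if shop == "chemist":
        if dispensing is None:
            return "ambiguous"  # almost 10%: often a mis-tagged pharmacy
        if dispensing == "no":
            return "not_dispensing"  # about 0.1%: correct tagging
    return "other"  # anything the rules above do not cover


print(classify_pharmacy_tags({"shop": "pharmacy", "amenity": "pharmacy"}))  # dispensing
print(classify_pharmacy_tags({"shop": "chemist"}))  # ambiguous
print(classify_pharmacy_tags({"amenity": "pharmacy", "dispensing": "no"}))  # not_dispensing_inconsistent
```

Checking “amenity=pharmacy” first is what lets the incorrect “shop=pharmacy” value be ignored. Whenever the two appear together, the correct tag alone decides the result.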

The percentages above are calculated from the pharmacies and chemists recorded in the database. It is worth recalling that these account for only about 40% of actual pharmacies: around 60% do not appear in the database at all.

In practice, data users can be reasonably confident that they have found a dispensing pharmacist where “amenity=pharmacy” is present and “dispensing” is either absent or set to anything other than “no”. They will have to treat “shop=chemist” as ambiguous in this context, and they will probably ignore everything else, because the complexity of the logic increases out of all proportion to the quantity of reliable data it can uncover. In summary, they will confidently interpret 90% of the data in the database and find just over one in three pharmacies. If they interpret the data more loosely, they will be able to point their users to about 40% of real pharmacies. If they want to find more, they will currently have to look elsewhere for their data.
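This pragmatic approach amounts to two filters, a strict one and a loose one. Below is a minimal Python sketch, assuming each feature is a plain dictionary of tags. The function names and the sample features are made up for illustration and are not taken from the OSM database.

```python
def confident_pharmacy(tags: dict[str, str]) -> bool:
    """Strict reading: amenity=pharmacy, unless dispensing is explicitly "no"."""
    return tags.get("amenity") == "pharmacy" and tags.get("dispensing") != "no"


def possible_pharmacy(tags: dict[str, str]) -> bool:
    """Loose reading: also accept shop=chemist with no dispensing tag at all."""
    chemist_unknown = tags.get("shop") == "chemist" and "dispensing" not in tags
    return confident_pharmacy(tags) or chemist_unknown


# Hypothetical sample features (not real OSM data).
features = [
    {"amenity": "pharmacy", "shop": "chemist"},
    {"amenity": "pharmacy", "dispensing": "yes"},
    {"shop": "chemist"},
    {"amenity": "pharmacy", "dispensing": "no"},
    {"shop": "chemist", "dispensing": "no"},
]
print(sum(map(confident_pharmacy, features)))  # 2
print(sum(map(possible_pharmacy, features)))   # 3
```

On the England figures quoted earlier, this is the difference between reaching roughly 36% and roughly 40% of community pharmacies.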

Next we will look at how coverage by type of retail outlet might be used to provide useful feedback to contributors and data users...
