Friday 31 July 2015

OSM Retail Survey: Part-12

I see that most readers of these posts come from outside the UK. I hope the information is of some interest and use to others, but I should probably emphasise that this is just a survey of UK retail data on OSM. The structure of the retail industry differs from place to place. The UK, for example, has a relatively high proportion of retail turnover through large retail chains: i.e. fewer small retailers than France, and fewer mid-sized retailers than Germany. It would be interesting to know whether similar patterns of OSM data are found elsewhere, or whether there are differences that we can all learn from.

After the most basic information on the location and type of shop, the most common (and for many data users, the most useful) supplementary information in OSM is the name of the retailer. Some form of name is provided for around 90% of retail premises, but the proportion is lower for certain types of outlet. Most UK Post Offices, for example, do not have a name tag.

Names will probably be most useful to the data user because they allow searches for a specific retailer. In the UK, understanding the name of the chain may carry relatively high value, because the name often gives a clear idea of the format and range of goods on sale (most people have an idea of what to expect in a branch of Argos, even if the tagging of shop type is inconsistent).

Names are also a useful characteristic when analysing retail data. Where chains have multiple similar branches, names provide a way to check tagging consistency across a chain, and to measure how well OSM covers the larger chains.
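
As a rough sketch of that kind of consistency check, the Python below groups features by lower-cased name and flags names whose branches carry more than one “shop=” value. It assumes the features are already available as plain dictionaries of OSM tags (for example, extracted from a regional export). The function names and the sample tags are illustrative only; they are not taken from the survey.

```python
from collections import Counter, defaultdict


def shop_values_by_name(features):
    """Count the "shop" values used for each (lower-cased) name.

    `features` is assumed to be an iterable of dicts of OSM tags.
    """
    by_name = defaultdict(Counter)
    for tags in features:
        name, shop = tags.get("name"), tags.get("shop")
        if name and shop:
            by_name[name.strip().lower()][shop] += 1
    return by_name


def inconsistent_chains(features, min_branches=2):
    """Names with at least `min_branches` outlets that use more than one shop value."""
    return {
        name: dict(counts)
        for name, counts in shop_values_by_name(features).items()
        if sum(counts.values()) >= min_branches and len(counts) > 1
    }


# Illustrative tags only, not survey data.
sample = [
    {"shop": "department_store", "name": "Argos"},
    {"shop": "variety_store", "name": "Argos"},
    {"shop": "supermarket", "name": "Lidl"},
    {"shop": "supermarket", "name": "LIDL"},
]
print(inconsistent_chains(sample))
# → {'argos': {'department_store': 1, 'variety_store': 1}}
```

Lower-casing is enough to merge “Lidl” and “LIDL” here, but the other spelling variants listed below would need fuller normalisation before a check like this gives a fair picture of a chain.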

There are numerous variations in the way names are provided by contributors. These fall into several groups:

  • Variation in capitalisation: Aldi / ALDI, Asda / ASDA, Spar / SPAR, Lidl / LIDL, Pets at Home / Pets At Home
  • Variations in the use of abbreviations: Co-op, Co-Op / Co-operative, M&S / Marks & Spencer / Marks and Spencer, Toni&Guy / Toni & Guy / Toni and Guy, B&Q / B & Q / B and Q
  • Variations in the use of a possessive, with or without apostrophe: Sainsbury / Sainsbury's / Sainsburys, Wilkinson / Wilkinsons / Wilkinson's, McColl / McColl's, Jewson / Jewsons, Maplin / Maplins
  • Variations in the use of a definite article: Co-operative Food / The Co-operative Food, Money Shop / The Money Shop, Body Shop / The Body Shop, Carphone Warehouse / The Carphone Warehouse
  • Variations in spacing: WHSmith / WH Smith / W H Smith, Kwik Fit / Kwik-Fit, One Stop / One-Stop / OneStop, Phones 4U / Phones4U, TK Maxx / TKMaxx 
  • Including or omitting the location of the branch in the name: “Tesco, Swansea Marina”, “Batley Tesco”, “Tesco Barnstaple”, “Tesco Extra, Hartlepool”, “Tesco - Biggin Hill Express”, “Tesco (Greenock)”, “Tesco Rugeley Superstore”, “Tesco Ystrad Mynach”, “Tesco Extra (24hr) Maldon”, “Tesco South Tottenham Superstore”, and many, many more.
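
Several of these variation types can be reconciled mechanically. The sketch below builds a crude matching key (lower case, no leading “The”, no apostrophes, “and” turned into “&”, no spaces or hyphens, a single trailing “s” dropped) plus a small hand-made alias table for abbreviations. The rules and the `ALIASES` table are assumptions for illustration, not an established standard. The key is meant for grouping, not display, and can over-merge (for instance “Boots” becomes “boot”). Location suffixes such as “Tesco Barnstaple” are not handled.

```python
import re

# Hypothetical alias table for abbreviations that spelling rules alone cannot reconcile.
ALIASES = {"m&s": "marks&spencer", "coop": "cooperative"}


def name_key(name):
    """Build a crude matching key so that spelling variants of a chain name collide."""
    key = name.strip().lower().replace("\u2019", "'")  # curly apostrophe -> straight
    key = re.sub(r"^the\s+", "", key)      # definite article: "The Body Shop" -> "body shop"
    key = key.replace("'", "")             # apostrophes: "sainsbury's" -> "sainsburys"
    key = re.sub(r"\band\b", "&", key)     # "marks and spencer" -> "marks & spencer"
    key = re.sub(r"[^a-z0-9&]", "", key)   # spaces, hyphens, punctuation: "w h smith" -> "whsmith"
    if key in ALIASES:
        return ALIASES[key]
    if key.endswith("s"):
        key = key[:-1]                     # possessive or plural "s": "jewsons" -> "jewson"
    return key


print(name_key("W H Smith") == name_key("WHSmith"))  # True
```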

Different names for different formats within a chain are a slightly different issue. Tesco, Tesco Metro, Tesco Express and Tesco Extra can all be considered valid names for different retail formats used by Tesco, as can Sainsbury's and Sainsbury's Local. These cannot really be considered variations of the same name. There is evidence, though, that they are not used consistently: “Tesco” is sometimes used as a catch-all for the various formats, and at other times it isn't.
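
One way a data user might separate chain from format is to look for known format words anywhere in the name, treating a bare chain name as “format unknown”. The `FORMATS` table below is a hypothetical, hand-curated example covering only the formats mentioned above, and the token matching is a simplifying assumption.

```python
import re

# Hypothetical, hand-curated format words per chain (illustrative, not exhaustive).
FORMATS = {
    "tesco": ("metro", "express", "extra"),
    "sainsbury's": ("local",),
}


def chain_and_format(name):
    """Return (chain, format); format is None when the name does not say."""
    tokens = re.findall(r"[a-z0-9']+", name.lower().replace("\u2019", "'"))
    for chain, formats in FORMATS.items():
        if chain in tokens:
            for fmt in formats:
                if fmt in tokens:
                    return chain, fmt
            return chain, None  # e.g. plain "Tesco": the format cannot be recovered
    return None, None
```

Because format words are searched for anywhere in the name, “Tesco - Biggin Hill Express” still resolves to Express. A name such as “Tesco Rugeley Superstore” is reported with an unknown format, which is exactly the ambiguity described above.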

It is difficult to put a firm number on how often these inconsistencies occur, but it certainly runs into thousands (i.e. between 1% and 10% of mapped shops), so they are likely to cause some data users a degree of frustration. However, similar patterns recur, and some names are particularly prone to certain variations. With varying degrees of effort, data users who place a high value on standardised names will probably be able to work round many of the differences that matter most to them.

Occasionally the name appears as the value of the "shop=" tag, but this is very rare. Various other tags are used to hold different types of name: primarily “name”, “operator”, and “brand”. In around 4% of cases the contributor has provided a name plus details of the operator, the brand, or both, and in under 2% of cases they have provided the operator or brand but not the name of the outlet. There is some inconsistency in the way that “operator” and “brand” are used, which will make life a bit difficult for data users.

Data users who ignore “operator” and “brand”, and use just the “name” field, will lose information from around 6% of recorded retail outlets. However, there are certain sub-sectors where the operator or brand tags are more widely used.
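
The roughly 6% is the sum of the two groups described above: outlets with a name plus operator or brand (around 4%) and outlets with operator or brand but no name (under 2%). A sketch of how a data user might classify features into those groups, and fall back to another tag when “name” is missing, is below. The category labels and the name, brand, operator fallback order are assumptions for illustration.

```python
from collections import Counter


def name_category(tags):
    """Classify an outlet by which name-like tags it carries."""
    has_name = bool(tags.get("name"))
    has_other = bool(tags.get("operator") or tags.get("brand"))
    if has_name and has_other:
        return "name + operator/brand"
    if has_name:
        return "name only"
    if has_other:
        return "operator/brand only"
    return "no name"


def category_shares(features):
    """Percentage of features falling into each category."""
    counts = Counter(name_category(tags) for tags in features)
    total = sum(counts.values())
    return {category: 100 * n / total for category, n in counts.items()}


def display_name(tags):
    """Fall back from name to brand to operator (an assumed priority order)."""
    for key in ("name", "brand", "operator"):
        if tags.get(key):
            return tags[key]
    return None
```

A user who reads only “name” effectively treats the “operator/brand only” group as unnamed; `display_name` recovers something for those outlets, though which tag should win is a judgement call.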

In the case of petrol stations, for example, both tags are used, but the fuel brand and the operator are scattered quite widely across the name, operator and brand tags. For car showrooms, contributors normally use the name tag for the name of the outlet, but it is not uncommon for it to show the car manufacturer instead. In around 11% of cases the manufacturer appears under brand, and in some cases under operator. This inconsistency could be a lost opportunity to provide additional information to data users, but it is difficult to know how much of a problem it will really cause them. Anyone who wants to search for a Volkswagen dealer, for example, will need to search for either “VW” or “Volkswagen”. If they can do that on the name tag, they will be able to search the name, brand and operator tags almost as easily. If they go to this amount of effort, they will currently pick up around 30 VW dealerships in the UK, well short of the true figure of around 200. Again, missing data is a bigger problem here than inconsistent tagging. Sensible end users who want to find a VW dealer will use the manufacturer's dealer search rather than OSM data.
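
Searching all three tags is indeed little more work than searching one. A minimal sketch, assuming features are dictionaries of OSM tags, matches either spelling as a whole word, case-insensitively, in any of name, brand or operator.

```python
import re

# Case-insensitive whole-word match for either form of the manufacturer name.
VW_PATTERN = re.compile(r"\b(?:vw|volkswagen)\b", re.IGNORECASE)
NAME_KEYS = ("name", "brand", "operator")


def mentions(tags, pattern=VW_PATTERN, keys=NAME_KEYS):
    """True if any of the name-like tags matches the pattern."""
    return any(pattern.search(tags.get(key, "")) for key in keys)
```

Restricting to car dealerships would add a check such as `tags.get("shop") == "car"` before calling `mentions`. Whole-word matching avoids false hits on strings that merely contain the letters “VW”.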

For most types of retail there is no major issue with misuse of the “brand” tag. On the whole it is used consistently for car dealerships and filling stations, and little used elsewhere. There is, though, some confusion around tagging of convenience store names. In the UK these are often independently owned and operated, but trading under a well-known national franchise. Examples include SPAR, Londis, Costcutter, Premier Stores and Nisa. Together these represent around a third of the convenience store sector in the UK, so they have a significant presence, but none of the various combinations of “name”, “operator” and “brand” really captures the business model. As a result, the way they are tagged is quite inconsistent: the franchise is tagged as “name” roughly 80% of the time, as “operator” roughly 13%, and as “brand” 6%.
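
Shares like these can be tallied by recording, for each convenience store, which tag holds the franchise name. The sketch below is one possible way to do it, not necessarily how the survey figures were produced. It checks name, then operator, then brand, and matches franchise names as lower-case substrings, which is a crude assumption (an unrelated name containing “spar” would also match).

```python
from collections import Counter

# Franchise names from the text, matched as lower-case substrings (a crude assumption).
FRANCHISES = ("spar", "londis", "costcutter", "premier", "nisa")


def franchise_tag(tags):
    """Return the first of name/operator/brand that mentions a franchise, else None."""
    for key in ("name", "operator", "brand"):
        value = tags.get(key, "").lower()
        if any(franchise in value for franchise in FRANCHISES):
            return key
    return None


def franchise_tag_shares(features):
    """Percentage of franchised convenience stores holding the franchise in each tag."""
    counts = Counter(
        franchise_tag(tags) for tags in features if tags.get("shop") == "convenience"
    )
    counts.pop(None, None)  # drop stores where no franchise name was found
    total = sum(counts.values())
    return {key: round(100 * n / total, 1) for key, n in counts.items()} if total else {}
```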

Address details are attached to about a third of retail properties in OSM, but only 15% have a postcode, and the proportion with a complete street address is in the region of 5-10% (depending on what components a user requires in order to regard the address as complete).
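
Because “complete” depends on what the user needs, a completeness check is naturally parameterised by the set of required address keys. The default below (house number, street and postcode) is an assumption for illustration; the keys themselves are standard OSM “addr:*” tags.

```python
# One possible definition of a "complete" address; callers can pass their own set.
DEFAULT_REQUIRED = ("addr:housenumber", "addr:street", "addr:postcode")


def has_address(tags):
    """True if any addr:* tag is present."""
    return any(key.startswith("addr:") for key in tags)


def has_complete_address(tags, required=DEFAULT_REQUIRED):
    """True if every required addr:* tag is present and non-empty."""
    return all(tags.get(key) for key in required)
```

Changing `required`, for example to accept “addr:housename” in place of a number, moves an outlet in or out of the “complete” group, which is why the 5-10% range above is a range rather than a single figure.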

Contact details (web site or phone) are provided for around 10% of retail properties, although for certain types the proportion is higher. For restaurants and bars, for example, the proportion with contact details is around 20%, and for estate agents around 25%. It is more common to find a web site than a phone number, and comparatively rare to find both. In the case of restaurants, for example, 15% are tagged with a web site, 10% with a phone number, 5% with both, and 80% with neither. Restaurants have a higher level of coverage for contact details than most retail sectors on OSM, and yet, out of 60,000 restaurants in the UK, OSM has contact details for just over 2,000. This looks a long way from being a set of information that is viable enough to attract data users.
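
The restaurant percentages are internally consistent, as a quick inclusion-exclusion check shows, and the last two figures put OSM's coverage of UK restaurants with contact details at only about 3%:

```latex
\begin{aligned}
P(\text{web} \cup \text{phone}) &= P(\text{web}) + P(\text{phone}) - P(\text{web} \cap \text{phone}) \\
  &= 15\% + 10\% - 5\% = 20\%, \\
P(\text{neither}) &= 100\% - 20\% = 80\%, \\
\frac{2{,}000}{60{,}000} &\approx 3.3\%.
\end{aligned}
```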

Information on accessibility (“wheelchair=”) is provided for around 4% of retail properties, with a higher proportion for cafés and restaurants (7%) and for supermarkets (8%). Where wheelchair information is provided the value is “yes” in 63% of cases, but “no” in 20% of cases, and “limited” in 16% of cases.

Information on opening hours is provided for under 3% of retail properties. The types of outlet that fare better than average for this information are an odd mixture: Supermarkets (but not Convenience Stores), Bars (but not Pubs), Pharmacies (but not Post Offices) and Bicycle Shops. Cafés and Restaurants rate only slightly higher than average.

It's worth looking more closely at an example where opening hours have been provided more often than average, and where they could play an important role in a feasible application. If I wanted to find a pharmacy near home that is open on a Sunday afternoon, the nearest pharmacy where I can check “opening_hours” on OSM is nearly 50km away (and, as it happens, it isn't open on a Sunday afternoon). There are four pharmacies within 100km that OSM tells me are open on a Sunday afternoon, but only one of them shows a phone number. This is potentially an application area where the tagging structure is ready to support a viable application. In London, and in a few other towns and cities (Stoke on Trent, Norwich,...), contributors have been diligent in adding enough detail to make such a search viable. But across most of the country, it doesn't look to me as though the data content is anywhere near ready to attract users to this type of data.
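
Checking an “opening_hours” value for a given day and time can be sketched as below. This handles only a small, assumed subset of the syntax: rules separated by “;”, two-letter day codes or day ranges, comma-separated HH:MM-HH:MM intervals, and “off”. The full OSM syntax is much richer (public holidays, months, “24/7”, intervals past midnight, comments), so anything outside the subset is either rejected with a `ValueError` or mishandled, and a real application would need a complete parser.

```python
DAYS = ("Mo", "Tu", "We", "Th", "Fr", "Sa", "Su")


def _expand_days(part):
    """'Mo-Fr,Su' -> {'Mo', 'Tu', 'We', 'Th', 'Fr', 'Su'}; ranges may wrap (e.g. 'Sa-Mo')."""
    days = set()
    for item in part.split(","):
        start, _, end = item.partition("-")
        end = end or start
        if start not in DAYS or end not in DAYS:
            raise ValueError(f"unsupported day selector: {item!r}")
        i, j = DAYS.index(start), DAYS.index(end)
        days.add(DAYS[i])
        while i != j:
            i = (i + 1) % 7
            days.add(DAYS[i])
    return days


def _minutes(hhmm):
    hours, minutes = hhmm.split(":")
    return int(hours) * 60 + int(minutes)


def parse_hours(spec):
    """Map each day to a list of (open, close) intervals in minutes after midnight."""
    schedule = {day: [] for day in DAYS}
    for rule in spec.split(";"):
        rule = rule.strip()
        if not rule:
            continue
        day_part, _, time_part = rule.partition(" ")
        days = _expand_days(day_part)
        intervals = []
        if time_part.strip() != "off":
            for rng in time_part.split(","):
                start, end = rng.strip().split("-")
                intervals.append((_minutes(start), _minutes(end)))
        for day in days:
            schedule[day] = intervals  # a later rule replaces earlier ones for that day
    return schedule


def is_open(spec, day, hhmm):
    """True if `spec` says the outlet is open on `day` (e.g. 'Su') at `hhmm` (e.g. '14:00')."""
    t = _minutes(hhmm)
    return any(start <= t < end for start, end in parse_hours(spec)[day])
```

Applied to features tagged amenity=pharmacy, filtering on `is_open(tags["opening_hours"], "Su", "14:00")` together with a check for a “phone” tag would reproduce the kind of search described above, within the limits of the subset.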

In summary, OSM has the potential to support more sophisticated searches of retail data than a simple location search, in the sense that data structures are in place to hold much valuable information. Where these are used, they are used fairly consistently. However, in most cases coverage of supplementary data is an issue. There is a considerable way to go before the supplementary data is sufficiently complete and consistent. The first priority is probably to work towards more consistent naming. Beyond that, at present, the potential applications for this data are hypothetical, and it is too early for an informed debate on priorities.
