Sunday, 26 July 2015

OSM Retail Survey: Part 10

Specialised types of shop offer a narrow range of categories, but provide wide choice within their specialist area. Generalist retailers offer a broad range of product categories, with less choice within each category. Large generalists (e.g. supermarkets) are able to offer both numerous categories and broad choice.

We use different terms for a large “supermarket” (with breadth and depth), a small “convenience” store (with some breadth and less depth), and a “butcher” or “newsagent”, which we expect to be more specialised. We expect a newsagent to offer a wider choice of newspapers and magazines than a convenience store, but we would still expect a convenience store to offer newspapers. We expect a convenience store to offer much more than newspapers, and we would be surprised if a newsagent offered nothing but newspapers. We expect a butcher to offer a wider choice of meat than a convenience store, but our local butcher also does excellent sandwiches, ready meals, pies, vegetables, and various other items. Although the principles are fairly clear, the precise boundaries between retail categories are always going to be difficult to pin down.

As a result, it doesn't matter how clear the definitions are for different terms covering different levels of specialisation. We should still expect some inconsistency in the way that different tags are used. Some retailers have a business model that is closer to the boundary than others, so it is inevitable that there will be a grey area where it is difficult to maintain a consistent boundary. The proper question isn't whether tagging ought to be consistent. It's whether there should be more consistency than we find.

To my mind there are several areas where the data does not look consistent enough. This is particularly true in the case of large stores which sell a broad range of goods (the big generalists).

For example, a data user who searches for “supermarket” and relies on the wiki for the definition will expect to find “a large store for groceries and other goods”, or “a full service grocery store that often sells a variety of non-food products as well”. They will assume (perhaps because the wiki tells them) that “stores that do not provide full service grocery departments are generally not considered supermarkets”.

In practice they will find results that include a high proportion of outlets that fit this description, including most branches of the major chains that they will expect to find: ALDI, ASDA, Booths, Co-op, Iceland, Lidl, Morrison's, Sainsbury's, Tesco, Waitrose, etc. However, they will also pick up a lot of convenience stores, and some stores tagged “supermarket” where few shoppers would expect to find groceries: Argos, Homebase, Matalan, Mothercare, Pets at Home, etc.

I estimate that around 10% of the data that they retrieve will not be what they expect.

Commercial search engines face a similar problem, because smaller convenience stores often call themselves a supermarket, and this is inevitably picked up in keyword searches. But OSM has a more structured data model. We should expect it to perform better.

The situation with department stores is even more difficult for data users. The major chains are well covered, but they only represent about half of all retail outlets tagged as a department store. Data users who rely on the Wiki definition will be expecting “a large store with multiple clothing and other general merchandise departments”. They probably won't expect to pick up Poundstretcher, Argos, Matalan, Pets at Home, Staples, Superdrug, TK Maxx, etc. - but they will.

Wilkinson's (Wilko) is a difficult boundary case - with a particularly wide range of different key values for different branches. My own view is that something like “homeware” would be the best description of their format, but only about 2% of contributors agree with me. And in practice, what should matter to data users is not what I think (even when I am right). What has to matter to data users is the consensus that develops across the majority of contributors. And in this particular case there is little consensus. It is difficult for anyone to know whether to consider Wilkinson's a department store or not. What is even more unsatisfactory for data users is that only 25% of Wilkinson's stores are tagged as a department store. That is the most popular option, yet 75% are tagged differently.
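As a rough illustration of how a share like the 25% figure might be measured, here is a minimal Python sketch. It is not the method used for this survey: the function name `tag_shares` is my own, and the list of tag values is an invented sample for a single chain, not real OSM data.

```python
from collections import Counter

def tag_shares(shop_values):
    """Return (tag, share) pairs, most common first.

    share is the fraction of all branches carrying that shop= value.
    """
    counts = Counter(shop_values)
    total = sum(counts.values())
    return [(tag, n / total) for tag, n in counts.most_common()]

# Hypothetical sample: shop= values for 8 branches of one chain (invented).
sample = ["department_store", "department_store", "hardware", "doityourself",
          "houseware", "variety_store", "general", "supermarket"]

shares = tag_shares(sample)
top_tag, top_share = shares[0]
print(f"{top_tag}: {top_share:.0%} of branches; {1 - top_share:.0%} tagged differently")
```

A low share for the most popular value is a quick signal that contributors have not settled on a tag for that chain.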

Neither of these examples is the result of a problem with the definition of the tags for a supermarket or a department store. The problem is that the same tags are being quite widely used for branches of chains where most contributors prefer an alternative. Good data on department stores and supermarkets is polluted by inconsistent data on other retail formats.

Looking further, the confusion lies partly in representing scale consistently, and partly in representing the degree of specialisation consistently.

Most specialists offer some categories of product that fall outside their main area of activity. Some position themselves as specialists in more than one area. As a result contributors can find it difficult to draw a consistent distinction between a specialist and a generalist outlet. If they are uncertain about the right specialist term to use, they tend to look for something more generic, and fall back on terms intended for generalists. This isn't entirely unreasonable behaviour. For a long time, the guidance, when in doubt, has been to pick a popular tag that best fits the situation (rather than inventing a new one). Contributors don't necessarily have an understanding of all the tags in use, and the result is that popular tags that were originally intended to apply to large outlets which offer a broad range are quite commonly used for smaller outlets offering a broad range, and for unusual specialists that are difficult for contributors to classify.

Looking at this another way, we have a choice of terms for shops which offer a broad range. Contributors who find it difficult to pick an appropriate tag veer towards picking one from a higher row in this table - they are the ones that are most widely used.

|                   | Primarily food | Primarily non-food | Hardware / building materials |
|-------------------|----------------|--------------------|-------------------------------|
| Large generalists |                |                    | “doityourself” (or sometimes “trade”) |
| Other generalists |                | “general” (rare) or “variety” (for pound shops) |  |
|                   | “bakery”, “butcher”, “cheese”, etc. | “clothes”, “beauty”, “houseware”, etc. | “garden_centre”, “paint”, etc. |

One result of tending towards tags for larger generalists is that supermarkets are over-represented in OSM. Industry figures show 6,410 stores in this category in the UK, whereas I found 7,045 (110%) in OSM. Convenience stores, on the other hand, are under-recorded. I found 9,717 out of 48,303 identified by the industry (i.e. just 20%).
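The percentages above are simply the OSM count divided by the industry count. A minimal Python sketch of that arithmetic, using only the four figures quoted in this post (the function name `coverage` is my own):

```python
def coverage(osm_count, industry_count):
    """OSM count expressed as a percentage of the industry count."""
    return 100 * osm_count / industry_count

# (OSM count, industry count) as quoted in the text.
figures = {
    "supermarket": (7045, 6410),
    "convenience": (9717, 48303),
}

for category, (osm, industry) in figures.items():
    print(f"{category}: {coverage(osm, industry):.0f}% of the industry figure")
```

Anything over 100% suggests that some outlets carry the tag without belonging to the category as the industry defines it.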

It is obvious from the data that contributors find it difficult to make a distinction between a supermarket and a convenience store. In England and Wales the law on opening hours varies for different sizes of store, with restricted hours on Sunday for those of more than 280 sq. metres (3,000 sq. ft.). So a supermarket of less than 280 square metres would normally be considered a convenience store, and a convenience store of more than 280 square metres would be considered a supermarket. However, at least 9% of outlets marked as a supermarket in OSM (and recorded as an area rather than a node) have a floorspace of less than 280 sq. metres. Around one in three of the stores operated by one of the major convenience store chains is tagged as a supermarket. Convenience stores don't have to offer extended opening hours, we can't really expect contributors to measure the footprint, and the situation is further confused because some convenience stores describe themselves as a supermarket. The upshot is that almost a thousand convenience stores in OSM are marked as a supermarket. And meanwhile, because convenience stores are generally under-recorded, around 70% of the general grocery sector has yet to be added to OSM.
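A size check of this kind could be sketched as follows. This is only an illustration under stated assumptions, not the procedure behind the 9% figure: the outlines are invented rectangles, the coordinates are assumed to be already projected into metres (OSM itself stores latitude and longitude), and a building footprint is only a proxy for floorspace. The names `polygon_area` and `looks_too_small` are hypothetical.

```python
SIZE_THRESHOLD_SQ_M = 280  # Sunday trading threshold quoted in the text

def polygon_area(points):
    """Area of a simple polygon via the shoelace formula.

    points: (x, y) pairs in metres, listed in order around the outline.
    """
    area = 0.0
    # Pair each vertex with the next one, wrapping round to the first.
    for (x1, y1), (x2, y2) in zip(points, points[1:] + points[:1]):
        area += x1 * y2 - x2 * y1
    return abs(area) / 2

def looks_too_small(shop_tag, outline):
    """True if an outline tagged as a supermarket is under the threshold."""
    return shop_tag == "supermarket" and polygon_area(outline) < SIZE_THRESHOLD_SQ_M

# Invented outlines: a 20 m x 12 m shop and a 40 m x 30 m shop.
small = [(0, 0), (20, 0), (20, 12), (0, 12)]
large = [(0, 0), (40, 0), (40, 30), (0, 30)]

print(polygon_area(small), looks_too_small("supermarket", small))
print(polygon_area(large), looks_too_small("supermarket", large))
```

Outlets recorded as a node have no outline to measure, so a check like this can only ever cover part of the data.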

Changing tack, department stores sell a range of general merchandise, typically including clothing, household appliances, toys and games, personal-care products and garden equipment. Some also sell food, but non-specialised food stores are properly classified as supermarkets. With very few exceptions the major UK department store chains, such as John Lewis, Debenhams, and House of Fraser, are tagged correctly as a department store. However, not all retail premises tagged “department_store” comfortably fit the description.

Examples include branches of Argos (normally tagged “catalogue”), TK Maxx and Matalan (normally tagged “clothes”), Poundland (normally tagged “variety_store”, sometimes “convenience” or “supermarket”), Mothercare (normally “baby_goods”, sometimes “clothes”), and Wilkinson's (“department_store” for 25% of branches, plus a wide range of different alternatives).

The Wiki describes do-it-yourself stores as being similar to hardware stores, except generally larger, stocking a wider range of products, and targeting customers who are non-professionals working on home improvements, redecorating, gardening, etc. Pure DIY stores are well covered in the database, and consistently tagged. In the case of Homebase, B&Q and Wickes, for example, more than two-thirds of branches are in the database, and well over 90% are tagged as “doityourself”.

The same is not true of builders' merchants (which according to the documentation are properly tagged as “trade”). Fewer than 10% of Jewson and Travis Perkins branches are in the database, and they are tagged with a mix of “doityourself”, “hardware”, and “trade”, with “doityourself” as the most common.

There seem to be two issues here. One is that many trade outlets also serve non-professionals, so their business model overlaps with the scope of “doityourself” (this is accepted in the documentation on “shop=trade”, but contributors are either uncomfortable with it, or simply don't recognise these as trade outlets). The other issue is that there are different degrees of specialisation in the trade side of the market. Specialists in supplying the trade with building materials, timber, plumbing, bathroom furniture, electrical goods, tools, etc. all seem to be under-recorded, and inconsistently tagged. Again, where there is no clear consensus, contributors have fallen back on common tags such as “doityourself” and “hardware”, that were originally intended for generalists supplying the non-professional, and so are more widely used.

Branches of Wilkinson's and Robert Dyas don't fit comfortably into any of the most common categories, so they tend to suffer from highly inconsistent tagging (department_store, doityourself or hardware). We could blame contributors, but surely some of the tagging inconsistency shows that there may be a need for:

  • more specific options to cover particular retail formats that do not comfortably fit the current categories
  • more generic options, so that contributors have an alternative to popular tags intended for large generalists
