Friday, 20 January 2012

Tagging of shops

One of the challenges in adding various different types of retail outlet to the map is that retailers have always had a habit of finding gaps in the market where they can expand their range, in order to reach more customers. So whatever boxes we try to put shops into, somebody is trying to break out of them. We happily buy sweets from a newsagent, and kitchenware from a hardware shop.

It is difficult to measure how contributors tag individual shops, because variations in the data reflect variations in the real world. But because big chains tend to operate a lot of similar shops, we can look at the different ways these are tagged to get some idea of how contributors handle difficult categories.

To keep this in proportion, there are several large chains where there isn't much of an issue. In these cases there is a high level of consensus, and almost all branches in a chain carry similar tags. This suggests that most contributors see the same type of shop. Examples where more than 80% of branches carry the same value for the "shop" tag include Tesco, Sainsbury, Lidl, Morrisons, Asda, Aldi, Waitrose, Iceland, Somerfield and Tesco Metro = supermarkets; Londis, Premier, and One Stop = convenience stores; B and Q and Homebase = doityourself; Greggs = bakery; and Next = clothes.

Other chains seem to be operating across a genuine boundary between two similar categories. As a result a limited number of different tags have been used across the whole of the chain. Among the big chains this mainly applies to certain brands of supermarkets / convenience stores, typically with 50-75% of branches in one category, and the rest in another: examples include Co-op, Spar, Tesco Express, Costcutter, Cooperative Food, Sainsburys Local and Budgens.
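
As a rough illustration of this kind of measurement (a sketch, not the method used to produce the figures above), the share of branches carrying the most common "shop" value can be worked out per chain. The function name `shop_consensus` and the sample data are hypothetical; a real run would need name and shop tag pairs extracted from the database.

```python
from collections import Counter, defaultdict

def shop_consensus(branches):
    """Summarise tagging consensus per chain.

    branches: iterable of (chain_name, shop_value) pairs.
    Returns {chain_name: (most_common_value, share, distinct_values)}.
    """
    by_chain = defaultdict(Counter)
    for name, shop in branches:
        by_chain[name][shop] += 1

    result = {}
    for name, counts in by_chain.items():
        value, n = counts.most_common(1)[0]
        total = sum(counts.values())
        result[name] = (value, n / total, len(counts))
    return result

# Made-up sample data, for illustration only.
sample = (
    [("Greggs", "bakery")] * 9 + [("Greggs", "fast_food")]
    + [("Spar", "convenience")] * 6 + [("Spar", "supermarket")] * 4
)

stats = shop_consensus(sample)
for name, (value, share, distinct) in sorted(stats.items()):
    print(f"{name}: shop={value} on {share:.0%} of branches, "
          f"{distinct} distinct values")
```

With a summary like this, a chain above 80% falls into the high-consensus group, while one sitting at 50-75% suggests a genuine boundary between two categories.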

Other chains have business models that are proving more difficult for contributors to categorise. This often seems to depend on how much emphasis the retailer places on offering a wide range of goods, or specialising in one type (or at least on how different contributors perceive this balance). In these cases different contributors normally choose between just a couple of options. For example, both shop=department_store and shop=clothes are commonly applied across chains such as Matalan, Debenhams, TK Maxx.

There are a few cases where there is a wide variety of different categories within the same chain, and little consensus between contributors. These seem to be chains that do not just offer a wide range of goods, but also compete with others that are much more specialised. Contributors have used ten or more different values for "shop" to describe each of the following examples: Halfords (with 34 different values for "shop"; "bicycle" as the most common); Argos (25 different values for "shop"); Marks and Spencer; W H Smith; and Wilkinson.

So the challenge facing contributors is often that a shop offers a range of goods (or a shop format) that cannot easily be pinned down. But there are also some areas where there seems to be consensus among contributors on the type of shop they are dealing with, but not necessarily on the best tag to use. The most common examples use different terminology which means much the same thing - such as shop=betting or shop=bookmaker (for William Hill, Ladbrokes, etc). Either "shop=alcohol" or "shop=beverage" is commonly used for chains such as Bargain Booze, Oddbins, and Thresher.

Supermarkets and convenience stores are an example where one term is better recognised for smaller shops, and a different term among larger chains, with both covering similar scope. There are similar examples, including hardware shops and do-it-yourself stores, with a certain amount of overlap between the two.

Finally, there is the issue of spelling variations: abbreviations, plural / possessive forms, and alternative approaches to capitalisation and spacing ("newsagent", "news agent", "News Agent", "news_agent", or even "newspaper", or just "news"). It is not difficult to find examples like this, but in practice they don't seem to account for a significant proportion of the total. It is mainly a problem where the normal term for the shop is long, or uses more than a single word, but even there it's not a particularly serious problem. Out of the list of different terms for shops selling newspapers, "newsagent" accounts for 98% of occurrences in the database. In general, around 99% of shops with spelling variations are tagged with the most common value, while the rarer variants account for only 1%. As far as I can see the widest variations in the UK are among DIY shops (variously tagged doityourself, diy, DIY, etc.) and estate agents (tagged "estate agency", "Estate Agent", "estate agents", "estate_agency", etc.). Even in these, the most common option has been applied across more than 90% of the sample.
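
To give an idea of how spelling variants might be grouped, here is a minimal sketch. It assumes that differences in capitalisation, spaces, underscores and hyphens can be ignored when comparing values. The helper names `variant_key` and `group_variants` are hypothetical, and the counts are made up rather than taken from the database.

```python
import re
from collections import Counter, defaultdict

def variant_key(value):
    """Comparison key: lower-case, with spaces, underscores and hyphens removed."""
    return re.sub(r"[\s_\-]+", "", value.lower())

def group_variants(values):
    """Group raw tag values that share a comparison key.

    Returns {most_common_spelling: Counter of the spellings in that group}.
    """
    by_key = defaultdict(Counter)
    for value in values:
        by_key[variant_key(value)][value] += 1
    return {c.most_common(1)[0][0]: c for c in by_key.values()}

# Made-up counts, for illustration only.
sample = (
    ["newsagent"] * 97
    + ["news agent", "News Agent", "news_agent"]
    + ["newspaper"]
)

groups = group_variants(sample)
for canonical, spellings in groups.items():
    total = sum(spellings.values())
    share = spellings[canonical] / total
    print(f"{canonical}: {total} shops, {share:.0%} already use this spelling")
```

Note that this only merges differences in case and spacing. A value such as "newspaper", or a plural such as "estate agents", ends up in its own group, and deciding whether it means the same thing still needs a human.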

The bottom line is that while it is easy to envisage a tool to fix spelling variants, this isn't the real challenge. There are a few cases where more consensus on terminology would make things a bit tidier. But the real challenge is to find better ways of handling nuances between the different business formats that we find in the real world - such as where we use different terminology for different sizes of shop, or for different levels of specialisation. It's hard to imagine a tagging scheme that is going to directly solve the problem of where to buy a spanner, a cycling magazine, or a bottle of balsamic vinegar. Some problems are probably best left to a fuzzy search process. In the meantime the best approach for contributors seems to be to stick to the common values when adding a shop; and pick the value that best describes what we see on the ground.

Nothing new there.

2 comments:

Chris Hill said...

Fascinating analysis. It confirms my dislike for the lint chasers who change the tags on an object without ever visiting it. I have had to send messages recently to a couple of these deluded souls who are changing the tags to fit in with some wiki page or other, which could, of course, have been changed the day before by a twiddler.

I must get around to adding more shop tags ...

gom1 said...

Me too. I guess there is a place for fixing spelling mistakes but as the data gets more sophisticated there is a real risk that we lose a lot of nuances in the way things are tagged by people who have actually been to see. You do need to add more shops though - it's the only feature (so far) where I haven't found your area ranking among the best coverage in the UK.