False synonyms
True synonyms add to the confusion, provoke debate, and may discourage some data users, but in practice I suspect “false synonyms” are a bigger problem. By this, I mean tags that contributors use interchangeably, even when they are not true synonyms according to the guidelines. Again, we can use major chains to cross-check whether tags with similar meanings are applied consistently.
- Almost every major chain of pharmacies has a mix of outlets tagged as “shop=pharmacy” and “shop=chemist”.
- Similarly, “alcohol”, “wine” and “beverages” seem to be used interchangeably for chains of off-licences and wine merchants, with “alcohol” the most common of these. The less common “off-licence” is not widely used on retail outlets.
- For chains such as Ladbrokes and William Hill, “bookmaker” is the most common value, but “betting” and “gambling” are also quite common
- There is a lot of overlap between outlets described by the relatively common “doityourself” and those described by the less common “hardware”, “building_supplies” and “trade”
- For “mobile_phone”, the less common alternatives are “phone” and “electronics”. Tagging “electronics” could be a symptom of an evolving retail format, but “phone” looks like a false synonym.
The documentation in the wiki makes it reasonably clear that the above are not true synonyms, but contributors have treated them as synonyms, in the sense that similar branches of the same chain use a mix of different values. As a result, data users are unable to tell where there is a true difference and where there is merely imprecise tagging. In effect, data users are being pushed to treat these as synonyms, even though they are documented as having different meanings.
These are examples of retail formats that contributors have difficulty with. Data users, those who maintain the documentation, and those who advocate changes to tagging need to be sensitive to where these occur. We'll look in more detail at some common examples shortly.
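The cross-check by chain is easy to reproduce. Below is a minimal Python sketch, assuming you have already extracted (brand, shop) pairs from an OpenStreetMap extract; the extraction step is not shown, and the chain names and rows are hypothetical placeholders, not real counts. It tallies the shop values used by each brand and reports the brands whose outlets carry more than one value.

```python
from collections import Counter, defaultdict

# Hypothetical sample of (brand, shop) pairs. In practice these would come
# from an OSM extract; the names and rows here are illustrative only.
SAMPLE = [
    ("Chain A", "chemist"),
    ("Chain A", "pharmacy"),
    ("Chain A", "chemist"),
    ("Chain B", "bookmaker"),
    ("Chain B", "betting"),
    ("Chain C", "mobile_phone"),
    ("Chain C", "mobile_phone"),
]


def tally_by_brand(rows):
    """Count how often each shop=* value is used for each brand."""
    counts = defaultdict(Counter)
    for brand, shop in rows:
        counts[brand][shop] += 1
    return counts


def mixed_brands(counts):
    """Keep only brands whose outlets carry more than one distinct shop=* value."""
    return {brand: dict(c) for brand, c in counts.items() if len(c) > 1}


if __name__ == "__main__":
    for brand, values in mixed_brands(tally_by_brand(SAMPLE)).items():
        total = sum(values.values())
        # List each value with its share of the brand's outlets, most common first.
        shares = ", ".join(
            f"{value}={n}/{total}"
            for value, n in sorted(values.items(), key=lambda kv: -kv[1])
        )
        print(f"{brand}: {shares}")
```

A brand that shows up in this report is not necessarily mis-tagged, since a chain can genuinely run different formats, but it marks where a closer look is worthwhile.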
Multi-specialists
The above are all examples of specialist retailers. Multiple specialities are another area that gives contributors problems. Halfords is one of the most easily identified examples: how best to tag a store that offers both bicycles and car parts? Contributors have come up with around 30 different variants, including:
- Choosing just one of the options (“bicycle”, “automotive”, “car_accessories”, “auto_accessories” or “car_parts”) and ignoring any other area of specialisation
- Listing several options separated by semicolons: “bicycle;car_parts”, “car;bicycle”, “bicycle; car_accessories”, “motor;bicycle”
- Using a more generic category: “doityourself”, “hardware”
The usual way to assign multiple values to a key is a semicolon-separated list. In practice this is rarely used for shops (fewer than one in a thousand examples), but the examples that do exist give an idea of other multiple specialities that are giving contributors problems:
- “hairdresser;beauty”
- “kitchen;bathroom”
- “greengrocer;florist”
- “dry_cleaning;laundry”
- “art;frame”
- “car;bicycle”
- “shoe_repair;key_cutting”
- “bicycle;car_parts”
- “tattoo;piercing”
Notably, these are all pairs. Happily, there don't seem to be any long lists of shop types: contributors recognise that the intention is to record mixed types of speciality shop, not to list every category of goods for sale.
The limited number of examples means that these won't give data users a great problem. If they choose to ignore them, they won't lose much data; if they prefer to break out the list, it won't give them much difficulty. More importantly, to my mind, contributors are sending signals here about retail formats that they find difficult to categorise. This could be valuable information for those who maintain the documentation, and for those who advocate changes to tagging.
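For data users who do want to break out the list, a naive split is enough for values like these. This is a minimal Python sketch with hypothetical function names, assuming no individual value contains a literal semicolon. It also strips stray spaces, such as the one in “bicycle; car_accessories”, and drops empty parts left by a trailing semicolon.

```python
def split_shop_value(value):
    """Split a semicolon-separated shop=* value into its parts.

    Assumes no single value contains a literal semicolon. Surrounding
    whitespace is stripped and empty parts are dropped.
    """
    return [part.strip() for part in value.split(";") if part.strip()]


def is_multi_valued(value):
    """True if the shop=* value lists more than one shop type."""
    return len(split_shop_value(value)) > 1


if __name__ == "__main__":
    # Illustrative values only, not a sample of real data.
    values = ["pharmacy", "hairdresser;beauty", "bicycle; car_accessories", "alcohol"]
    print(split_shop_value("bicycle; car_accessories"))  # ['bicycle', 'car_accessories']
    multi = [v for v in values if is_multi_valued(v)]
    print(len(multi) / len(values))  # 0.5
```

The same split also makes it easy to confirm the observation above that these lists are pairs: count the parts returned for each value.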