It isn't difficult to derive some broad rules of thumb about the balance between different types of retail premises that might suggest where coverage looks incomplete. Across all of the data that I have extracted, 48% of retail premises are shops, and the rest offer either refreshments or services. There are a number of places where the mix is quite different. Of course it may be that some of these towns have an extraordinarily large number of cafés and pubs. More likely, contributors haven't got round to adding many shops yet.
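As a rough illustration of this rule of thumb, here is a minimal Python sketch that compares each town's share of shops with the overall 48% figure. The town names, the 15-point tolerance and the minimum sample size are assumptions made up for the example; they are not values used in the analysis above.

```python
from collections import Counter

# Overall share of shops quoted in the text; all other numbers are illustrative.
OVERALL_SHOP_SHARE = 0.48

def shop_share(premises):
    """Fraction of premises in the list whose category is 'shop'."""
    counts = Counter(premises)
    total = sum(counts.values())
    return counts["shop"] / total if total else None

def flag_unusual_towns(towns, tolerance=0.15, min_sample=20):
    """Return {town: shop share} where the share is far from the overall figure.

    `tolerance` and `min_sample` are hypothetical thresholds, not from the source.
    """
    flagged = {}
    for name, premises in towns.items():
        if len(premises) < min_sample:
            continue  # too few premises to draw any conclusion
        share = shop_share(premises)
        if abs(share - OVERALL_SHOP_SHARE) > tolerance:
            flagged[name] = round(share, 2)
    return flagged

# Made-up towns, each a list of broad retail categories.
towns = {
    "Town A": ["shop"] * 10 + ["refreshment"] * 10 + ["service"] * 5,
    "Town B": ["shop"] * 3 + ["refreshment"] * 20 + ["service"] * 7,
    "Town C": ["shop"] * 2 + ["refreshment"] * 1,
}
print(flag_unusual_towns(towns))
```

A town flagged this way is only a candidate for attention; as noted above, the unusual mix may simply reflect reality on the ground.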
Similarly, there are towns where there don't seem to be as many cafés and pubs as one would normally expect. Again, this could reflect reality on the ground, but it might also point to areas that deserve some more attention.
Following the same line of thought, it ought to be possible to measure the mix on individual shopping streets. For this experiment I used the centre of Nottingham. I have no local knowledge of Nottingham, but the coverage of retail premises there is comprehensive - so the data is relatively easy to work with. Here the mix of retail premises is highlighted on any street where there is a decent sample to work with. The proportion of Shops is shown in Cyan; Refreshments (cafés, pubs, etc) in Yellow, and Services (banks, estate agents, etc) in Magenta. Green implies areas where shops and refreshments predominate. Orange implies that refreshments and services predominate (i.e. comparatively few shops). The idea was to test whether it is possible to give contributors an overall impression of the contents of the map which they can compare against local knowledge of how the town centre is organised – at a broader level than the detailed location of individual shops. It has flaws, and the data is difficult to manipulate - so I'm not convinced the approach is practical - but it might point a way towards better alternatives.
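One possible way to turn a street's mix into a single colour is sketched below in Python. It treats the three shares as cyan, yellow and magenta ink amounts and converts them to RGB. This is an assumption about how such a rendering could work, not a description of how the Nottingham map was actually produced, and the function name and scaling step are invented for the example.

```python
def mix_to_rgb(shops, refreshments, services):
    """Map counts of the three retail types to an (R, G, B) tuple, 0-255.

    Shops act as cyan, refreshments as yellow, services as magenta.
    Returns None for a street with no retail premises.
    """
    total = shops + refreshments + services
    if total == 0:
        return None
    c = shops / total
    y = refreshments / total
    m = services / total
    # Scale so the largest share is drawn at full strength (a design choice
    # made here to keep colours saturated).
    peak = max(c, m, y)
    c, m, y = c / peak, m / peak, y / peak
    # Standard subtractive conversion from CMY to RGB.
    r, g, b = 1 - c, 1 - m, 1 - y
    return tuple(round(255 * v) for v in (r, g, b))

print(mix_to_rgb(10, 10, 0))   # shops and refreshments in equal numbers
print(mix_to_rgb(0, 10, 5))    # refreshments and services, no shops
```

With this conversion, equal numbers of shops and refreshments come out green, and a street of refreshments plus some services comes out orange, which matches the colour key described above.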
Contributors with an interest in mapping particular types of retail may be able to take advantage of the fact that similar types of retail tend to cluster together. On OSM, 85% of clothes shops have another clothes shop within 100 metres (25% have at least 10 more clothes shops within 100 metres); 70% of banks have another bank within 100 metres; 60% of estate agents and 60% of fast food outlets have another outlet of the same type within 100 metres; 40% of pubs have another pub within 100 metres. Identifying this kind of cluster might be helpful for some kinds of location search, and it may also provide useful feedback to contributors, who are able to compare the state of the map against local knowledge to identify clusters that look incomplete.
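Figures of this kind can be computed along the following lines. This is a minimal Python sketch using a brute-force distance check; the coordinates are made up, and the function names are hypothetical. For a country-wide extract a spatial index would be needed rather than comparing every pair.

```python
import math

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))

def share_with_neighbour(points, radius_m=100, min_neighbours=1):
    """Fraction of points with at least `min_neighbours` other points within radius_m."""
    if not points:
        return 0.0
    hits = 0
    for i, p in enumerate(points):
        nearby = sum(
            1 for j, q in enumerate(points)
            if i != j and haversine_m(p, q) <= radius_m
        )
        if nearby >= min_neighbours:
            hits += 1
    return hits / len(points)

# Made-up outlets of one type: two roughly 56 m apart, one well over 1 km away.
outlets = [(52.9540, -1.1500), (52.9545, -1.1500), (52.9700, -1.1500)]
print(share_with_neighbour(outlets))
```

Setting `min_neighbours=10` gives the stricter measure quoted for clothes shops.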
Here, for example, is a map of Manchester showing clusters of clothing shops that can be identified from existing data. The analysis began with a broad definition of a clothing shop (shop=clothes, shoes, fashion, boutique, or department store) then used R's clustering capabilities (the DBSCAN algorithm) on a data extract to find areas where there are more than five clothing shops within 100 metres of each other. This particular example is probably of limited use to those of us who are unfamiliar with Manchester (and also, for that matter, to those of us who are unfamiliar with shopping for clothes). But on the face of it, there must be quite a lot of missing clothes shops in Manchester, and the presence and absence of clusters in the data might point local fashion-conscious mappers to areas that deserve attention.
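For readers who want to try something similar, below is a small self-contained Python sketch of DBSCAN. It is not the R code used for the Manchester map. The parameters `eps_m=100` and `min_pts=6` are one reading of "more than five shops within 100 metres", and the coordinates are invented.

```python
import math

def haversine_m(a, b):
    """Great-circle distance in metres between two (lat, lon) points in degrees."""
    lat1, lon1, lat2, lon2 = map(math.radians, (a[0], a[1], b[0], b[1]))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371000 * math.asin(math.sqrt(h))

def dbscan(points, eps_m=100, min_pts=6):
    """Return one label per point: a cluster id (0, 1, ...) or -1 for noise."""
    UNVISITED, NOISE = None, -1
    labels = [UNVISITED] * len(points)

    def neighbours(i):
        # The point itself is counted, as in the usual DBSCAN definition.
        return [j for j, q in enumerate(points)
                if haversine_m(points[i], q) <= eps_m]

    cluster = -1
    for i in range(len(points)):
        if labels[i] is not UNVISITED:
            continue
        seeds = neighbours(i)
        if len(seeds) < min_pts:
            labels[i] = NOISE  # may later be claimed as a border point
            continue
        cluster += 1
        labels[i] = cluster
        queue = [j for j in seeds if j != i]
        while queue:
            j = queue.pop()
            if labels[j] == NOISE:
                labels[j] = cluster  # border point: joins, but does not expand
            if labels[j] is not UNVISITED:
                continue
            labels[j] = cluster
            nbrs = neighbours(j)
            if len(nbrs) >= min_pts:
                queue.extend(nbrs)  # core point: keep growing the cluster
    return labels

# Six made-up shops spaced about 11 m apart, plus two isolated ones.
shops = [(53.4800 + 0.0001 * k, -2.2400) for k in range(6)]
shops += [(53.5000, -2.2400), (53.4600, -2.3000)]
print(dbscan(shops))
```

This brute-force version recomputes every distance, so it only suits small extracts; a library implementation with a spatial index would be the practical choice for a whole city.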
SK53 has just pointed out that it should be possible to extend this kind of approach using Food Hygiene data to identify retail areas, and compare them with OSM data. I haven't tried yet, but it sounds like a promising idea.
Here is an additional example, picking up on the idea that Food Hygiene data might be used to identify suburban areas that need more attention. The Food Hygiene data shows location and food hygiene status for a variety of retail outlets, including pubs, supermarkets, takeaways, restaurants, cafes and some other types of retailer. Of course, the same data could also be used to identify individual outlets that are missing, but since the data only covers certain types of outlet, the aim here is more general. The idea is to identify suburban areas where there may be several missing retail outlets, including some that don't offer food.
These are Liverpool suburbs where there is Food Hygiene data on at least five retail outlets, but none appear in OSM. Relatively few areas in the UK fit these rather crude criteria. More sophisticated approaches must be possible, but refining them will need more experimentation, and that will take longer. Meanwhile this suggests that the general approach should work in principle.
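The crude criteria described here amount to a simple filter. The Python sketch below assumes that the number of Food Hygiene records and OSM retail features has already been counted for each area (the point-in-polygon step is not shown); the area names and counts are hypothetical.

```python
def areas_missing_from_osm(fsa_counts, osm_counts, min_fsa=5):
    """Areas with at least `min_fsa` Food Hygiene records and no OSM retail at all."""
    return sorted(
        area for area, n in fsa_counts.items()
        if n >= min_fsa and osm_counts.get(area, 0) == 0
    )

# Hypothetical per-area counts.
fsa_counts = {"Area 1": 7, "Area 2": 5, "Area 3": 12, "Area 4": 3}
osm_counts = {"Area 1": 0, "Area 3": 4}  # areas absent here have no OSM retail

print(areas_missing_from_osm(fsa_counts, osm_counts))
```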
And another example, covering Sunderland. This uses the more granular ONS Lower Layer Super Output Areas. Those rendered are where OSM contains no retail outlet, but the Food Standards Agency has at least one Food Hygiene Record (for a high-street business type). The darker the polygon, the more FSA records it contains, and hence the more retail outlets are likely to be missing from OSM.
And a third example, for Sheffield, showing the difference between the number of Food Hygiene Records (for high-street business types), and the number of OSM retail features that fall within each LSOA. Once again, the figures aren't directly comparable. The aim is to highlight areas where the OSM data is implausibly thin, so the figures are no more than a proxy measure of how great the shortfall is likely to be. Areas are not coloured where the volume of OSM data is equal to or larger than the FSA Food Hygiene figure - but this doesn't necessarily imply that they are complete. The real message is "if you go to the dark red areas you should find lots of unmapped shops to add".
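The comparison behind this kind of map can be sketched as follows, again in Python and again assuming per-LSOA counts are already available. The LSOA identifiers, the counts and the linear shading are all invented for illustration; the real map may well use a different colour scale.

```python
def shortfall_by_area(fsa_counts, osm_counts):
    """Return {area: FSA count minus OSM count} for areas where FSA is larger.

    Areas where OSM has as many or more features are left out (not coloured),
    which does not imply that they are complete.
    """
    result = {}
    for area, fsa in fsa_counts.items():
        gap = fsa - osm_counts.get(area, 0)
        if gap > 0:
            result[area] = gap
    return result

def shade(gap, max_gap):
    """Darkness between 0 and 1: a simple linear scale (an assumption)."""
    return gap / max_gap if max_gap else 0.0

# Hypothetical per-LSOA counts.
fsa_counts = {"LSOA A": 20, "LSOA B": 6, "LSOA C": 4}
osm_counts = {"LSOA A": 0, "LSOA B": 4, "LSOA C": 9}

gaps = shortfall_by_area(fsa_counts, osm_counts)
darkest = max(gaps.values())
for area, gap in gaps.items():
    print(area, gap, shade(gap, darkest))
```

Because the two datasets count different things, the gap is only a proxy: a large value suggests where to look, not how many shops are missing.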