Wednesday 28 December 2011

Number crunching

I've been contributing in a small way to OSM for a while now, and I am always impressed by the amount of information that keeps being added to the map. Although there is still work to do on the UK transport network, I think most would agree that it is now well above the level of coverage needed to provide a valuable platform for all kinds of different applications.

What I didn't have a good feel for is how well other aspects of the map are covered. I like to measure stuff, so in the down-time between Christmas and New Year, I've been playing around estimating how many examples of different features we should expect to find in the database, and comparing that estimate to how many are actually recorded.

This is inevitably a bit rough and ready. I needed to estimate how many examples of a feature I should expect to find. Then I needed to figure out a simple way of measuring how many are already in the database, with a reasonable level of confidence that both figures are counting the same things. This doesn't always work. For various reasons it turns out that it's not straightforward to measure things like the number of public telephones, airports, windmills, car parks, cemeteries and sports grounds. I've included some figures on facilities like restaurants and hotels, but I suspect that the definitions used in different statistics might not be a very good match for the definitions used by contributors. With more care I might be able to improve these in future.

Still, it's a starting point, and with all sorts of caveats, I think I can identify some features that users of the data can generally expect to find in the database already:

Feature                        Found   Estimate   Completeness
Bus stations                     498        427           117%
Theatres                         809        742           109%
Marinas                          458        436           105%
Petrol stations                6,451      6,301           102%
Police stations                1,985      2,036            97%
Toilets                        4,441      4,714            94%
Fire stations                  1,500      1,642            91%
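
For anyone who wants to check the arithmetic, the completeness column is just the number found divided by the estimate, shown as a whole-number percentage. Here is a minimal sketch in Python using three rows from the table above. The function name `completeness` is my own, for illustration only. A figure above 100% probably means the estimate is on the low side, or that the two counts are not measuring quite the same thing.

```python
def completeness(found: int, estimate: int) -> int:
    """Return found / estimate as a percentage, rounded to a whole number."""
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    return round(100 * found / estimate)


# (feature, found in the database, estimated total) taken from the table above
rows = [
    ("Bus stations", 498, 427),
    ("Police stations", 1985, 2036),
    ("Fire stations", 1500, 1642),
]

for name, found, estimate in rows:
    print(f"{name:<16} {found:>6,} {estimate:>6,} {completeness(found, estimate):>4}%")
```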

I think I can also identify some types of features where coverage is less complete, but where any particular example is still more likely to be found in the database than not. Some of these are already the target of concentrated activity to improve coverage. Perhaps others are the areas where some more careful analysis would be most useful.

Feature                        Found   Estimate   Completeness
Museums                        1,505      1,766            85%
Supermarkets                   6,681      7,970            83%
Libraries                      2,462      3,206            77%
Hotels                         5,762      7,561            76%
Cinemas                          578        763            76%
Schools                       22,969     33,121            69%
Pubs                          32,472     49,303            66%
Cafes                          7,481     12,140            62%
Breweries                        110        180            61%
A and E (England)                 91        150            61%
Allotments                     3,936      7,286            54%
Post Offices                   7,372     14,000            53%
Cycle shops                    1,253      2,500            50%

The list of features that I explored is deliberately pretty arbitrary. I covered some that I happen to be interested in; some where I had an estimate to hand, or could easily uncover one; some drawn from OSM project of the month / week activity; and others because I thought they might be of interest to contributors, or of value to map users. The following list shows features where it looks as though contributors should find it fairly easy to add further examples to the map.

Feature                        Found   Estimate   Completeness
Pret A Manger stores             115        235            49%
Starbucks stores                 330        717            46%
Restaurants                   10,043     25,226            40%
Ice rinks                         13         33            39%
Night clubs                      548      1,507            36%
Hostels                          454      1,260            36%
Golf courses                     709      2,002            35%
Letter boxes                  39,329    116,092            34%
Swimming pools                   217        641            34%
Anglican churches              4,880     15,976            31%
Veterinary clinics               363      1,271            29%
Garden centres                   733      2,621            28%
Community pharmacies           3,732     13,425            28%
Lifeboat stations                 31        135            23%
Casinos                           37        180            21%
GP surgeries                   2,036     10,352            20%
Bowling alleys                   103        528            20%
Convenience stores             7,972     48,289            17%
Sewage works                     808      5,412            15%
Branches of Greggs               193      1,526            13%
Fish and chip shops            1,334     11,000            12%
National Trust properties        278      2,477            11%
English Heritage properties       64        584            11%
Shops of all types            47,797    458,275            10%
Piers                              3         33             9%
Hairdressers                   2,729     35,704             8%
Bookmakers                       402      5,029             8%
Dentists                         722     10,927             7%
Charity shops                    559      9,000             6%
Newsagents                       801     16,500             5%
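
The split into three lists can be sketched as a simple banding on the found / estimate ratio. This is only an illustration: the 50% boundary follows from "more likely to be found in the database than not", but the 90% boundary is my assumption (it happens to fall between the lowest figure in the first list and the highest in the second), and the function name and band labels are invented for the example.

```python
def coverage_tier(found: int, estimate: int,
                  high: float = 0.90, low: float = 0.50) -> str:
    """Place a feature in one of three coverage bands.

    The 0.90 cut-off is an assumed threshold for illustration;
    0.50 reflects "more likely to be found than not".
    """
    if estimate <= 0:
        raise ValueError("estimate must be positive")
    ratio = found / estimate
    if ratio >= high:
        return "generally present"
    if ratio >= low:
        return "more likely present than not"
    return "plenty left to add"


# One example row from each of the three lists
for name, found, estimate in [
    ("Petrol stations", 6451, 6301),
    ("Pubs", 32472, 49303),
    ("Dentists", 722, 10927),
]:
    print(f"{name}: {coverage_tier(found, estimate)}")
```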

Unsurprisingly, my impression is that coverage is generally better for the bigger features that are more obvious (e.g. schools), and for those which particularly interest the OSM community: pubs are better covered than post offices, and bicycle shops better than golf courses. There are surprisingly few fish and chip shops, though there are some tagging variants that would slightly boost coverage of these if I collected them more systematically. Some of the services that potential users might expect to find are not so well covered (e.g. GPs and dentists).

Even if they are rough and ready, I think it has been useful to put numbers on some of this stuff: partly because it helps to measure progress, partly to help think laterally about priorities, partly as a sanity check on the amount of detail that is both practical and important, and not least because it already demonstrates an impressive level of coverage. However, there is no shortage of opportunity to record the existence of more interesting and useful features in 2012. You can bet on seeing more casinos and bookmakers, while other contributors get their teeth into the missing dentists.

I'll probably come back to this again. It's a bit of an iterative process, and if anyone wants to suggest a better estimate, a source of suitable data, or other features that I should try to measure, then please comment below.

1 comment:

gom1 said...

Note on Anglican churches: there are about 11,000 other places of worship in the database which are either tagged as Christian without a denomination, or not tagged with a religion at all. Judging by the names, most of these are likely to be Anglican. In other words, most Anglican churches are probably already in the database, but not tagged in a way that lets me identify them.