Saturday 12 February 2011

House or road

More name checks today; the target was the Barmby Moor area near Pocklington. As we have seen before, the OS Locator data is very useful. It showed up a few small new developments in Barmby Moor. It also showed up a number of oddities which we have sorted out.

When I originally surveyed Barmby Moor we did it quite quickly and some street names were missed. Today I wanted to correct that. One area that was a bit awkward was the main road through the village. It only bears a name plate at the north-east end, where it says Front Street. According to OS Locator it is called Main Street. We couldn't see any other name boards, so if the name changes it is difficult to see where. I looked at the house numbers for clues, but for once that didn't work - the houses have names and no numbers. I decided to ask the locals.

A very helpful elderly chap was tidying the garden of his bungalow and he explained, at some length, how the name of the street had changed in his lifetime. He explained where Front Street changes to Main Street and how, when the bungalow he lives in was built 30 years ago, the end of the street became West End. Sure enough the few bungalows at the end of the street have a numbering scheme that fits.

The main A1079 that runs by the village is called York Road on OS Locator. There are no signs to that effect that I can see, but it does go to York. Oddly there is a small piece of the road that seems to have the name Gale Hill, within the section that is also called York Road. Looking closely I spotted a house called Gales Hill in just the right spot. I have seen this elsewhere with OS data. Near Molescroft there is a road OS think is Constitution Hill, but that is the name of a nearby farm. The road is called Malton Road; I know someone who lives there.

If you are using OS Locator for road names, beware of surveyors who can't tell a house name from a road name.

Friday 11 February 2011


I like the tools that encourage people to get out and look at their surroundings. Some might call it surveying, but often it's just an extra to some other journey or visit. Sadly some people don't get out and look at what's there; they just use the tools to blithely add data to the OSM database regardless of how up-to-date it is or how valid it is.

Today we wandered around part of the East Riding looking for names where OSM and OS Locator don't agree. The rain had stopped but the promised brightness didn't appear. I found a new housing development with a few houses on it developed since our last visit (one of the best uses of the OS Locator dataset). We found a couple of roads OS Locator has names for where we couldn't see a name and a few roads where the OS Locator data was wrong. Coppleflat Lane seems to be everywhere and actually it is nowhere. I think the OS surveyor (or whoever they got their data from) must have had a bad day.

I toyed with venturing down a named, very muddy track that we didn't have a trace for, and decided that a trace from Bing, well aligned with our GPS track, would do. The Bing photos are a bit old, but I estimate the track is hundreds of years old so they will do nicely.

One name I didn't know was Bluestone Bottoms. I can feel some sympathy with the road - this hard chair that I'm on now leaves me feeling a bit like that sometimes.

Friday 4 February 2011

A whole village?

We took a tour around Melton, a small village not far from home. My aim was to add all of the addresses for the place, from collection to editing, in an afternoon and I failed. We wandered around the place in a howling gale, struggling to see some of the addresses. The addresses of some houses which don't display a number can be determined just by looking at the numbers on either side, but on some of the older parts of village streets that doesn't work. Some houses only have names, some are recent additions to the street where the numbers don't fit, and sometimes there are gaps in the numbers for no apparent reason.

We completed the whole village quite quickly. Except we didn't - I had forgotten that part of the village lies to the south of the A63. So, addresses for a village in an afternoon? Not quite.

Wednesday 2 February 2011

Binary files

I have downloaded lots of OSM data from the API, XAPI, the Export tab and from other sources such as Geofabrik and Cloudmade. This has usually been XML files, often compressed, which are easy to understand and deal with, if a bit bulky. The ubiquity of XML means there are parsers available for the languages I use, with a choice of approaches to parsing depending on what you are trying to do. XML parsing remains a fairly slow process, though, and since it is rarely an end in itself, the real use of the data cannot start until the XML has been deciphered.

The volume of data for a given area has steadily increased as detail has increased and the one thing XML is not good at is being compact. The process of moving OSM data around involves moving data across networks and this is sometimes a slow process - my own broadband is not very quick and I have no choice of supplier so there is no competitive pressure to improve the performance or the dreadful customer service, but that's another story. One way, potentially, to improve this is to use a binary file format which could be much smaller even than a compressed XML file and might be quicker to encode and decode. Such a format has been created by Scott Crosby based on the Google Protocol Buffers (protobuf). The files are known as .pbf files.

I have not used the protobuf file formats before so I read the Google pages which seemed to make some sense, though the examples are simplistic. I turned to the OSM wiki page describing how Scott has created the protobuf layouts and it left me baffled, so I looked at the source code of various utilities that now incorporate support for .pbf files.

I wanted to write something using a .pbf file and, having done this kind of thing before, I know that it is easy to copy other people's code and use it without really understanding it; understanding the use of the .pbf files was an important objective for me. Google provide direct support for using protobuf in C++, Java and Python, but there were no Python OSM examples to be found, so I decided to start by writing a Python pbf parser so I couldn't just copy someone else's code verbatim.

I downloaded a .pbf file from Geofabrik to work on. Examining the layout of the file was not easy. If you try to use a hex viewer things are not clear, because chunks of the data can be compressed, and even the uncompressed parts are difficult to read because the protobuf format squashes data, especially numbers, into the minimum number of bytes to save space. Most of the text used in tags is held in a string table, so each string only occurs once in each block of data. In the end I simply wrote code to work through the file extracting and examining each part step by step. The OSM wiki page did help in some places, but I got most help by looking at other code, the protobuf definitions and the Python files the protobuf compiler creates.
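
To give a feel for why a hex viewer is so little help, here is a small sketch of how protobuf packs integers. It is not taken from my parser; the function names are made up for illustration and it assumes Python 3. Each byte carries seven bits of the number and uses the top bit to say "more to come", and signed fields (which, as far as I can tell, is how the delta-coded ids and coordinates are stored) are "zigzag" encoded so that small negative numbers stay short too.

```python
def decode_varint(buf, pos=0):
    """Decode one base-128 varint from buf starting at pos.

    Returns (value, next_pos). Illustrative helper, not part of any library.
    """
    result = 0
    shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift  # low 7 bits carry the data
        if not byte & 0x80:               # top bit clear: this was the last byte
            return result, pos
        shift += 7


def zigzag_decode(n):
    """Map an unsigned zigzag value back to a signed integer."""
    return (n >> 1) ^ -(n & 1)


# The number 300 takes two bytes instead of a fixed four or eight.
value, next_pos = decode_varint(bytes([0xAC, 0x02]))
print(value, next_pos)

# Zigzag interleaves positive and negative values: 0, -1, 1, -2, ...
print([zigzag_decode(n) for n in (0, 1, 2, 3)])
```

So a run of bytes in the file has no fixed field widths to line up against, which is why stepping through it in code was easier than staring at a hex dump.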

I now have a Python script to parse .pbf files, so I can use the data in the same way as I would having parsed it from XML. I have used the Python OSM classes that I created some time ago to store the data so I can write XML as a quick test. If you are interested in seeing the result of my work you can download it from here.

I have tried to parse the downloaded file for England from Geofabrik. The script couldn't load all of the nodes into my 3 GB of memory and was killed. I removed the code that stores the nodes, ways and relations just to let it run through the whole file. After more than two hours it had run through, but I couldn't have done very much with it as none of the expanded data was saved. It works well for a smaller area, such as a county or a city, but it doesn't handle big files very well at all.
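
One way round the memory problem might be to deal with each block as it is read instead of collecting everything first. The sketch below is not my script; it is a stripped-down illustration, in Python 3, that walks the outer framing of a file one blob at a time using a generator. It assumes the layout as I understand it from the wiki and fileformat.proto: a 4-byte big-endian length, a BlobHeader (type in field 1, datasize in field 3), then a Blob (raw data in field 1, zlib data in field 3). The helper names and the little test stream are invented for the example, and it reads the fields by hand rather than using the protobuf library.

```python
import io
import struct
import zlib


def read_varint(buf, pos):
    """Decode one base-128 varint; returns (value, next_pos)."""
    result = shift = 0
    while True:
        byte = buf[pos]
        pos += 1
        result |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return result, pos
        shift += 7


def parse_fields(buf):
    """Return {field_number: value} for a flat message.

    Only handles varint (wire type 0) and length-delimited (wire type 2)
    fields, which is all the outer framing is assumed to need.
    """
    fields, pos = {}, 0
    while pos < len(buf):
        key, pos = read_varint(buf, pos)
        number, wire_type = key >> 3, key & 7
        if wire_type == 0:
            fields[number], pos = read_varint(buf, pos)
        elif wire_type == 2:
            length, pos = read_varint(buf, pos)
            fields[number] = buf[pos:pos + length]
            pos += length
        else:
            raise ValueError("unexpected wire type %d" % wire_type)
    return fields


def iter_blobs(stream):
    """Yield (blob_type, payload) one blob at a time, never the whole file."""
    while True:
        prefix = stream.read(4)
        if len(prefix) < 4:
            return                                   # end of file
        (header_len,) = struct.unpack(">I", prefix)
        header = parse_fields(stream.read(header_len))
        blob = parse_fields(stream.read(header[3]))  # field 3: datasize
        if 1 in blob:
            payload = blob[1]                        # stored uncompressed
        elif 3 in blob:
            payload = zlib.decompress(blob[3])       # stored with zlib
        else:
            raise ValueError("unsupported compression")
        yield header[1].decode("utf-8"), payload


def frame(blob_type, blob):
    """Build one framed blob for the demo (sizes under 128 bytes only)."""
    header = (b"\x0a" + bytes([len(blob_type)]) + blob_type
              + b"\x18" + bytes([len(blob)]))
    return struct.pack(">I", len(header)) + header + blob


# A made-up two-blob stream, purely to exercise the loop.
packed = zlib.compress(b"second block")
demo = io.BytesIO(
    frame(b"OSMHeader", b"\x0a\x0bfirst block")
    + frame(b"OSMData", b"\x10\x0c\x1a" + bytes([len(packed)]) + packed)
)
for blob_type, payload in iter_blobs(demo):
    print(blob_type, len(payload))
```

With a real file you would open it in binary mode and pass the file object to iter_blobs, then decode each payload and write out or count what you need before moving on to the next. Only one block is held at a time, though it would do nothing for the two hours of run time.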