Sunday 25 April 2010

Getting boundaries right

I've got to grips with the boundary data that the Ordnance Survey released earlier this month. The data was in shape files and, with the help of OSM::SK53, I have extracted OSM style data from them. Shape files are a twenty-year-old format that can include a projection file (*.prj). This was where the problem lay that stopped me using the data before. With Jerry's help the projection file was altered and then the process of changing the projection OS use to the one OSM uses was easy.

If you are thinking of using OS shape files be warned: don't trust their *.prj files - they are incomplete.
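The post does not say what was missing from the *.prj files. One common gap in shapefile projection files is the datum shift to WGS84, which appears in WKT as a TOWGS84 clause. Assuming that was the problem here, a quick check might look like the sketch below. The function name and both sample WKT strings are made up for illustration and are heavily shortened.

```python
def has_datum_shift(prj_wkt: str) -> bool:
    """Return True if the WKT text contains a TOWGS84 datum-shift clause."""
    # Strip spaces and ignore case so 'TOWGS84 [' and 'towgs84[' both match.
    return "TOWGS84[" in prj_wkt.upper().replace(" ", "")


# Hypothetical, shortened examples; real .prj files are longer.
WKT_WITHOUT_SHIFT = (
    'PROJCS["British_National_Grid",GEOGCS["GCS_OSGB_1936",'
    'DATUM["D_OSGB_1936",SPHEROID["Airy_1830",6377563.396,299.3249646]]]]'
)
WKT_WITH_SHIFT = (
    'PROJCS["British_National_Grid",GEOGCS["GCS_OSGB_1936",'
    'DATUM["D_OSGB_1936",SPHEROID["Airy_1830",6377563.396,299.3249646],'
    'TOWGS84[446.448,-125.157,542.06,0.15,0.247,0.842,-20.489]]]]'
)

if __name__ == "__main__":
    print(has_datum_shift(WKT_WITHOUT_SHIFT))
    print(has_datum_shift(WKT_WITH_SHIFT))
```

If the clause is absent, some tools will reproject the data without any datum shift, which moves everything sideways while keeping the shape and size intact.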

Once the shape files were in the OSM projection I could extract polygons or polylines from them. I created a Python script that will extract a named parish boundary polygon or the numbered polylines that make up the coastline. I spent a few mind-numbing hours loading each of the parishes that fall on the outside edge of the county of the East Riding of Yorkshire. I created a relation for each parish, and deleted each of the sections of ways that were duplicated between parishes. The outer edges of these parishes also mark the edge of the county, so the relations for the county and the region have improved too.
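Finding the duplicated sections by eye is the mind-numbing part. As a rough sketch of how it could be automated (this is not the script described above), the code below treats each parish as a closed ring of (lat, lon) points and reports the edges two rings have in common. It assumes neighbouring parishes use exactly the same coordinates along their shared boundary. The function names and the two toy squares are invented for the example.

```python
def ring_edges(ring):
    """Return the set of undirected edges in a closed ring of (lat, lon) points."""
    # frozenset makes the edge A-B equal to the edge B-A.
    return {frozenset((a, b)) for a, b in zip(ring, ring[1:])}


def shared_edges(ring_a, ring_b):
    """Return the edges that appear in both rings."""
    return ring_edges(ring_a) & ring_edges(ring_b)


# Two toy "parishes": unit squares that share one side.
parish_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
parish_b = [(0.0, 1.0), (0.0, 2.0), (1.0, 2.0), (1.0, 1.0), (0.0, 1.0)]

if __name__ == "__main__":
    for edge in shared_edges(parish_a, parish_b):
        print(sorted(edge))
```

Each shared edge only needs to exist once in OSM, as a way that is a member of both parish relations.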

I have also updated the coastline ways from the Boundary Line data set from OS. I managed to leave them broken overnight, which OSM::PA94 spotted and helped to fix. The coastline is not often updated - it's not part of the normal rendering process for Mapnik, so I'll have to wait to see what Mapnik makes of it.

I have noticed a few things as part of this. The boundaries sometimes follow a stream or river, but sometimes the boundary leaves the river briefly, probably because the river has moved, but the boundary hasn't. The boundaries do not follow the centre line of roads. They clearly lie at one side or the other and at certain points you can see where the boundary jinks across to the other side.

Lastly I have to say the match between the boundaries and the surveyed areas is very close and I'm very comfortable in using the OS data for boundaries, especially because there is no better way to survey this data on the ground.

I will write up the detail of the steps involved if there is any interest.

Sunday 18 April 2010

OS Data part 3

The OS data is correct. I have checked a few points in polygons in the shp file in the OS projection and they are the same as a paper map, so somehow the transformation from OS to OSM is translating the result.

As an example, a point on the transformed shp file is at 53.76098, -0.48939 but should be at 53.76129, -0.49106. As a bodge I'll add the translation to the OSM output and then test a few more places to see if the translation is consistent across the country.
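The bodge can be sketched in a few lines. The code below uses the one pair of points quoted above to work out the offset in degrees, applies it to a point, and gives a rough size for the shift in metres using a simple spherical approximation. The function names and the Earth radius constant are choices made for this example. Whether one offset holds elsewhere in the country is exactly what still needs testing.

```python
import math

OBSERVED = (53.76098, -0.48939)  # lat, lon from the transformed shp file
EXPECTED = (53.76129, -0.49106)  # lat, lon read from the paper map

EARTH_RADIUS_M = 6371000.0  # mean radius, fine for a rough estimate


def offset(observed, expected):
    """Return the (dlat, dlon) correction in degrees."""
    return (expected[0] - observed[0], expected[1] - observed[1])


def apply_offset(point, delta):
    """Shift a (lat, lon) point by the correction."""
    return (point[0] + delta[0], point[1] + delta[1])


def offset_metres(observed, delta):
    """Approximate the correction as (north, east) in metres."""
    north = math.radians(delta[0]) * EARTH_RADIUS_M
    # A degree of longitude shrinks with the cosine of the latitude.
    east = math.radians(delta[1]) * EARTH_RADIUS_M * math.cos(math.radians(observed[0]))
    return north, east


if __name__ == "__main__":
    delta = offset(OBSERVED, EXPECTED)
    print(delta)
    print(apply_offset(OBSERVED, delta))
    print(offset_metres(OBSERVED, delta))
```

For this one point the correction works out at somewhere around 35 m north and 110 m west, which fits with the transformed boundary sitting too far east.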

OS Data part 2

I've checked the transformation from the OS projection to the OSM projection and it seems to be correct. The resulting boundary is the right shape and size, but it is shifted to the east. I'm going to check out exactly how much - it would be possible to correct it during the creation of the OSM file.

Why would this data be translated? The transformation could be wrong, the OS data could be wrong, the existing OSM data that I'm comparing it to could be wrong or something else. I have tried the transformation with two separate programs and they both produce the same results. The existing OSM data has been gathered by various means and is likely to be mostly right. There is a chance that the OS has an error. I wonder if anyone else has looked at it yet.

If the OS data is OK, then what else have I missed?

Ordnance survey boundaries

The UK Ordnance Survey has generously released some of its data for unlimited use. One of the data sets is about boundaries and included in it are parish boundaries. Boundaries are not painted on the ground so adding them to OSM is hard. Some people have added some parish boundaries, or bits of them, from the out-of-copyright NPE maps from OS, but a lot has changed in over sixty years.

The OS boundary data comes in the form of ESRI shape files, about 179MB to make up the parish boundaries. I loaded it in QGIS to take a look at it - there were about 14,000 polygons. Dealing with this is not going to be a small task.

I dug about T'Internet to work out how shape files work and after a bit of work I've extracted a list of the polygons in the shape file, then a method to create an OSM file for each polygon suitable for loading into JOSM.
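For anyone wondering what the second half of that looks like, here is a minimal sketch (not the actual script) that turns one closed ring of (lat, lon) points into an OSM XML file of the kind JOSM will open. New objects get negative ids. The function name, the generator string, the tags and the sample coordinates are all assumptions made for the example, and reading the points out of the shape file is left out.

```python
import xml.etree.ElementTree as ET


def polygon_to_osm(ring, tags):
    """Build OSM XML text for one closed ring of (lat, lon) points."""
    root = ET.Element("osm", version="0.6", generator="parish-sketch")
    # Drop the repeated closing point so the first node is reused instead.
    points = ring[:-1] if len(ring) > 1 and ring[0] == ring[-1] else list(ring)
    node_ids = []
    for i, (lat, lon) in enumerate(points, start=1):
        node_id = -i  # negative ids mark objects that are not yet in OSM
        ET.SubElement(root, "node", id=str(node_id),
                      lat=f"{lat:.7f}", lon=f"{lon:.7f}")
        node_ids.append(node_id)
    # The way gets its own negative id, after the last node id.
    way = ET.SubElement(root, "way", id=str(-(len(points) + 1)))
    for ref in node_ids + node_ids[:1]:  # repeat the first node to close the way
        ET.SubElement(way, "nd", ref=str(ref))
    for key, value in tags.items():
        ET.SubElement(way, "tag", k=key, v=value)
    return ET.tostring(root, encoding="unicode")


# Made-up ring and tags, for illustration only.
SAMPLE_RING = [(53.76, -0.49), (53.76, -0.48), (53.77, -0.48),
               (53.77, -0.49), (53.76, -0.49)]
SAMPLE_TAGS = {"boundary": "administrative", "name": "Example Parish"}

if __name__ == "__main__":
    print(polygon_to_osm(SAMPLE_RING, SAMPLE_TAGS))
```

Writing the returned text to a .osm file gives something that can be opened in JOSM and checked before anything is uploaded.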

I loaded the polygon for my village and at first sight it looks good, but when I compared it to a real OS map, the parish boundary doesn't quite match - I've not transformed the shape file from its original OS projection to the OSM projection correctly.

This is not the only problem of course. If I load a single parish boundary that's fairly easy. If I want to load an adjacent boundary some of the nodes need to be shared nodes along parts of the boundary. In addition the parish boundaries will share nodes with the county boundaries too. There are district boundaries available too, so these will also share nodes. On top of this there is the problem of dealing with the existing data that people have already loaded which might need to be merged, deleted or tagged as an historic boundary. This alone makes it important that each boundary is processed by someone who can make decisions about what to merge etc.
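The shared-node part of the problem can at least be sketched. The idea below is to keep one dictionary of nodes keyed by rounded coordinates, so that any boundary passing through the same point reuses the same node id. It assumes adjacent boundaries in the OS data have matching coordinates where they meet, which would need checking. The function name, the rounding precision and the two toy squares are invented for the example, and it does nothing about data already in OSM.

```python
def build_shared_nodes(rings, precision=7):
    """Assign one node id per distinct coordinate across all rings.

    Returns (node_ids, ways): node_ids maps a rounded (lat, lon) to a
    negative id, and ways holds the list of node ids for each ring.
    """
    node_ids = {}
    ways = []
    for ring in rings:
        refs = []
        for lat, lon in ring:
            key = (round(lat, precision), round(lon, precision))
            if key not in node_ids:
                # First time this point is seen: give it the next new id.
                node_ids[key] = -(len(node_ids) + 1)
            refs.append(node_ids[key])
        ways.append(refs)
    return node_ids, ways


# Two toy boundaries: unit squares that share one side.
square_a = [(0.0, 0.0), (0.0, 1.0), (1.0, 1.0), (1.0, 0.0), (0.0, 0.0)]
square_b = [(0.0, 1.0), (0.0, 2.0), (1.0, 2.0), (1.0, 1.0), (0.0, 1.0)]

if __name__ == "__main__":
    nodes, ways = build_shared_nodes([square_a, square_b])
    print(len(nodes))
    print(ways)
```

The same dictionary could be carried across parish, district and county boundaries so that all three levels end up on the same nodes.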

I'm going to sort out the projection issue, then import a single boundary via JOSM and see what that looks like. If that goes well I'll import an adjacent boundary and see what it takes to merge the shared nodes.

If I can make this work well, I'll write up a process and consider how to offer parish boundary files to other people for them to import in their local area, but supplying 14,000 files might be too big a task. Then there are the other boundary types too ...