Wednesday 28 April 2010

data.gov.uk

Over the past couple of weeks at work I have been attempting to search for and download datasets from data.gov.uk for use in a project.  It has been rather frustrating.

For those who haven't come across data.gov.uk, it is perhaps the greatest opportunity for HM Government to open its wealth of non-personal data to the public, research groups and private companies.  More than that, there actually seems to be a group of people behind the project who are really passionate about the provision of data.  Currently the site is still in a beta phase, with new datasets being added constantly.


The first thing to establish is quite what data.gov.uk is.  Before I started this project I thought that it was a data repository from which I could download data tables, and while to some extent it is, it is arguably more of a data search tool which provides links to data held by different government departments and other organisations.  SPARQL adds a bit of a twist to this, but more on SPARQL later.


The issue here is that each department or organisation is only interested in the data that it provides: the Justice department provides information on crime and justice, health data comes from the NHS and Primary Care Trusts, and so forth.  This is understandable I suppose, but it means that there is no overall attempt to make any of the data comparable with other datasets.  Data from the Justice department is often recorded against a different geography, either in terms of boundaries or geographic resolution, from that of the NHS.  Census data is recorded against another level of geography, while election results are stored against yet another.

A further issue posed by the lack of coordination, whilst seemingly pretty straightforward, makes the analysis of data even more difficult.  As yet there is no agreed naming convention for the different levels of geography.  For example, a local Primary Care Trust might record data against the “London Borough of Brent”, a police authority may record data against simply “Brent”, while census information may be recorded against “Brent LB” or even a borough code, “00AE”.  While it is simple for a human to see that at least three of the names here are the same, a computer cannot do the same thing, so a manual process of ensuring that all names are the same is required.  This is a very time-consuming process and has to be repeated for almost every dataset that you wish to download and map.
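Some of that manual matching can be scripted.  What follows is only a minimal sketch in Python, not something data.gov.uk provides: the function name, the list of prefixes and suffixes to strip and the code lookup table are all my own assumptions, and only the four Brent labels (including the "00AE" code) come from the example above.

```python
import re

# Hypothetical lookup from borough code to name. Only the "00AE" -> "Brent"
# pairing is taken from the post; a real table would need every code.
CODE_TO_NAME = {"00AE": "Brent"}

# Prefixes and suffixes that differ between sources. This list is
# illustrative and certainly incomplete.
_NOISE = re.compile(
    r"^(london borough of|royal borough of|city of)\s+"
    r"|\s+(lb|london borough)$"
)

def canonical_name(raw: str) -> str:
    """Reduce a borough label to one comparable form (assumed helper)."""
    label = raw.strip()
    # Codes are looked up directly rather than cleaned as text.
    if label.upper() in CODE_TO_NAME:
        return CODE_TO_NAME[label.upper()]
    # Lower-case first so the pattern only has to handle one spelling,
    # then strip the known prefix/suffix noise and restore capitals.
    cleaned = _NOISE.sub("", label.lower())
    return cleaned.strip().title()

labels = ["London Borough of Brent", "Brent", "Brent LB", "00AE"]
print({label: canonical_name(label) for label in labels})
```

Every label then collapses to the same key, which can be used to join tables from different sources.  It won't catch everything (genuinely different spellings still need a human eye), but it cuts down the repetition.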

The temporal aspects of data also have to be considered, as different datasets are recorded over different time periods.  Data could be recorded for the calendar year, the financial year, quarterly, over a rolling period and so forth.  In many cases, because the data is coming from different departments, there is no commonality between them.

This sounds very negative, and to some extent that is the case.  It has been a very frustrating process of identifying data, downloading it and formatting it, only to find that the dataset against which I want to perform a comparison is in a different format, covers a different date range and is at a different geographic scale, rendering comparisons impossible.


This is perhaps the time to talk about SPARQL.  If, like me, you don't know your SPARQL from your elbow then data.gov.uk isn't really the place to look for answers.  To understand SPARQL you need to understand the Semantic Web and RDF.  So let's start with the Semantic Web, which was thought up by Tim Berners-Lee, inventor of the World Wide Web along with URLs, HTTP and HTML.  Web sites across the world store huge amounts of data, whether football results, weather reports, demographic information or crime statistics.  However, in HTML this data is difficult to use in the way that you might like, as it is generally unstructured.


What the Semantic Web attempts to do is to provide a structured format (built on syntaxes which use URIs to represent data) which can be queried or processed by machines.  Incidentally, the framework behind these syntaxes is called the Resource Description Framework (RDF).


So what do the Semantic Web and RDF have to do with data.gov.uk?  Well, the clear plan of data.gov.uk is to store government data in RDF syntaxes which can then be queried using SPARQL, a query language and data access protocol for the Semantic Web.  So if you know SPARQL you can get the data, or at least that's the idea.  But SPARQL isn't a simple language to pick up, it isn't something that your average Joe is going to get into, and at the moment only a limited number of datasets are suitable for SPARQL queries.
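To give a feel for what such a query does, here is a toy sketch in Python rather than real SPARQL.  RDF data boils down to (subject, predicate, object) triples, and a query is a pattern with variables that gets matched against them.  Everything below is invented for illustration: the identifiers, the values and the match function are my assumptions, not data.gov.uk's data or API, and real RDF would use full URIs.

```python
# Toy (subject, predicate, object) triples. Every identifier and value is
# made up for illustration; real RDF data would use full URIs.
TRIPLES = [
    ("area:A", "rdf:type", "ex:Borough"),
    ("area:A", "ex:population", 100),
    ("area:B", "rdf:type", "ex:Borough"),
    ("area:B", "ex:population", 200),
]

def is_variable(term):
    """Terms written like '?area' are variables, as in SPARQL."""
    return isinstance(term, str) and term.startswith("?")

def match(pattern, triples):
    """Yield one dict of variable bindings per triple that fits the pattern."""
    for triple in triples:
        binding = {}
        for want, got in zip(pattern, triple):
            if is_variable(want):
                # A variable used twice in one pattern must bind to the same value.
                if binding.setdefault(want, got) != got:
                    break
            elif want != got:
                # A fixed term must match exactly.
                break
        else:
            yield binding

# Roughly the idea behind a SPARQL query of the shape:
#   SELECT ?area ?pop WHERE { ?area ex:population ?pop }
results = list(match(("?area", "ex:population", "?pop"), TRIPLES))
for row in results:
    print(row["?area"], row["?pop"])
```

A real SPARQL engine does much more (joins across several patterns, filters, prefixes), but the pattern-with-variables idea is the core of it.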


SPARQL does provide an opportunity to query data in a way that wasn't previously possible, but it's pretty heavyweight and difficult to get into, and for many the barrier will be set too high.


Perhaps I'm asking for too much.  Just releasing the data is a major step forward and for that we should be thankful, and yes, it is still in beta and I'm sure that over time it will improve.  Actually being able to search for and find data is something that I shouldn't understate either; it's as big as the Ordnance Survey finally making some of their data public (incidentally data.gov.uk does find OS data as well).  But for the geographer it's all a little frustrating.

Monday 12 April 2010

All Blacks Tag Cloud

OK, so not cartograms, but still pretty interesting.  Created in Tagxedo (www.tagxedo.com), these two tag clouds are based on the Wikipedia entry for the All Blacks, the New Zealand rugby team, with the cloud constrained by a map outline of New Zealand.  With loads of options to alter the colour, font and orientation of the tags, you can create some interesting images.


Tagxedo All Blacks Cloud
Tagxedo All Blacks Cloud

Saturday 10 April 2010

New Zealand 2006 Population Statistics

Everyone knows that New Zealand is a pretty special place.  A population of 4.23 million people is spread across a country the size of the UK.  Auckland, whilst not the capital, is the largest city with a population of 1.26 million, which makes for a really nice cartogram.


2006 Population Count
You can clearly see the distortion of the North Island caused by Auckland.  It's pretty impressive, but equally the distortion of the South Island demonstrates how sparsely it is populated.  Incidentally, the bulge on the east coast of the South Island is Christchurch.
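To put a rough number on that distortion: in an idealised population cartogram each region's share of the map area equals its share of the total population.  The small Python sketch below works that out for Auckland using the two figures quoted in this post, reading both as millions; the function name is mine, and real cartogram software only approximates this proportion.

```python
def area_share(region_value, total_value):
    """Fraction of an idealised cartogram's area that a region would occupy."""
    if total_value <= 0:
        raise ValueError("total_value must be positive")
    return region_value / total_value

# Figures from the post, both taken to be in millions (an assumption).
auckland_share = area_share(1.26, 4.23)
print(f"Auckland's share of the map: {auckland_share:.1%}")
```

In other words, on these figures a single city should take up close to a third of the whole map, which is why the North Island bulges so much.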

Friday 9 April 2010

2005 UK General Election Cartograms

I created these two cartograms to show the percentage of votes that the Conservatives and Labour received at the 2005 general election.  Colours range from red (low vote share) to green (high vote share).

Conservative Vote Share
Labour Vote Share


As you might expect given that these are the two largest political parties in the UK, the maps are mirrors of each other: where Labour have a small vote share the Conservatives have a large one, and vice versa.  The maps clearly show areas of support for each political party.  The Conservatives have support in the home counties and many rural areas, and Labour in the industrial north of England and in Scotland.

Mark Newman, co-author (with Danny Dorling and Anna Barford) of a book called The Atlas of the Real World, produced these cartograms of the 2008 US election (http://www-personal.umich.edu/~mejn/election/2008/), which provide an interesting comparison to the maps above.

Thursday 8 April 2010

Welcome

So what's a cartogram?  Well, Wikipedia describes it as:


"A cartogram is a map in which some thematic mapping variable – such as travel time or Gross National Product – is substituted for land area. The geometry or space of the map is distorted in order to convey the information of this alternate variable."


Which is a pretty good description as it goes, but what I really like is their artistic nature.  Maps have always held some fascination for me.  I have worked in Geographic Information for 15 years and have worked with all kinds of maps showing a wide range of information, but it was a recent stumble upon WorldMapper (http://www.worldmapper.org/) that sparked my interest in cartograms and, more importantly, their artistic qualities.


So what's this blog about?  Well, cartograms, quite obviously, but also my attempts to create them at smaller levels of geography and about subjects that interest me and might interest you.  Given that a general election in the UK was announced only a few days ago, I thought that I might look at voting patterns in the UK for past elections and attempt to see what light cartograms can bring to the party.


I'll post again in a few days.