/_posts/2010-11-21-exploring_art_data_3.md

https://gitlab.com/rheaplex/robmyers.org
---
author: robmyers
comments: true
date: 2010-11-21 22:42:07+00:00
layout: post
slug: exploring_art_data_3
title: Exploring Art Data 3
wordpress_id: 1875
categories:
- Art Computing
- Art Open Data
---
Let's look at how much the "Grants For The Arts" programme of Arts Council England (ACE) gives to each region.

First of all we'll need the data. That's available from data.gov.uk under the new CC-BY-compatible Crown Copyright [here](http://data.gov.uk/dataset/grants-for-the-arts-awards-arts-council-england). It's in XLS format, which R doesn't load on GNU/Linux, but we can convert it to comma-separated values using OpenOffice.org Calc.

Next we'll need a map to plot the data on. Ideally we'd use a Shapefile of the English regions, which R would be able to load and render easily, but there isn't a freely available one. There's a public domain SVG map of the English regions [here](https://secure.wikimedia.org/wikipedia/commons/wiki/File:England_Regions_-_Blank.svg), but R doesn't load SVG either. We can convert the SVG to a table of co-ordinates that R can plot, using a Python script:
<tt>#!/usr/bin/python
# Written for Python 2 with BeautifulSoup 3
from BeautifulSoup import BeautifulStoneSoup
import re
# We know that the file consists of a single top-level g
# containing a flat list of path elements.
# Each path consists of subpaths only using M/L/z
# So use this knowledge to extract the polylines
# Convert svg class names to gfta region names
names = {"east-midlands":"East Midlands", "east-england":"East of England",
         "london":"London", "north-east":"North East",
         "north-west":"North West", "south-east":"South East",
         "south-west":"South West", "west-midlands":"West Midlands",
         "yorkshire-and-humber":"Yorkshire and The Humber"}
svg = open("map/England_Regions_-_Blank.svg")
soup = BeautifulStoneSoup(svg)
# Get the canvas size, to use for flipping the y co-ordinate
height = float(soup.svg["height"])
# Get the containing g
g = soup.find("g")
# Get the translate in the transform
transform = re.match(r"translate\((.+), (.+)\)", g["transform"])
transform_x = float(transform.group(1))
transform_y = float(transform.group(2))
# Get the paths in the g
paths = g.findAll("path")
# Write the csv header row
print("region,subpath,x,y")
for path in paths:
    # Get the id and convert to region name
    region_name = names[path["id"]]
    # Get the path data to process
    path_d = path["d"]
    # Split around M commands to get subpaths
    path_d_subpaths = path_d.split("M")
    # Keep a count of the subpaths within the id so we can identify them
    subpath_count = 0
    for subpath in path_d_subpaths:
        # The split will result in a leading empty string
        if subpath == "":
            continue
        subpath_count = subpath_count + 1
        # Split around the L commands to get a list of points
        # The first M point already has its command letter removed
        points = subpath.split("L")
        for point in points:
            # Remove trailing z if present
            cleaned_point = point.split()[0]
            # Split out the point components and translate them
            (x, y) = cleaned_point.split(",")
            transformed_x = float(x) + transform_x
            flipped_y = height + (height - float(y))
            transformed_y = flipped_y + transform_y
            # Write a line in the csv
            print("%s,%s,%s,%s" % (region_name, subpath_count, transformed_x,
                                   transformed_y))
</tt>
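
To see what the M/L splitting in the script does, here is a minimal, standalone sketch of just that step. It leaves out the translate and the y flip, and the function name `parse_path_d` and the example path string are made up for illustration rather than taken from the real SVG file.

```python
def parse_path_d(path_d):
    """Split an SVG path "d" string that uses only M/L/z into polylines.

    Returns a list of subpaths, each a list of (x, y) float tuples.
    """
    subpaths = []
    for subpath in path_d.split("M"):
        # Splitting on "M" leaves an empty (or whitespace-only) first chunk
        if not subpath.strip():
            continue
        points = []
        for point in subpath.split("L"):
            # Keep the first whitespace-separated token, dropping a trailing "z"
            x, y = point.split()[0].split(",")
            points.append((float(x), float(y)))
        subpaths.append(points)
    return subpaths


# Hypothetical path data, invented for this example
example_d = "M 10,20 L 30,40 L 50,60 z M 1,2 L 3,4 z"
polylines = parse_path_d(example_d)
for number, points in enumerate(polylines, start=1):
    print(number, points)
```

Each subpath gets its own number, which is what lets the plotting code draw a region made of several separate shapes (islands, for example) as separate polygons.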
Now we can load the grants data and the map into R, calculate the total value of grants for each region, and colour each region of the map accordingly.

Here's the R code:
<tt>## The data used to plot a map of the English regions
england<-read.csv("map/England_Regions_-_Blank.csv",
                  colClasses=c("factor", "integer", "numeric", "numeric"))
## Plot the English regions in the given colours
## See levels(england$region) for the region names
## colours is a list of region="#FF00FF" colours for regions
## range.min and range.max are for the key values
## main.title is the main label for the plot
## key.title is the title for the key
plotEnglandRegions<-function(colours, range.min, range.max, main.title,
                             key.title){
  plot.new()
  ## Reasonable values for the window size
  plot.window(c(0, 600),
              c(0, 600))
  ## For each region name
  lapply(levels(england$region),
         function(region){
           ## Only draw regions that have a colour assigned
           if (region %in% names(colours)){
             ## For each subpath of each region
             lapply(1:max(england$subpath[england$region == region]),
                    function(subpath){
                      ## Get the points of that subpath
                      subpath.points<-england[england$region == region &
                                              england$subpath == subpath,]
                      ## And colour it the region's colour
                      polygon(subpath.points$x, subpath.points$y,
                              col=colours[[region]])
                    })
           }
         })
  ## Colour Scale
  ## Turn off scientific notation (for less than 10 digits)
  options(scipen=10)
  ## Sort the colours so they match the values
  colours.sorted<-sort(colours)
  ## The by is set to fit the number of colours and the value range
  legend("topright", legend=seq(from=range.min, to=range.max,
                       by=((range.max - range.min) / (length(colours) - 1))),
         fill=colours.sorted,
         title=key.title)
  title(main.title)
}
## Load the region award data
region<-read.csv("gfta/gfta_awards09_10_region.csv",
                 colClasses=c("integer", "character", "character", "character",
                   "character", "factor", "factor", "factor",
                   "factor", "factor"))
## region$Award.amount contains commas
region$Award.amount<-gsub(",", "", region$Award.amount)
## And we want it as a number
region$Award.amount<-as.integer(region$Award.amount)
## Get the totals by region
region.totals<-tapply(region$Award.amount, list(region$Region), sum)
## But we don't want the "Other" region
region.totals<-region.totals[names(region.totals) != "Other"]
## Calculate the range of colours
## The maximum value, rounded up to the nearest million
value.max<-12000000
## The minimum value, rounded down to the nearest million
value.min<-4000000
## The darkest colour (in a range of 0.0 to 1.0)
colour.base<-0.15
## How to get the range of colours between that and 1.0
colour.multiplier<-(1.0 - colour.base) / (value.max - value.min)
## Make the colour levels
levels<-lapply(region.totals,
               function(i){
                 colour.base + (i - value.min) * colour.multiplier})
colours<-rgb(levels, 0, 0)
## Add the region names to the colours
names(colours)<-names(region.totals)
## Plot each region in the given colour
plotEnglandRegions(colours, value.min, value.max, "Grants For The Arts 2009/10",
                   "Total awards in £")</tt>
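
The totalling and colour-scaling steps in the R code can be checked by hand on a tiny data set. The following is a rough Python sketch of the same arithmetic, not a replacement for the R code: the rows, the column names, the value range, and the helper names `red_level` and `to_hex` are all invented for illustration, and the rounding in `to_hex` is an assumption rather than a guaranteed match for R's `rgb()`.

```python
import csv
import io
from collections import defaultdict

# Hypothetical rows standing in for the awards CSV. The real file has
# more columns, and these amounts are made up.
sample_csv = io.StringIO(
    'Region,Award amount\n'
    'London,"5,000"\n'
    'London,"7,000"\n'
    'North East,"4,000"\n'
    'Other,"1,000"\n'
)

# Total the awards per region, stripping the thousands separators
# and leaving out the "Other" region
totals = defaultdict(int)
for row in csv.DictReader(sample_csv):
    if row["Region"] == "Other":
        continue
    totals[row["Region"]] += int(row["Award amount"].replace(",", ""))


def red_level(total, value_min, value_max, colour_base=0.15):
    """Linearly map total from [value_min, value_max] to [colour_base, 1.0]."""
    multiplier = (1.0 - colour_base) / (value_max - value_min)
    return colour_base + (total - value_min) * multiplier


def to_hex(level):
    """Format a red level between 0 and 1 as an "#RR0000" colour string."""
    return "#%02X0000" % int(round(level * 255))


# Made-up range chosen to fit the sample totals
value_min, value_max = 4000, 12000
colours = {region: to_hex(red_level(total, value_min, value_max))
           for region, total in totals.items()}
print(colours)
```

The region with the smallest total gets the darkest red (the base level of 0.15) and the region with the largest total gets full red, with everything else scaled linearly in between.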
And here's the resulting map:

[![gtfa.png](/assets/assets_c/2010/11/gtfa-thumb-400x377-32.png)](/weblog/assets_c/2010/11/gtfa-32.html)

Who can point out the methodological flaw in this visualisation? ;-)