As part of a grand plan to map a bunch of addresses onto a graphical map of Sydney or Australia, you first need all the Australian postcodes and their longitude and latitude. In this article I describe how to screen-scrape this list from Google Maps (please forgive me, Mr Google!).
First you'll need the list of all postcodes from Australia Post. You can get it here - it is a zip of a CSV file. It contains a whole bunch of non-physical addresses such as PO boxes; if you're inclined, open it in Excel and remove these. What you want to end up with is a file called 'pc-full.csv', which we'll use later.
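If you'd rather not do the clean-up by hand in Excel, the same filtering can be sketched in Ruby. This is only a rough illustration: the column names 'Pcode' and 'Category' and the 'Delivery Area' value are assumptions about the layout of the Australia Post file, and `physical_postcodes` is a made-up helper name, so check the header row of your own download and adjust.

```ruby
require 'csv'

# Assumption: the file has a header row with 'Pcode' and 'Category' columns,
# and physical areas are marked 'Delivery Area'. Adjust to match your copy.
def physical_postcodes(csv_text, postcode_col: 'Pcode',
                       category_col: 'Category', keep: 'Delivery Area')
  postcodes = []
  CSV.parse(csv_text, headers: true).each do |row|
    # Skip PO boxes and other non-physical rows
    next unless row[category_col].to_s.strip == keep
    postcodes << row[postcode_col].to_i
  end
  postcodes.uniq
end

# Made-up sample rows in the assumed layout
sample = <<~CSV
  Pcode,Locality,State,Category
  "2000","SYDNEY","NSW","Delivery Area"
  "2001","SYDNEY","NSW","Post Office Boxes"
  "2000","THE ROCKS","NSW","Delivery Area"
CSV

p physical_postcodes(sample)
```

On a real file you would pass in `File.read('pc-full.csv')` instead of the sample text.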
For the next step you'll need curl installed. You can get it here. If you're behind a firewall, there are some hints here on how to get curl/wget to play nicely with firewalls (the instructions are written for wget, but they apply to curl too). Once you think you've got it sorted, try this from the command prompt:
curl "http://maps.google.com.au/maps?f=q&hl=en&geocode=&q=2768+australia&output=js"
You should get a printout of a whole bunch of JavaScript with 'center:{lat:xxx,lng:yyy}' in it somewhere. Sweet - you've got curl working.
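To make the 'center:{lat:xxx,lng:yyy}' bit concrete, here is a small sketch of pulling the two numbers out of a response with a regular expression. The `parse_center` helper and the sample string are made up for illustration; the sample is only shaped like the fragment described above, not a real Google response.

```ruby
# Matches center:{lat:<number>,lng:<number>} and captures both numbers.
CENTER_PATTERN = /center:\{lat:(-?\d+(?:\.\d+)?),lng:(-?\d+(?:\.\d+)?)\}/

# Returns [lat, lng] as floats, or nil when the marker is not present
# (for example when the request failed and an error page came back).
def parse_center(js)
  m = CENTER_PATTERN.match(js)
  m && [m[1].to_f, m[2].to_f]
end

# Hypothetical fragment, not real output
sample = 'zoom:12,center:{lat:-33.869027,lng:151.210245},span:{}'

p parse_center(sample)               # => [-33.869027, 151.210245]
p parse_center('<html>oops</html>')  # => nil
```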
Next up we want to write a script to do all the work. I've used Ruby, because it's awesome, but you can use whatever you want. The following code opens the 'pc-full.csv' file, reads the list of distinct postcodes, runs curl against each one, parses the latitude and longitude from each response, and outputs a nice CSV file with three columns: Postcode, Latitude, Longitude.
Now I'll have to apologise for the sloppy code, but it does work (updated to fix WordPress's curly quotes):
def IsGood(fname)
  # Does a file contain the longitude eg did it connect to gmaps correctly?
  # Ironically, in C# this code would be a one liner: File.ReadAllText(fname).Contains("center:{lat:");
  r = "Missing"
  if File.exist?(fname)
    f = File.new(fname)
    lines = f.read
    f.close
    if lines.include?("center:{lat:")
      r = 'Good'
    else
      r = 'Bad'
    end
  end
  r
end
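# get the list of unique postcodes from the CSV file downloaded from australia post
@postcodes = []
File.readlines('pc-full.csv').each {|l|
  # Strip quotes and commas so the leading postcode parses as an integer;
  # the header row and other non-numeric lines come out as 0 and are skipped
  x = l.gsub(/[\",]/,"").to_i
  @postcodes << x if x > 0 && !@postcodes.include?(x)
}
puts "Total #{@postcodes.length} unique postcodes"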
# Scrape them all from google maps
@postcodes.each {|postcode|
  fname = "data_#{postcode}.txt"
  if !File.exist?(fname)
    system "curl -o #{fname} \"http://maps.google.com.au/" +
      "maps?f=q&hl=en&geocode=&q=#{"%04d" % postcode}+australia&output=js\""
    puts "Any good? #{IsGood(fname)}"
  end
}
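# Go through the resultant files, parsing the longitude and latitude
@results = ["Postcode,Lat,Lng"]
@postcodes.each {|postcode|
  fname = "data_#{postcode}.txt"
  status = IsGood(fname)
  puts "#{fname} : #{status}"
  # Grab the long & lat
  if status=='Good'
    f = File.new(fname)
    lines = f.read
    f.close
    m = /center:\{lat:([\-.0-9]*),lng:([\-.0-9]*)\}/.match(lines)
    # The original listing is cut off from this point; the remaining lines
    # are reconstructed from the description and sample output in this article
    @results << "#{postcode},#{m[1]},#{m[2]}" if m
  end
}
# Write everything out as the final CSV file
File.open('PostcodeLatLng.csv', 'w') {|f| f.puts @results }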
Note: Put curl.exe in the same folder as your Ruby file above if it's not in your PATH.
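If you'd rather not depend on curl.exe at all, Ruby's standard library can make the same request. The following is a minimal sketch, not what the script above does: `maps_url` and `fetch_postcode` are made-up helper names, and it assumes the URL from the curl test earlier still answers in the same way. `Net::HTTP.get` does not follow redirects or go through a proxy by itself, so treat this as a starting point.

```ruby
require 'net/http'
require 'uri'

# Build the same query URL the curl command uses, zero-padding the
# postcode to four digits (so 800 becomes 0800).
def maps_url(postcode)
  query = URI.encode_www_form(
    f: 'q', hl: 'en', geocode: '',
    q: format('%04d australia', postcode),
    output: 'js'
  )
  "http://maps.google.com.au/maps?#{query}"
end

# Fetch the response body for one postcode and save it in the same
# data_<postcode>.txt file the curl version would have written.
def fetch_postcode(postcode)
  body = Net::HTTP.get(URI(maps_url(postcode)))
  File.write("data_#{postcode}.txt", body)
  body
end

puts maps_url(2768)
```

Building the URL in Ruby also avoids passing a hand-quoted string through the shell, which is one less thing to get wrong on Windows.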
And your output from running all that should be a 'PostcodeLatLng.csv' file with the contents something like this:
Postcode,Lat,Lng
2000,-33.869027000000003,151.21024499999999
2001,-37.808776999999999,144.94928899999999
2002,-25.335448,135.74507600000001
2004,-33.891787999999998,151.17625100000001
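Once you have that file, loading it back into Ruby for lookups is straightforward. Here is a small sketch, assuming the three-column layout shown above; `load_postcodes` is a made-up helper name, and the sample text simply reuses two of the rows from the output.

```ruby
# Parse "Postcode,Lat,Lng" text into a hash of postcode => [lat, lng].
def load_postcodes(csv_text)
  rows = csv_text.lines.map(&:strip).reject(&:empty?)
  rows.shift # drop the header row
  rows.each_with_object({}) do |line, table|
    postcode, lat, lng = line.split(',')
    table[postcode.to_i] = [lat.to_f, lng.to_f]
  end
end

sample = <<~CSV
  Postcode,Lat,Lng
  2000,-33.869027000000003,151.21024499999999
  2004,-33.891787999999998,151.17625100000001
CSV

table = load_postcodes(sample)
lat, lng = table[2000]
puts "2000 is at #{lat}, #{lng}"
```

For the real thing, pass in `File.read('PostcodeLatLng.csv')` instead of the sample text.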
Bob's your uncle! You should now have the longitude and latitude of all Australian postcodes. In the next article, I'll show how I use this to make a map of Australian post offices overlaying a map of Australia, for instance. Here: Geocoding part 2