As part of a grand plan to map a bunch of addresses onto a graphical map of Sydney or Australia, you first need all the Australian postcodes and their latitude and longitude. In this article I describe how to screen scrape this list from Google Maps (please forgive me, Mr Google!).

First you'll need the list of all postcodes from Australia Post. You can get it from the Australia Post website as a zip of a CSV file. It contains a whole bunch of non-physical addresses such as PO boxes, so if you're inclined, open it in Excel and remove these. What you want to end up with is a file called 'pc-full.csv', which we'll use later.
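If you'd rather not do the PO box cleanup by hand in Excel, here's a minimal sketch of doing it in Ruby with the standard CSV library. It assumes the file has a header row with a postcode column called 'Pcode' and a category column whose PO box rows mention "PO Box" or "Post Office Box". Those column names and values are guesses, so check the real header row and adjust the two constants to match.

```ruby
require 'csv'

# Hypothetical column names: check the header row of the real Australia Post
# file and change these to match.
POSTCODE_COLUMN = 'Pcode'
CATEGORY_COLUMN = 'Category'

# Returns the unique postcodes (zero-padded to 4 digits, sorted) for every row
# whose category does not look like a PO box.
def physical_postcodes(path)
  postcodes = []
  CSV.foreach(path, headers: true) do |row|
    category = row[CATEGORY_COLUMN].to_s
    next if category =~ /po box|post office box/i   # skip non-physical addresses
    pc = row[POSTCODE_COLUMN].to_s.strip
    postcodes << pc.rjust(4, '0') unless pc.empty?   # e.g. "800" -> "0800"
  end
  postcodes.uniq.sort
end
```

Running `physical_postcodes('pc-full.csv')` gives you a cleaned list you could feed into the scraping script further down instead of its own `readlines` loop.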

For the next step you'll need curl installed; you can get it from the curl website. If you're behind a firewall, there are some hints online on getting curl/wget to play nicely with firewalls (the instructions are written for wget, but they apply to curl too). Once you think you've got it sorted, try this from the command prompt:

curl "http://maps.google.com.au/maps?f=q&hl=en&geocode=&q=2768+australia&output=js"

You should get a printout of a whole bunch of JavaScript with 'center:{lat:xxx,lng:yyy}' in it somewhere. Sweet: you've got curl working.
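To make the request and the response format concrete, here's a small sketch of the two pure pieces of the job in Ruby: building the same query URL as the curl test above for any postcode, and pulling the lat/lng pair out of the returned JavaScript. The function names `geocode_url` and `parse_center` are just illustrative, and the regex assumes the 'center:{lat:xxx,lng:yyy}' shape shown above.

```ruby
require 'uri'

# Build the same URL as the curl test for a given postcode. Postcodes are
# zero-padded to four digits, so 800 (Darwin) is sent as "0800".
def geocode_url(postcode)
  query = URI.encode_www_form(
    f: 'q', hl: 'en', geocode: '',
    q: "#{format('%04d', postcode.to_i)} australia",  # space is encoded as '+'
    output: 'js'
  )
  "http://maps.google.com.au/maps?#{query}"
end

# Extract [lat, lng] as Floats from the returned JavaScript, or nil if the
# 'center:{lat:...,lng:...}' fragment isn't there.
def parse_center(js)
  m = js.match(/center:\{lat:(-?[\d.]+),lng:(-?[\d.]+)\}/)
  m && [m[1].to_f, m[2].to_f]
end
```

For example, `geocode_url(2768)` reproduces the URL from the curl command exactly, and `parse_center` returns nil on a failed response, which makes bad downloads easy to spot.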

Next up, we want to write a script to do all the work. I've used Ruby, because it's awesome, but you can use whatever you want. The following code opens the 'pc-full.csv' file, reads the list of distinct postcodes, runs curl against each one, parses the latitude and longitude from each response, and outputs a nice CSV file with 3 columns: Postcode, Latitude, Longitude.

def IsGood(fname)
  # Does the file contain the latitude, i.e. did it connect to Google Maps correctly?
  # Returns 'Missing', 'Good' or 'Bad'
  return 'Missing' unless File.exist?(fname)
  File.read(fname).include?('center:{lat:') ? 'Good' : 'Bad'
end

# Get the list of unique postcodes from the CSV file downloaded from Australia Post
@postcodes = []
File.readlines('pc-full.csv').each {|l|
  # Strip quotes and commas; to_i then keeps the leading digits, i.e. the postcode
  # (the header row gives 0 and is skipped)
  x = l.gsub(/["',]/,'').to_i
  @postcodes << x if x > 0 && !@postcodes.include?(x)
}
puts "Total #{@postcodes.length} unique postcodes"

# Scrape them all from Google Maps, skipping any already downloaded
@postcodes.each {|postcode|
  fname = "data_#{postcode}.txt"
  if !File.exist?(fname)
    url = "http://maps.google.com.au/maps?f=q&hl=en&geocode=&q=#{'%04d' % postcode}+australia&output=js"
    # Double quotes keep the & characters away from the shell (cmd.exe or sh)
    system "curl -o #{fname} \"#{url}\""
    puts "Any good? #{IsGood(fname)}"
  end
}

# Go through the resultant files, parsing the latitude and longitude
@results = ['Postcode,Lat,Lng']
@postcodes.each {|postcode|
  fname = "data_#{postcode}.txt"

  status = IsGood(fname)
  puts "#{fname} : #{status}"

  # Grab the lat & long
  if status == 'Good'
    m = /center:\{lat:([\-.0-9]*),lng:([\-.0-9]*)\}/.match(File.read(fname))
    # Zero-pad so that e.g. 800 is written as 0800
    @results << "#{'%04d' % postcode},#{m[1]},#{m[2]}" if m
  end
}

# Write it out to a CSV file
File.open("PostcodeLatLng.csv","w") {|f|
  @results.each {|line|
    f.write line
    f.write "\n"
  }
}
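The script above shells out to curl once per postcode and fires the requests back to back. If you'd rather skip the curl dependency and be a little gentler on Google, here's a hedged sketch of the download step done in-process with Ruby's Net::HTTP, with a pause between requests. `scrape_all`, its `delay` of one second, and the injectable `fetcher` are all my own illustrative choices, not anything Google documents; the fetcher parameter mainly exists so the loop can be tried without hitting the network.

```ruby
require 'net/http'
require 'uri'

# Hypothetical replacement for the curl loop: saves each response to
# data_<postcode>.txt (same naming as the script), skipping files that exist.
# `delay` is the pause in seconds between requests; `fetcher` takes a URL and
# returns the response body (a plain Net::HTTP GET by default).
def scrape_all(postcodes, dir: '.', delay: 1.0,
               fetcher: ->(url) { Net::HTTP.get(URI(url)) })
  postcodes.each do |postcode|
    fname = File.join(dir, "data_#{postcode}.txt")
    next if File.exist?(fname)
    url = "http://maps.google.com.au/maps?f=q&hl=en&geocode=&q=" \
          "#{format('%04d', postcode.to_i)}+australia&output=js"
    begin
      File.write(fname, fetcher.call(url))
    rescue StandardError => e
      # Leave the file missing so that simply rerunning retries this postcode
      warn "#{postcode}: #{e.message}"
    end
    sleep(delay)
  end
end
```

Because only successful downloads are written, rerunning `scrape_all(@postcodes)` picks up where a failed or interrupted run left off, just like the `File.exist?` check in the original loop.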

Note: put curl.exe in the same folder as your Ruby script if it's not in your path. The output from running all that should be a 'PostcodeLatLng.csv' file with contents something like this:

Postcode,Lat,Lng
2000,-33.869027000000003,151.21024499999999
2001,-37.808776999999999,144.94928899999999
2002,-25.335448,135.74507600000001
2004,-33.891787999999998,151.17625100000001
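Not every row is necessarily right: 2002, for instance, lands in the middle of the continent, a long way from Sydney, which looks like it may be a generic fallback location rather than the postcode itself. One rough heuristic (my assumption, not something Google guarantees) is that several postcodes sharing exactly the same coordinates point to such a fallback. Here's a sketch that groups the output file by coordinate and reports any that are shared.

```ruby
require 'csv'

# Heuristic check: returns {"lat,lng" => [postcodes]} for every coordinate that
# more than one postcode in the file resolved to. Expects the Postcode,Lat,Lng
# header written by the script.
def shared_coordinates(path)
  groups = Hash.new { |h, k| h[k] = [] }
  CSV.foreach(path, headers: true) do |row|
    groups["#{row['Lat']},#{row['Lng']}"] << row['Postcode']
  end
  groups.select { |_coord, postcodes| postcodes.size > 1 }
end
```

Running `p shared_coordinates('PostcodeLatLng.csv')` gives a short list of postcodes worth eyeballing (or re-querying with a suburb name added) before you trust them on a map.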

Bob's your uncle! You should now have the latitude and longitude of all Australian postcodes. In the next article, I'll show how I use this to make, for instance, a map of Australian post offices overlaid on a map of Australia.

Thanks for reading! And if you want to get in touch, I'd love to hear from you: chris.hulbert at gmail.

Chris Hulbert

(Comp Sci, Hons - UTS)

Software Developer (Freelancer / Contractor) in Australia.

I have worked at places such as Google, Cochlear, Assembly Payments, News Corp, Fox Sports, NineMSN, FetchTV, Coles, Woolworths, Trust Bank, and Westpac, among others. If you're looking for help developing an iOS app, drop me a line!

Get in touch:
github.com/chrishulbert