I needed to get some data from Amazon Web Services, in particular their Alexa Top Sites web service, which provides access to lists of web sites ordered by Alexa Traffic Rank.
And, happily, they had a Query Example in Ruby! (as well as Java, Perl, PHP, and C#)
It is a bit unclear where to get the country codes, although I see that you can run a query to get the list using ResponseGroup=ListCountries. (I just cheated and used the Alexa site to get the country code I wanted.) I pasted in the access key and double secret code key and fired it off…
Only it didn’t seem to work! WTF! The error said “The URI http://awis.amazonaws.com/onca/xml is not valid”, but I didn’t change that!!! Carefully reading the docs (and remembering to breathe), I noticed that they referred to the base URI as “http://ats.amazonaws.com” rather than what was in line 27 of the topsites.rb file: “http://awis.amazonaws.com/onca/xml”. So I tried that and it worked! I guess they changed some stuff and have not updated the sample code? Sloppy!
The next issue was that the query only produces a max count of 100 and I wanted thousands! (The Alexa site already shows the top 100 by country.)
I quickly wrote up some Ruby code to figure out my start count and generate a filename for each increment. Both get passed to a modified AWS Top Sites query (changed to write to a named file rather than standard output, i.e. the console), and many XML files later I’m done. (Now to import the mess, which proved to be easy to do in Excel 2003.)
# nol is set elsewhere: the index of the last batch (batches run 0..nol)
begincount = 1
incr = 100
for x in 0..nol
  start = begincount + (x * incr)   # 1, 101, 201, ...
  filename = "c:/aws/aww_ts_" + x.to_s + ".xml"
  QueryAWS_topSite(start, incr, filename)
end
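If you would rather say "I want N sites" than work out the number of loops by hand, the same arithmetic can be wrapped up in a small helper. This is only a rough sketch: plan_batches, its parameters, and the "aws_ts_" filename prefix are names I made up for illustration, and it only plans the batches (it does not call the web service). The 100-per-query limit is the one mentioned above.

```ruby
# Hypothetical helper (not part of the Amazon sample code): work out the
# start position, count, and output filename for each query needed to
# cover `total` sites when one query returns at most `per_query` results.
def plan_batches(total, per_query = 100, prefix = "aws_ts_")
  batches = []
  start = 1   # ranks are 1-based, so the first batch starts at 1
  index = 0
  while start <= total
    # The last batch may be smaller than per_query.
    count = [per_query, total - start + 1].min
    batches << [start, count, "#{prefix}#{index}.xml"]
    start += count
    index += 1
  end
  batches
end

# Example: plan the queries for the top 250 sites.
plan_batches(250).each do |start, count, filename|
  puts "start=#{start} count=#{count} file=#{filename}"
end
```

Each triple could then be handed to something like the modified query method above, one call per batch.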
Maybe I will mess with it some more to create one giant XML file (return the XML object and parse out the elements I want before writing to one file?) and otherwise make it more elegant, but for now it is “good enough”. And geeky fun too!
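For what it is worth, the "parse out the elements I want" idea might look something like the sketch below, using REXML (which ships with Ruby). Big caveat: collect_elements is a made-up helper, and the little XML snippets and the site tag are invented stand-ins, not the real Top Sites response format, so the tag name would need to be swapped for whatever the actual responses use.

```ruby
require "rexml/document"

# Hypothetical helper: pull the text of every element named `tag` out of
# each XML string and return all of them as one flat array.
def collect_elements(xml_strings, tag)
  xml_strings.flat_map do |xml|
    doc = REXML::Document.new(xml)
    REXML::XPath.match(doc, "//#{tag}").map(&:text)
  end
end

# Invented sample data standing in for two query responses.
pages = [
  "<result><site>example.com</site><site>example.org</site></result>",
  "<result><site>example.net</site></result>"
]

sites = collect_elements(pages, "site")
puts sites
```

From there the combined list could be written out once with File.write instead of ending up with one file per query.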