I obtained the blog content by scraping the RSS feeds of blogs listed in the Gongol directory (plus a few others that I read but that aren't listed there). Because I scraped the RSS feeds, I am comparing blogs only by the content of their most recent post, not by their full history of posts. Since the clustering is built from only the most recent posts, you can't really interpret much from the dendrogram. However, if I built a database to store blog content and re-ran this analysis every day for a lengthy period of time, you would hope that the clustering would settle down and become stable.
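For anyone who wants to experiment with the general idea, here is a minimal sketch of clustering blogs by the text of one post each. It is an illustration under stated assumptions, not the code behind the dendrogram: the word-count vectors, the cosine distance, the average-linkage rule, the function names, and the toy `posts` dictionary (which stands in for text scraped from real feeds) are all my own choices for the example.

```python
import math
import re
from collections import Counter


def word_counts(text):
    """Lower-case the text and count its alphabetic words."""
    return Counter(re.findall(r"[a-z]+", text.lower()))


def cosine_distance(a, b):
    """Return 1 minus the cosine similarity of two word-count vectors."""
    dot = sum(a[w] * b[w] for w in a if w in b)
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    if norm_a == 0 or norm_b == 0:
        return 1.0  # treat an empty post as maximally distant
    return 1.0 - dot / (norm_a * norm_b)


def cluster(posts):
    """Average-linkage agglomerative clustering (an assumed method).

    posts maps a blog name to the text of one post. Returns the merges
    in order as (members_a, members_b, distance) tuples, which is the
    information a dendrogram is drawn from.
    """
    vectors = {name: word_counts(text) for name, text in posts.items()}
    clusters = [(name,) for name in sorted(posts)]
    merges = []
    while len(clusters) > 1:
        best = None
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                pairs = [(x, y) for x in clusters[i] for y in clusters[j]]
                d = sum(cosine_distance(vectors[x], vectors[y])
                        for x, y in pairs) / len(pairs)
                if best is None or d < best[0]:
                    best = (d, i, j)
        d, i, j = best
        merges.append((clusters[i], clusters[j], d))
        merged = clusters[i] + clusters[j]
        clusters = [c for k, c in enumerate(clusters) if k not in (i, j)]
        clusters.append(merged)
    return merges


# Hypothetical stand-in data: one short "post" per made-up blog.
posts = {
    "blog_a": "inflation and interest rates and the central bank",
    "blog_b": "the central bank raised interest rates to fight inflation",
    "blog_c": "labor markets wages and unemployment",
    "blog_d": "unemployment fell as wages rose in labor markets",
}

for left, right, distance in cluster(posts):
    print(left, "+", right, "at distance", round(distance, 3))
```

With only one short post per blog, a single shared or missing word can move a blog from one branch to another, which is why a dendrogram built this way is hard to interpret until more text per blog is accumulated.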
There are many ways this methodology could be improved, and if I have some time I may pursue them a bit further...
I have posted a .jpeg of the dendrogram, but unfortunately it is unreadable as displayed here. You will need to download it and open it in a viewer if you want to see where your favorite economics blog ended up. I ended up close to Greg Mankiw, which was unexpected!
