Beyond Microfoundations: Content Clustering in Econ Blogs...

Sunday, January 9, 2011

Content Clustering in Econ Blogs...

Below is a dendrogram that I created by applying a hierarchical clustering algorithm to the recent content of a set of economics blogs. I used average-linkage hierarchical clustering to compare blogs based on the correlation between the vectors of word counts from their most recent blog posts (i.e., blogs that use the same words with similar relative frequency will be listed as "similar" with this implementation).
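To make the idea concrete, here is a minimal sketch of average-linkage clustering on a correlation distance, written in plain Python. It is not the code I used for the dendrogram: the function names (`word_vectors`, `correlation_distance`, `average_linkage`), the toy blog names, and the choice of 1 minus the Pearson correlation as the distance are all assumptions made for illustration.

```python
from collections import Counter
from itertools import combinations
from math import sqrt


def word_vectors(texts):
    """Turn {blog: text} into {blog: count vector} over a shared vocabulary."""
    counts = {name: Counter(text.lower().split()) for name, text in texts.items()}
    vocab = sorted(set().union(*counts.values()))
    return {name: [c[w] for w in vocab] for name, c in counts.items()}


def correlation_distance(x, y):
    """1 - Pearson correlation: small when two blogs use words in similar proportions."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    if sx == 0 or sy == 0:
        return 1.0  # assumption: treat a constant vector as uncorrelated
    return 1.0 - cov / (sx * sy)


def average_linkage(vectors):
    """Merge the two closest clusters until one remains; return the merge history."""
    names = list(vectors)
    dist = {frozenset((a, b)): correlation_distance(vectors[a], vectors[b])
            for a, b in combinations(names, 2)}

    def linkage(pair):
        # Average of all pairwise distances between members of the two clusters.
        a, b = pair
        return sum(dist[frozenset((i, j))] for i in a for j in b) / (len(a) * len(b))

    clusters = [(n,) for n in names]
    merges = []
    while len(clusters) > 1:
        a, b = min(combinations(clusters, 2), key=linkage)
        merges.append((a, b, linkage((a, b))))
        clusters = [c for c in clusters if c not in (a, b)] + [a + b]
    return merges


# Hypothetical toy data standing in for scraped blog content.
texts = {
    "blog_a": "inflation inflation rates money money money",
    "blog_b": "inflation rates rates money money",
    "blog_c": "labor labor wages unemployment wages labor",
}
merges = average_linkage(word_vectors(texts))
for left, right, d in merges:
    print(left, right, round(d, 3))
```

The merge history is exactly what a dendrogram draws: each merge becomes a join, placed at a height equal to the distance at which it happened. With hundreds of blogs you would want a library implementation rather than this brute-force loop.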

The blog content was obtained by scraping the RSS feeds of blogs listed in the Gongol directory (plus a few others that I read that weren't listed therein). Because I scraped the RSS feeds, I am only comparing blogs by the content of their most recent posts (and not their total history of posts). Because the clustering uses only the most recent posts, you can't really interpret much from the dendrogram. However, if I built a database to store blog content and then re-ran this analysis every day for a lengthy period of time, you would hope that the clustering would settle down and become stable.
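For anyone curious how the scraping step might look, here is a rough sketch that turns one RSS 2.0 feed into word counts using only the Python standard library. It is an illustration rather than my actual scraper: the sample feed, the function name `feed_word_counts`, and the simple tag-stripping and tokenizing rules are all assumptions. A real run would first download each feed (for example with `urllib.request.urlopen`) instead of using an inline string.

```python
import re
import xml.etree.ElementTree as ET
from collections import Counter

# Hypothetical feed standing in for a downloaded RSS document.
SAMPLE_FEED = """<rss version="2.0"><channel>
  <title>Hypothetical Econ Blog</title>
  <item><title>On inflation</title>
    <description>&lt;p&gt;Inflation and money.&lt;/p&gt;</description></item>
  <item><title>More on money</title>
    <description>Money, money, and rates.</description></item>
</channel></rss>"""


def feed_word_counts(rss_xml):
    """Count the words in the title and description of every <item> in a feed."""
    root = ET.fromstring(rss_xml)
    counts = Counter()
    for item in root.iter("item"):
        parts = (item.findtext("title"), item.findtext("description"))
        text = " ".join(p for p in parts if p)
        text = re.sub(r"<[^>]+>", " ", text)  # crude removal of embedded HTML tags
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts


counts = feed_word_counts(SAMPLE_FEED)
print(counts.most_common(3))
```

Because a feed only carries the latest handful of items, the counts reflect a blog's recent posts and nothing older, which is the limitation described above. Storing each day's counts would be the first step toward the longer-running version of the analysis.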

There are many ways that this methodology could be improved upon and if I have some time I may pursue them a bit further...

I have posted a .jpeg of the dendrogram...unfortunately it is unreadable at the size shown. You will need to download it and open it in a viewer if you want to see where your favorite economics blog ended up. I ended up close to Greg Mankiw...which was unexpected!
