Hello,
Is a corpus extracted from a variety of blogs available online (for
academic use)?
I would like to tag the texts in such a corpus and perform stylistic analysis on them.
Alternatively, is there an API for this kind of blog post text extraction?
The XML-RPC API for Waypath (http://www.waypath.com/apis/) looks good,
but it does not seem to return the full text of posts, and the available
documentation is not very detailed.
In the absence of such a corpus or API, I am thinking of doing this by:
1] using RSS/Atom feed parsers on some OPML files to get URLs for blog posts
2] extracting the text from each post (easier if the blog template format is known)
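
To make the two steps concrete, here is a rough sketch using only the Python standard library. All function names are my own and purely illustrative. It assumes the OPML files list feeds in the xmlUrl attribute of their outline elements, that feeds are RSS 2.0 or Atom 1.0 (RSS 1.0/RDF and older Atom namespaces are not handled), and that stripping tags is an acceptable first approximation of "the text". It works on strings that have already been downloaded; fetching and template-specific cleanup are left out.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumption: Atom 1.0 namespace; older Atom drafts used a different one.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_urls_from_opml(opml_text):
    """Step 1a: return the xmlUrl of every <outline> in an OPML document."""
    root = ET.fromstring(opml_text)
    return [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]


def post_urls_from_feed(feed_text):
    """Step 1b: return post URLs from an RSS 2.0 or Atom 1.0 feed."""
    root = ET.fromstring(feed_text)
    urls = []
    # RSS 2.0: <item><link>URL</link></item>
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            urls.append(link.strip())
    # Atom 1.0: <entry><link href="URL"/></entry>; rel defaults to "alternate"
    for entry in root.iter(ATOM_NS + "entry"):
        for link in entry.findall(ATOM_NS + "link"):
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                urls.append(link.get("href"))
                break
    return urls


class _TextExtractor(HTMLParser):
    """Collect text nodes, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.chunks.append(data.strip())


def text_from_html(html_text):
    """Step 2: crude text extraction that keeps all text outside script/style."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    return " ".join(parser.chunks)
```

Note that text_from_html keeps navigation, sidebars and comments along with the post body. If the blog template is known, restricting extraction to the element that wraps the post would give a much cleaner corpus.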
Thanks and Regards,
Trilok.
This archive was generated by hypermail 2b29 : Wed Mar 30 2005 - 13:52:53 MET DST