[Corpora-List] Corpus from Blogs required.

From: Trilok Khairnar (trilokgk@gmail.com)
Date: Wed Mar 30 2005 - 13:51:21 MET DST

    Hello,

    Is a corpus extracted from a variety of blogs available online (for
    academic use)?
    I would like to tag the texts in such a corpus and perform stylistic
    analysis on them.

    Alternatively, is there an API for this blog post text extraction task?
    The XML-RPC API for Waypath (http://www.waypath.com/apis/) looks good,
    but it seems that it doesn't return the full text of posts, and the
    available documentation is not very detailed.

    In the absence of such a corpus or API, I am thinking of doing this by:
    1] using RSS and Atom feed parsers on some OPML files to get the URLs of blog posts
    2] extracting the text from each post (easier if the blog template format is known)
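
The two steps above could be sketched roughly as follows. This is only an illustration, assuming Python and its standard library: the function names are hypothetical, the Atom handling assumes the Atom 1.0 namespace (older Atom 0.3 feeds use a different one), and the text extraction is template-agnostic, so it keeps navigation and sidebar text that a template-aware extractor would drop. Fetching is left out; the strings could come from urllib.request.urlopen or from files on disk.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser

# Assumption: feeds use the Atom 1.0 namespace.
ATOM_NS = "{http://www.w3.org/2005/Atom}"


def feed_urls_from_opml(opml_text):
    """Collect the xmlUrl attribute of every <outline> in an OPML document."""
    root = ET.fromstring(opml_text)
    return [o.attrib["xmlUrl"] for o in root.iter("outline") if "xmlUrl" in o.attrib]


def post_urls_from_feed(feed_text):
    """Collect post URLs from an RSS 2.0 or Atom 1.0 feed."""
    root = ET.fromstring(feed_text)
    urls = []
    # RSS 2.0: <item><link>URL</link></item>
    for item in root.iter("item"):
        link = item.findtext("link")
        if link:
            urls.append(link.strip())
    # Atom 1.0: <entry><link href="URL"/></entry>; a missing rel means "alternate"
    for entry in root.iter(ATOM_NS + "entry"):
        for link in entry.findall(ATOM_NS + "link"):
            if link.get("rel", "alternate") == "alternate" and link.get("href"):
                urls.append(link.get("href"))
                break
    return urls


class _TextExtractor(HTMLParser):
    """Gather visible text, skipping the contents of <script> and <style>."""

    SKIP = {"script", "style"}

    def __init__(self):
        super().__init__()
        self.parts = []
        self._skip_depth = 0

    def handle_starttag(self, tag, attrs):
        if tag in self.SKIP:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in self.SKIP and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth and data.strip():
            self.parts.append(data.strip())


def text_from_html(html_text):
    """Return the visible text of an HTML page as one space-joined string."""
    parser = _TextExtractor()
    parser.feed(html_text)
    parser.close()
    return " ".join(parser.parts)
```

If the blog template is known, text_from_html could be narrowed to the element that holds the post body, which would avoid picking up menus and comment forms.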

    Thanks and Regards,
    Trilok.


