[Corpora-List] announcement: penn-helsinki parsed corpus of early modern english

From: Beatrice Santorini (beatrice@babel.ling.upenn.edu)
Date: Sun Mar 06 2005 - 00:08:05 MET

    We are happy to announce the release of the Penn-Helsinki Parsed Corpus
    of Early Modern English (PPCEME). The construction of the corpus was
    funded by the National Endowment for the Humanities (Grant # PA
    23382-99) and the National Science Foundation (Grant # BCS 99-05488).
    The Principal Investigator on these grants was Anthony Kroch, Professor
    of Linguistics, University of Pennsylvania, and the research associate
    primarily responsible for corpus construction was Dr. Beatrice
    Santorini.

    The PPCEME contains 1.8 million words of running text, annotated for
    part of speech and sentence structure. It includes a parsed version of
    the entire Early Modern English section of the Helsinki Corpus of
    Historical English (600,000 words) and two extensions of the Helsinki
    samples, each of the same size. Where the Helsinki texts were not
    large enough to permit such extensions, new texts of similar genre and
    date were substituted, thereby preserving the sociolinguistic
    characteristics of the Helsinki corpus to the greatest extent possible.

    The new corpus will be distributed together with the existing PPCME2,
    the Penn-Helsinki Parsed Corpus of Middle English, under the same
    conditions of use; the PPCME2 has been somewhat updated for the new
    release. The two corpora share the same annotation system, and the
    release CD contains a new version of the annotation manual, which has
    been revised to explain the annotation system more fully and now
    includes an extensive index. The new manual also explains the small
    number of differences between the annotation schemes of the PPCEME and
    the PPCME2. Information on obtaining the release CD is available at:

            http://www.ling.upenn.edu/hist-corpora

    The search program CorpusSearch that accompanies our corpora has been
    entirely redesigned and reprogrammed by its author, Beth Randall. The
    new version of the program, CorpusSearch 2, has been released as open
    source software on SourceForge. It is included on the release CD, and
    the latest version of the program will always be downloadable from
    SourceForge at:

            http://corpussearch.sourceforge.net

    This web site also contains the User's Guide, a facility for reporting
    bugs, and the program's source code.

    The PPCME2 and PPCEME, along with CorpusSearch 2, will be distributed
    only on a single CD, at a cost of US$300. However, anyone with a
    license for the PPCME2 can purchase a license for the new corpus for
    US$50. This update will include the new version of the PPCME2 and all
    of the other updates described above.

    The Penn historical corpora are part of a larger project to produce
    parsed corpora of historical English. The other participants, Anthony
    Warner, Susan Pintzuk, and Ann Taylor at the University of York, have
    released the York-Toronto-Helsinki Parsed Corpus of Old English Prose.
    Please see their web site for details:

            http://www-users.york.ac.uk/~lang22/YcoeHome1.htm

    Additional corpora currently under construction at Penn and York
    include:

            The Penn Parsed Corpus of Modern British English
            The York-Helsinki Parsed Corpus of Early English Correspondence