RE: [Corpora-List] a new member

From: Christopher Brewster (C.Brewster@dcs.shef.ac.uk)
Date: Sat Feb 26 2005 - 10:25:14 MET

    I suggest you look at the METER project which concerned the re-use of News
    Feeds in journalism:
    http://nlp.shef.ac.uk/research/areas/reuse.html

    Although not directly connected with repetition in your sense, it may help
    you understand the techniques needed.
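
    As a rough illustration only of what scanning a corpus for repeated
    phrases could look like, here is a minimal sketch. It is not the METER
    project's method; the function name, the whitespace tokenisation, the
    n-gram length and the window size are all assumptions made for the
    example.

```python
from collections import defaultdict


def repeated_ngrams(tokens, n=2, window=50):
    """Map each n-gram to its start positions, keeping only n-grams
    that occur at least twice within `window` tokens of each other.

    Hypothetical helper for illustration; not part of any corpus toolkit.
    """
    positions = defaultdict(list)
    # Record the start index of every n-gram in the token sequence.
    for i in range(len(tokens) - n + 1):
        positions[tuple(tokens[i:i + n])].append(i)

    repeats = {}
    for gram, idxs in positions.items():
        # Keep the n-gram if any two consecutive occurrences are close together.
        if any(b - a <= window for a, b in zip(idxs, idxs[1:])):
            repeats[gram] = idxs
    return repeats


# Naive whitespace tokenisation, assumed here for simplicity.
text = "the cat sat on the mat and the cat slept"
tokens = text.lower().split()
result = repeated_ngrams(tokens, n=2, window=10)
print(result)
```

    A small window approximates repetition close to the sentence level and
    a larger one repetition across a stretch of discourse. A real study
    would need proper tokenisation and the sentence or text boundaries
    encoded in the corpus markup.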

    Christopher Brewster

    *****************************************************
    Natural Language Processing Group,
    Department of Computer Science, University of Sheffield
    Tel: +44(0)114-22.21967 Fax: +44 (0)114-22.21810
    Regent Court, 211 Portobello Street
    Sheffield S1 4DP UNITED KINGDOM
    Web: http://www.dcs.shef.ac.uk/~kiffer/
    *****************************************************
    A definition is the enclosing a wilderness of an
    idea within a wall of words.--- Samuel Butler

     

      

    > -----Original Message-----
    > From: owner-corpora@lists.uib.no
    > [mailto:owner-corpora@lists.uib.no] On Behalf Of Mai Zaki
    > Sent: 26 February 2005 04:24
    > To: corpora@uib.no
    > Subject: [Corpora-List] a new member
    >
    > Hello everyone,
    >
    > It is a pleasure to join your group.
    >
    > I am a PhD student at Middlesex University and I am just
    > starting my research to put together a formal proposal. My
    > aim is to do a corpus-based study of repetition, comparing
    > the various fiction and non-fiction, written and spoken text
    > categories all within the framework of Relevance Theory. I am
    > kind of a beginner in this field of corpus linguistics. I
    > just did a small-scale corpus-based study of the modals in my
    > MA thesis using a corpus I compiled myself and concordance
    > software. Now I am hoping I can use one of the big English
    > corpora like the ICE-GB or the BNC. But I am basically
    > worried about the range of examples a one-million word corpus
    > or a 2000-word text collections corpus would generate. I was
    > also wondering if it would be feasible for such a study just
    > to go through the whole corpus looking for repeated words or
    > phrases since no search tool would be particularly useful,
    > and whether the layout of the data in either corpus would
    > allow me to detect cases of repetition at both the sentence and
    > discourse levels easily. I would really appreciate it if
    > anyone could provide me with useful information in this
    > regard, especially those who have actually worked with these
    > corpora before. Recommendations of other corpora for such a
    > study would also be most welcome.
    >
    > Thank you all.
    >
    > Mai Zaki
    >
    >



    This archive was generated by hypermail 2b29 : Sat Feb 26 2005 - 10:51:26 MET