Re: [Corpora-List] American and British English spelling converter

From: Eric Atwell (eric@comp.leeds.ac.uk)
Date: Fri Nov 03 2006 - 11:15:38 MET


    It may not be obvious to CORPORA readers who don't know Martin Wynne,
    but this MUST have been a tongue-in-cheek comment! The underlying
    message is that the BNC provides empirical evidence that many traditional
    distinctions between US and UK English spelling and vocabulary are
    breaking down, as both US and UK traditional spellings are
    interchangeably accepted worldwide and even in Britain.
    I wonder whether American corpora, e.g. the ANC, show evidence of British spellings?

    I'm currently looking into which English dominates the World Wide Web:
    British or American? I've collected a small web-as-corpus from UK and US
    domains, to compare with other English web-as-corpus samples taken from about
    100 other national domains. Can anyone point me at other studies
    comparing/assessing uptake of British v American English on WWW
    outside UK and USA?
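
    One simple way to compare samples like these is to count how often each
    variety's spelling of the same word occurs. Below is a minimal Python sketch
    of that idea. The variant pairs are a hypothetical, hand-picked list, not a
    published resource. Only the exact listed forms are counted, so inflected
    forms such as "realised" would have to be added as separate pairs.

```python
import re
from collections import Counter

# Hypothetical, hand-picked (British, American) variant pairs; a real study
# would need a much larger list and care with words such as "advise" that
# take -ise in both varieties.
VARIANT_PAIRS = [
    ("realise", "realize"),
    ("colour", "color"),
    ("centre", "center"),
    ("organise", "organize"),
    ("analyse", "analyze"),
]

WORD_RE = re.compile(r"[a-z]+")


def count_variants(text):
    """Return (british_count, american_count) over the listed exact forms."""
    tokens = Counter(WORD_RE.findall(text.lower()))
    british = sum(tokens[uk] for uk, _ in VARIANT_PAIRS)
    american = sum(tokens[us] for _, us in VARIANT_PAIRS)
    return british, american


def british_share(text):
    """Proportion of variant tokens using the British form, or None if none occur."""
    british, american = count_variants(text)
    total = british + american
    return british / total if total else None


if __name__ == "__main__":
    sample = "We realise the colour of the centre; they realize the color differs."
    print(count_variants(sample))  # (3, 2)
    print(british_share(sample))   # 0.6
```

    Run over each national-domain sample, the share gives one rough number per
    domain. It says nothing about vocabulary differences, which would need a
    separate word list.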

    thanks

    Eric Atwell, Leeds University

    On Thu, 2 Nov 2006, Martin Wynne wrote:

    > If you find such a program, let us know, and we can run it over the BNC and
    > change the 5849 occurrences of 'realize' and inflected forms to 'realise'
    > etc., and otherwise correct British English to your preferred spellings ;)
    >
    > Martin Krallinger wrote:
    >
    >> Dear all,
    >>
    >> I was looking for some simple tool (preferably in Python) which
    >> is able to do automatic conversion of texts (or words) from
    >> British English (UK) to American (US) English and vice versa.
    >> (Example: realize <-> realise)
    >>
    >> This seems to be an easy task, but I could not find any ready-to-use
    >> standalone tool capable of performing this task.
    >>
    >> I want to integrate this application into an information extraction system
    >> which handles scientific literature.
    >>
    >> I am also interested in references where aspects related to US/UK English
    >> spelling have been analyzed in the context of information extraction, text
    >> mining and terminology extraction.
    >>
    >> Best regards,
    >>
    >>
    >> Martin
    >>
    >>
    >
    >
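
    For the word-level conversion Martin Krallinger asks about, a dictionary
    lookup is the simplest approach. A blanket rule such as -ize -> -ise would
    also rewrite "size" and "prize". Below is a minimal Python sketch under that
    assumption. The UK_TO_US table is a hypothetical seed list and is not taken
    from any existing tool. The US -> UK direction is a judgement call, since
    British texts (the BNC's 'realize', above) already use -ize spellings.

```python
import re

# Hypothetical seed list of British -> American spellings; a usable tool
# would need a large curated list and review of ambiguous words.
UK_TO_US = {
    "realise": "realize",
    "realised": "realized",
    "realises": "realizes",
    "realising": "realizing",
    "colour": "color",
    "centre": "center",
    "analyse": "analyze",
    "tumour": "tumor",
}
# Reverse table for the American -> British direction.
US_TO_UK = {us: uk for uk, us in UK_TO_US.items()}

WORD_RE = re.compile(r"[A-Za-z]+")


def _match_case(original, replacement):
    """Copy the capitalisation pattern of `original` onto `replacement`."""
    if original.isupper():
        return replacement.upper()
    if original[0].isupper():
        return replacement[0].upper() + replacement[1:]
    return replacement


def convert(text, mapping):
    """Replace every word found in `mapping`, leaving all other text untouched."""
    def swap(match):
        word = match.group(0)
        target = mapping.get(word.lower())
        return _match_case(word, target) if target else word
    return WORD_RE.sub(swap, text)


def uk_to_us(text):
    return convert(text, UK_TO_US)


def us_to_uk(text):
    return convert(text, US_TO_UK)


if __name__ == "__main__":
    print(uk_to_us("We realised the Colour of the tumour."))
    # We realized the Color of the tumor.
    print(us_to_uk("The CENTER will analyze it."))
    # The CENTRE will analyse it.
```

    Punctuation and unknown words pass through unchanged. For an information
    extraction pipeline, it may be safer to normalise both spellings to one
    form when indexing than to rewrite the source text.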

    -- 
    Eric Atwell,
    Senior Lecturer, Language research group leader, School of Computing,
    Faculty of Engineering, University of Leeds, LEEDS LS2 9JT, England
    TEL: +44-113-3435430  FAX: +44-113-3435468  http://www.comp.leeds.ac.uk/eric
    


