Re: [Corpora-List] American and British English spelling converter

From: Ben Hutchinson (ben.hutch@gmail.com)
Date: Fri Nov 03 2006 - 01:12:03 MET

Next message: Eric Atwell: "Re: [Corpora-List] American and British English spelling converter"

Previous message: Martin Wynne: "Re: [Corpora-List] American and British English spelling converter"
In reply to: Martin Wynne: "Re: [Corpora-List] American and British English spelling converter"
Next in thread: Eric Atwell: "Re: [Corpora-List] American and British English spelling converter"

Stanford University's NLP group's POS tagger does some pre-processing
that converts British spellings to US spellings based on variations in
the spellings of certain common words and word endings.

As an example of how it modifies word endings, it tags
"sour flour our dour parlour rigour glamour colour Harbour"
as
"sour/JJ flour/NN our/PRP$ dour/NN parlor/NN rigor/NN glamor/NN
color/NN Harbor/NNP".

It even Americanizes unknown words ending in "-our", so, for example,
it tags "nonsensour" as "nonsensor". Sometimes it is a bit overeager,
as in "devour" -> "devor/NN".
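
To make the behaviour concrete, here is a rough Python sketch of a rule
that reproduces the examples above. It is not the tagger's actual code
(which is Java); the function name americanize_our and the exception set
KEEP_OUR are invented for illustration, and only the first four
exceptions are taken from the tagged output above.

```python
# Hypothetical sketch of an "-our" -> "-or" rule; NOT the Stanford tagger's code.

# Words that keep "-our". The first four are taken from the tagged example;
# the remaining ones are assumed additions for illustration only.
KEEP_OUR = {"sour", "flour", "our", "dour", "four", "hour", "your", "pour", "tour"}


def americanize_our(word: str) -> str:
    """Rewrite a trailing "-our" as "-or" unless the word is a listed exception."""
    if word.lower() in KEEP_OUR or not word.endswith("our"):
        return word
    return word[:-3] + "or"


if __name__ == "__main__":
    words = "sour flour our dour parlour rigour glamour colour Harbour".split()
    print(" ".join(americanize_our(w) for w in words))
    # Anything not in the exception list is rewritten, including "devour".
    print(americanize_our("nonsensour"), americanize_our("devour"))
```

A rule of this shape explains the "devor" result: "devour" is simply
not among the exceptions, so the suffix is rewritten like any other.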

The tagger is under the GNU GPL, so I think it should be possible
to adapt the Java code to suit your requirements as long as any changes
you redistribute stay under the same license. I also think it should be fairly
straightforward to invert their algorithm, although it's a while since
I looked at the source. It is available from
http://nlp.stanford.edu/software/index.shtml
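
One possible way to invert a rule like this, sketched in Python under
my own assumptions rather than taken from the Stanford source: blindly
turning "-or" into "-our" would damage words such as "doctor", so the
sketch runs the forward rule over a list of known British spellings and
records the reverse mapping. The names to_us, build_reverse_map and
britishize, and the tiny word list, are all hypothetical.

```python
# Hypothetical sketch of inverting a UK -> US spelling rule via a lookup table.
# The forward rule below is a stand-in, not the Stanford tagger's code.

KEEP_OUR = {"sour", "flour", "our", "dour"}  # exceptions seen in the example


def to_us(word: str) -> str:
    """Stand-in forward rule: trailing "-our" becomes "-or" unless excepted."""
    if word.lower() in KEEP_OUR or not word.endswith("our"):
        return word
    return word[:-3] + "or"


def build_reverse_map(british_words, forward=to_us):
    """Map each US form back to the British word that produced it."""
    reverse = {}
    for uk in british_words:
        us = forward(uk)
        if us != uk:
            reverse.setdefault(us, uk)  # keep the first British form seen
    return reverse


def britishize(word: str, reverse: dict) -> str:
    """Return the British spelling if one is known, else the word unchanged."""
    return reverse.get(word, word)


if __name__ == "__main__":
    # A real word list (for example, one drawn from a British corpus) is assumed.
    reverse = build_reverse_map(["colour", "rigour", "parlour", "sour"])
    print([britishize(w, reverse) for w in ["color", "rigor", "doctor", "sour"]])
```

The table only covers words present in the British list, so coverage
depends entirely on how good that list is.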

On 03/11/06, Martin Wynne <martin.wynne@oucs.ox.ac.uk> wrote:
> If you find such a program, let us know, and we can run it over the BNC
> and change the 5849 occurrences of 'realize' and inflected forms to
> 'realise' etc., and otherwise correct British English to your preferred
> spellings ;)
>
> Martin Krallinger wrote:
>
> > Dear all,
> >
> > I was looking for some simple tool (preferably in Python) which
> > is able to do automatic conversion of texts (or words) from
> > British English (UK) to American (US) English and vice versa.
> > (Example: realize <-> realise)
> >
> > This seems to be an easy task, but I could not find any ready to use
> > stand alone tool capable of performing this task.
> >
> > I want to integrate this application into an Information extraction
> > system which handles scientific literature.
> >
> > I am also interested in references where aspects related to US/UK English
> > spelling have been analyzed in the context of information extraction, text
> > mining and terminology extraction.
> >
> > Best regards,
> >
> >
> > Martin
> >
> >
>
>
>


This archive was generated by hypermail 2b29 : Fri Nov 03 2006 - 01:10:10 MET