Re: Corpora: OnLine English phrase checker

Gregory Grefenstette (Gregory.Grefenstette@xrce.xerox.com)
Thu, 4 Nov 1999 14:30:04 +0100 (MET)

The OnLine English phrase checker, which is just a rerouting
to the search engine www.alltheweb.com, provides nothing new.

This functionality of producing index counts
or page counts of phrases has been available
for years on Altavista.

Try
"wheels in motion" "gears in motion"

http://www.altavista.com/cgi-bin/query?pg=q&sc=on&q=%22wheels+in+motion%22+%22gears+in+motion%22&kl=XX&stype=stext&search.x=11&search.y=9

It returns "3702 pages found"
...
"word count: gears in motion: 232; wheels in motion: about 4000"
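
For anyone who wants to build such a query for other phrases, here is
a minimal Python sketch. It assumes only the parameter names visible
in the URL above (pg, sc, q, kl, stype); the function name
phrase_query_url is my own, and the search.x/search.y click
coordinates are left out on the assumption that they are not needed.

```python
from urllib.parse import urlencode

def phrase_query_url(phrases, base="http://www.altavista.com/cgi-bin/query"):
    """Build a query URL that searches each phrase as an exact (quoted) string."""
    # Wrap every phrase in double quotes and join them with spaces.
    query = " ".join('"{}"'.format(p) for p in phrases)
    # Parameter names and values are copied from the URL in this message.
    params = [("pg", "q"), ("sc", "on"), ("q", query),
              ("kl", "XX"), ("stype", "stext")]
    # urlencode escapes '"' as %22 and spaces as '+'.
    return base + "?" + urlencode(params)

url = phrase_query_url(["wheels in motion", "gears in motion"])
print(url)
```

This only constructs the URL; fetching the page and reading the counts
out of it is not shown.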

I used this functionality in a bit of research that I am presenting,
"The WWW as a Resource for Example-Based MT Tasks",
at the ASLIB "Translating and the Computer 21 Conference" next month:
http://www.aslib.co.uk/conferences/tc21.html

In the experiments described in this paper,
candidate translations were generated for all transparent compounds
in bilingual dictionaries (German-to-English, Spanish-to-English).
When there is more than one possible translation, selecting the
candidate that occurs most frequently on the WWW, using counts
such as those provided by Altavista and FAST.no,
gives the right choice 86-87% of the time.
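
The selection step can be sketched in a few lines of Python. This is
an illustration only, not the code used in the paper: the function
name pick_translation is hypothetical, the counts are supplied by a
caller-provided lookup rather than a live search engine, and the two
phrases and their counts from the Altavista example in this message
(with "about 4000" taken as 4000) stand in for real candidate
translations.

```python
def pick_translation(candidates, count_of):
    """Return the candidate with the highest frequency, or None if there are none."""
    if not candidates:
        return None
    # On a tie, max() keeps the first candidate in list order.
    return max(candidates, key=count_of)

# Stand-in counts taken from the Altavista output quoted in this message.
counts = {"wheels in motion": 4000, "gears in motion": 232}

# Unknown phrases are treated as having a count of zero.
best = pick_translation(list(counts), lambda phrase: counts.get(phrase, 0))
print(best)
```

In a real setting count_of would query a search engine for the page
count of each candidate phrase.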

This argues for considering the WWW as a new (free)
type of linguistic resource, with different
characteristics from a corpus built on principled lines
such as the BNC, but extremely rich as is.

This also argues for more linguistically informed
web search engines, with a little more smarts than mere
string counters like Altavista and FAST.no.

> From: "Jane A. Edwards" <edwards@ICSI.Berkeley.EDU>
> Date: Thu, 4 Nov 1999 01:47:25 -0800 (PST)
> To: corpora@hd.uib.no
> Subject: Corpora: OnLine English phrase checker
> Cc: edwards@ICSI.Berkeley.EDU
> Mime-Version: 1.0
>
> The following message actually reached me as spam, but
> I found the OnLine English phrase checker (the third URL below)
> to be useful enough that I thought it might be of interest to the list.
>
> I looked up "wheels in motion" and "gears in motion" to test it.
> For "wheels in motion" it said:
> 3002 documents found - 0.0310 seconds search time
> For "gears in motion" it found only about 1/10 that many:
> 201 documents found - 0.0140 seconds search time
> (Maybe that's why I'd never heard the second one before last week?)
>
> It gives you standard search-engine type listings, so you can click
> on the URL and see the phrase in context.
>
> It's not the BNC, of course, but it might be interesting for some purposes.
>
> Best Wishes,
>
> -Jane Edwards
>
> ---------------------------
> From ole2@oleng.com.au Thu Oct 28 05:30 PDT 1999
> Date: Thu, 28 Oct 1999 20:30:00 +1000
> From: OnLine English <ole2@oleng.com.au>
> Subject: Resources for publishing in English
>
> Announcing a new WWW site for OnLine English
>
> * a full outline of its editing service for academics and researchers
> (www.oleng.com.au)
>
> * writers' links for researchers and other professionals
> (http://www.oleng.com.au/indexwl.html)
>
> * the OnLine English phrase checker, a new way of using search engine
> technology to check any English phrase and better understand how English
> works in context (http://www.oleng.com.au/indexpc.html)
>
>
>

____________________________________________________________________________
Gregory Grefenstette, Principal Scientist
Xerox Research Centre Europe, 6 chemin de Maupertuis, 38240 Meylan, France
Gregory.Grefenstette@xrce.xerox.com
Phone : (33) 4 76 61 50 82 fax : (33) 4 76 61 50 99
Inside France: 04-76-61-50-82
http://www.xrce.xerox.com/people/grefenstette