RE: [Corpora-List] Parallel corpora and word alignment, WAS: American and British English spelling converter

From: Merle Tenney (merlet@microsoft.com)
Date: Thu Nov 09 2006 - 21:50:21 MET


    I'm sorry, Mark. I think you may have misunderstood the thrust of my post. I certainly didn't mean to lecture, and I am not working on any "Ultimate True Theory of English Dialectology". When you questioned the need for a parallel corpus, I wondered if you might have some insight into how to get some of the benefits of parallel corpora without actually having parallel corpora. That is not a straw man; I think that is a worthwhile pursuit and probably tractable given the right approach and the right tools. It would lead to powerful insights and powerful tools in lexicology, dialectology, translation, second language acquisition, and much more. I would genuinely love to know if anyone has been able to achieve parallel corpus results with comparable corpus analysis techniques. (I must confess, Mark, that I am not on the nominating committee for the Nobel Prize in Corpus Linguistics, so that offer was made in jest. :-) )

    I'm glad that you don't make typos. I used to think that I didn't either, until I started using Word's new contextual speller. Some still get past me, for sure, but definitely fewer than before.

    Later, my friend. I've got to get back to my toys. :-)

    Merle

    -----Original Message-----
    From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On Behalf Of Mark P. Line
    Sent: Thursday, November 9, 2006 11:23 AM
    To: Merle Tenney
    Cc: CORPORA@UIB.NO
    Subject: Re: [Corpora-List] Parallel corpora and word alignment, WAS: American and British English spelling converter

    Merle Tenney wrote:
    > Ramesh Krishnamurthy wrote:
    >
    >> ...and there is no obvious parallel corpus of Br-Am Eng to consult...
    >
    >> Do you know of one by any chance...
    >
    >> And Mark P. Line responded:
    >
    >>Why would it have to be a *parallel* corpus?
    >
    > [Merle's lecture snipped]
    >
    > Mark, if you can figure out a way to combine the quality and quantity of
    > data from a very large corpus with the alignment and equivalence power of
    > a parallel corpus without actually having a parallel corpus, I will
    > personally nominate you for the Nobel Prize in Corpus Linguistics.

    I was speaking in the context of the mostly anecdotal claims being made on
    the parent thread, as to what it would take in the way of corpus
    examination to support or defeat them. I thought this was the context in
    which Ramesh was speaking, and I'm pretty sure it was the context of the
    initial Oxfordian question of why nobody on this thread had been making
    use of corpora.

    I was not speaking in the context of the Ultimate True Theory of English
    Dialectology, which would seem to be a strawman of your device.

    So in case the fault was mine and I was unclear in my question to Ramesh,
    please allow me to rephrase it: "Why would you need a *parallel* corpus to
    make or refute claims of the kind we've been seeing on the parent thread?"
    As nearly as I could tell, you didn't actually address that question in
    your lecture.

    > PS and Shameless Microsoft Plug: In the last paragraph, I accidentally
    > typed "figure out a why to combine" and I got the blue squiggle from Word
    > 2007, which was released to manufacturing on Monday of this week. It
    > suggested way, and of course I took the suggestion. I am amazed at the
    > number of mistakes that the contextual speller has caught in my writing
    > since I started using it. I recommend the new version of Word and Office
    > for this feature alone.

    Thanks, but I think that would entail my switching to a toy operating
    system as well as spending money on another piece of software, and I don't
    make enough typographical mistakes to warrant such a drastic measure.

    -- Mark

    Mark P. Line
    Polymathix
    San Antonio, TX


