[Corpora-List] ** New BNC interface: comparing synonyms and using WordNet info

From: Mark Davies (Mark_Davies@byu.edu)
Date: Tue Mar 01 2005 - 17:49:48 MET


    The free "Variation in English Words and Phrases" website (http://view.byu.edu) now allows users to directly compare (with one simple query) semantically-related words and phrases in the 100 million word British National Corpus.
     

    ------------------------

    Using this interface, users can now input a query in which they specify 2-5 words that are being compared. Within 2-3 seconds, they then see separate, grouped lists of the collocates, distribution, and frequency for each of these words. Sample queries are the following:
     
    1) ANCHOR [sheer/complete/utter] TARGET [noun]
                All nouns occurring within three words of one of these three adjectives
                (e.g. sheer size, complete set, utter nonsense)
    2) ANCHOR [heard/saw/felt/touched] TARGET [noun]
                All nouns occurring within three words of one of these four verbs
                (e.g. saw/daylight, heard/rumours, felt/urge, touched/bases)
    3) The exact phrase: {destroy/ruin/demolish} the [n*]
                (e.g. destroy the evidence, ruin the appearance, demolish the church)

    Again, the purpose of these queries is to see groupings of the most frequent collocates that occur with each of the contrasting words, or to see which collocates occur with one word but not with the others.
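
To make the idea of grouped collocate lists concrete, here is a minimal Python sketch of the kind of counting that queries 1 and 2 describe. It is an illustration only, not the code behind the website: the toy sentences, the simplified tags, and the function names grouped_collocates and exclusive_collocates are all assumptions made for this example.

```python
from collections import Counter, defaultdict

# Toy POS-tagged sentences, invented for illustration. The tags are
# simplified stand-ins, not the tagset actually used for the BNC.
SENTENCES = [
    [("the", "DET"), ("sheer", "ADJ"), ("size", "NOUN"), ("of", "PREP"), ("it", "PRON")],
    [("a", "DET"), ("complete", "ADJ"), ("set", "NOUN"), ("of", "PREP"), ("tools", "NOUN")],
    [("that", "PRON"), ("is", "VERB"), ("utter", "ADJ"), ("nonsense", "NOUN")],
    [("sheer", "ADJ"), ("nonsense", "NOUN")],
    [("the", "DET"), ("sheer", "ADJ"), ("size", "NOUN")],
]


def grouped_collocates(sentences, anchors, target_tag, window=3):
    """Count words tagged `target_tag` within `window` tokens of each anchor."""
    groups = defaultdict(Counter)
    for sentence in sentences:
        for i, (word, _) in enumerate(sentence):
            if word not in anchors:
                continue
            start = max(0, i - window)
            stop = min(len(sentence), i + window + 1)
            for j in range(start, stop):
                other, tag = sentence[j]
                if j != i and tag == target_tag:
                    groups[word][other] += 1
    return groups


def exclusive_collocates(groups, anchor):
    """Collocates seen with `anchor` but with none of the other anchors."""
    seen_elsewhere = set()
    for other_anchor, counts in groups.items():
        if other_anchor != anchor:
            seen_elsewhere.update(counts)
    return sorted(set(groups[anchor]) - seen_elsewhere)


groups = grouped_collocates(SENTENCES, {"sheer", "complete", "utter"}, "NOUN")
for anchor in ("sheer", "complete", "utter"):
    print(anchor, groups[anchor].most_common(), exclusive_collocates(groups, anchor))
```

On this toy data, "size" appears only with "sheer", while "nonsense" is shared by "sheer" and "utter", which is the kind of contrast the grouped lists are meant to show.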

    ------------------------
     
    The BNC-based interface is also tied into the WordNet hierarchy of semantically-related words. This allows users to find synonyms, hypernyms (more general terms), hyponyms (more specific terms), and part/whole pairs, and then input these semantically-related words directly as part of the query. For example:
     
         Step 1: [=scream].[v*] yields the list of synonyms (verbs) for "scream"
         Step 2: Select the desired "synsets", which are then automatically inserted into the search form; e.g.:
              ANCHOR [scream/yell/shout/cry] TARGET [noun]
         Step 3: (results): scream murder, cry tears, shout help, yell instructions
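
Steps 1 and 2 amount to expanding a synonym query into a slash-separated ANCHOR list. The sketch below shows that expansion in Python under stated assumptions: TOY_SYNONYMS is a hypothetical stand-in for a real WordNet lookup (its single entry is copied from the example above), expand_synonym_query is an invented name, and the step where the user selects individual synsets is skipped.

```python
import re

# Hypothetical stand-in for a WordNet lookup. The only entry is taken
# from the "scream" example in this post; it is not real WordNet output.
TOY_SYNONYMS = {("scream", "v"): ["scream", "yell", "shout", "cry"]}


def expand_synonym_query(query, table=TOY_SYNONYMS):
    """Turn a query like '[=scream].[v*]' into '[scream/yell/shout/cry]'."""
    match = re.fullmatch(r"\[=(\w+)\]\.\[(\w+)\*\]", query)
    if match is None:
        raise ValueError(f"not a synonym query: {query!r}")
    word, pos = match.groups()
    synonyms = table.get((word, pos))
    if synonyms is None:
        raise KeyError(f"no synonyms recorded for {word!r} ({pos})")
    return "[" + "/".join(synonyms) + "]"


anchor = expand_synonym_query("[=scream].[v*]")
print("ANCHOR", anchor, "TARGET [noun]")
```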
     
    To see an expanded overview of these features, select one of the following two options from the drop-down list, once you are at the website:
     
    -- Word Comparisons (synonyms)
    -- Incorporating info from WordNet
     

    ------------------------

    The ability to search by semantically-related words is just one of the features of this interface. This interface also allows more basic queries of the BNC, involving exact words and phrases, wildcards, and POS-based queries (e.g. white [nn*]). In addition, users can carry out "fuzzy matches" on the BNC (nouns near "woman", adverbs near "feel", etc). It's also very fast - just a second or two for most queries.
     
    Finally, as has been mentioned in a previous posting, the interface also allows users to search directly by (and therefore compare) more than 70 registers. Users can create customized registers "on the fly" and combine these with other types of queries (wildcards, POS, etc). This allows queries like:
     
    4) Which adjectives occur more with [woman] in newspapers than in fiction?
    5) Which phrases with the pattern "we [verb] that" are more common in spoken than in academic?
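
A register comparison such as query 4 has to normalize raw counts by register size before comparing them, since the registers differ in length. The Python sketch below shows one simple way to do that, using a per-million rate and a ratio threshold. All figures are invented for illustration and are not BNC results; the function names and the min_ratio cutoff are likewise assumptions, not a description of how the website ranks its results.

```python
def per_million(hits, register_size):
    """Normalize a raw count to occurrences per million words."""
    return hits * 1_000_000 / register_size


def more_common_in(counts_a, size_a, counts_b, size_b, min_ratio=2.0):
    """Words whose per-million rate in register A is at least
    `min_ratio` times their rate in register B."""
    results = []
    for word, hits_a in counts_a.items():
        rate_a = per_million(hits_a, size_a)
        rate_b = per_million(counts_b.get(word, 0), size_b)
        ratio = rate_a / rate_b if rate_b else float("inf")
        if ratio >= min_ratio:
            results.append((word, ratio))
    # Strongest contrast first, ties broken alphabetically.
    return sorted(results, key=lambda item: (-item[1], item[0]))


# Invented counts of adjectives near "woman"; NOT actual BNC figures.
newspaper = {"young": 40, "old": 30, "pregnant": 20}
fiction = {"young": 60, "old": 45, "pregnant": 5}
NEWSPAPER_WORDS = 10_000_000  # assumed register sizes
FICTION_WORDS = 15_000_000

contrast = more_common_in(newspaper, NEWSPAPER_WORDS, fiction, FICTION_WORDS)
for word, ratio in contrast:
    print(f"{word}: {ratio:.1f}x more frequent in newspapers")
```

In this made-up example "young" has a higher raw count in fiction, but once the counts are normalized its rate is the same in both registers, so only "pregnant" passes the threshold.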
     
    In summary, this freely-available, web-based interface to the BNC allows both language learners and advanced users to quickly and easily carry out several types of queries that have previously been very difficult or impossible with any other interface to the BNC.
     
    =================================================
    Mark Davies
    Assoc. Prof., Linguistics
    Brigham Young University
    (phone) 801-422-9168 / (fax) 801-422-0906
    http://davies-linguistics.byu.edu

    ** Corpus design and use // Linguistic databases **
    ** Historical linguistics // Language variation **
    ** English, Spanish, and Portuguese **
    =================================================
     


