Re: [Corpora-List] qu: collecting and automatically classifying data from the web

From: William Fletcher (fletcher@usna.edu)
Date: Tue Nov 15 2005 - 20:06:39 MET

    Florian,

    My free KWiCFinder application (Windows)
    http://kwicfinder.com
    does support date ranges and permits restriction of searches to specific websites. On the other hand, it requires searching for specific words or phrases, and is hampered by the changes to the AltaVista search engine (no wildcards, inconsistent support for stopwords as well as capitals and diacritics). Downloaded webpages can be saved automatically in either text or HTML format for further analysis.
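
    If the saved pages then need to be sorted by site and date outside KWiCFinder, a minimal sketch along the following lines might help. It is not how KWiCFinder works internally. It assumes each page's date is already known (for example from the search engine or the server), and the function name, record layout, and example.org URLs are all hypothetical.

```python
from collections import defaultdict
from datetime import date
from urllib.parse import urlparse


def filter_and_group(pages, allowed_hosts, start, end):
    """Keep pages from allowed_hosts dated within [start, end],
    grouped by (year, month). `pages` holds (url, date, text) tuples."""
    groups = defaultdict(list)
    for url, page_date, text in pages:
        host = urlparse(url).hostname or ""
        if host not in allowed_hosts:
            continue  # page comes from a site we did not ask for
        if not (start <= page_date <= end):
            continue  # page falls outside the requested date range
        groups[(page_date.year, page_date.month)].append((url, text))
    return dict(groups)


# Hypothetical sample records, for illustration only.
pages = [
    ("http://example.org/a", date(2005, 10, 3), "first page"),
    ("http://example.org/b", date(2005, 11, 1), "second page"),
    ("http://example.com/c", date(2005, 11, 2), "other site"),
    ("http://example.org/d", date(2004, 1, 1), "too old"),
]
result = filter_and_group(
    pages, {"example.org"}, date(2005, 1, 1), date(2005, 12, 31)
)
for key in sorted(result):
    print(key, [url for url, _ in result[key]])
```

    Grouping by month is an arbitrary choice here; any other key derived from the date would work the same way.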

    For further details see my paper
    "Concordancing the Web: Promise and Problems, Tools and Techniques"
    http://www.kwicfinder.com/FletcherConcordancingWeb2005.pdf

    Good luck,
    Bill Fletcher

    >>> "T. Florian Jaeger" <tiflo@csli.stanford.edu> 11/15/05 12:45 PM >>>
    Hello,

    I am forwarding this for a friend who wants to collect data from
    specific web sites and automatically organize it according to the data
    of the website. Are you aware of any such tool? I remember there was a
    KWIC-like search interface for the web, but I can't remember its name
    and I also don't know whether it allows you to specify date ranges for
    the search.

    thanks for your help,

    florian

    --
    T. Florian Jaeger
    Ph.D. student
    Linguistics Department,
    P: +1 (650) 725 2323
    F: +1 (650) 723 5666
    U: http://www.stanford.edu/~tiflo/ 
    


