Re: [Corpora-List] To segment HTML document?

From: Delip Rao (deliprao@yahoo.com)
Date: Tue Oct 25 2005 - 18:34:52 MET DST


    Try j-spider for crawling
    http://j-spider.sourceforge.net/

    For segmenting HTML documents and extracting information
    from them, though, you may want to look at the wrapper
    work by Stephen Soderland.
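    The following is a rough sketch of what a first-pass segmenter can look
    like, written in Java because the original question is about Java. It is
    a naive heuristic, not Soderland's wrapper approach. It discards script
    and style bodies and comments, treats block-level tags (p, div, h1-h6,
    li, td and so on) as segment boundaries, strips the remaining inline
    tags, and decodes only a few entities. Regular expressions are not an
    HTML parser, so malformed markup or a literal "<" in text will confuse
    it. The class name HtmlSegmenter and the tag list are illustrative
    choices, not an existing library.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

public class HtmlSegmenter {

    // Script and style bodies carry no visible text, so drop them wholesale.
    private static final Pattern NON_TEXT =
        Pattern.compile("(?is)<(script|style)\\b[^>]*>.*?</\\1\\s*>");
    // HTML comments.
    private static final Pattern COMMENT = Pattern.compile("(?s)<!--.*?-->");
    // Opening or closing block-level tags mark segment boundaries (illustrative list).
    private static final Pattern BLOCK_TAG = Pattern.compile(
        "(?i)</?(title|p|div|br|hr|h[1-6]|li|ul|ol|table|tr|td|th|blockquote|pre"
        + "|section|article|header|footer|nav)\\b[^>]*>");
    // Any tag still left after the above (inline markup such as <b> or <a>).
    private static final Pattern ANY_TAG = Pattern.compile("<[^>]*>");
    // NUL character used as an internal boundary marker.
    private static final String BOUNDARY = "\0";

    /** Splits an HTML page into whitespace-normalised text segments. */
    static List<String> segment(String html) {
        String s = NON_TEXT.matcher(html).replaceAll(" ");
        s = COMMENT.matcher(s).replaceAll(" ");
        s = BLOCK_TAG.matcher(s).replaceAll(BOUNDARY);
        s = ANY_TAG.matcher(s).replaceAll("");
        List<String> segments = new ArrayList<>();
        for (String part : s.split(BOUNDARY)) {
            String text = decodeBasicEntities(part).replaceAll("\\s+", " ").trim();
            if (!text.isEmpty()) {
                segments.add(text);
            }
        }
        return segments;
    }

    /** Decodes a handful of common entities; &amp; goes last so "&amp;lt;" stays "&lt;". */
    private static String decodeBasicEntities(String s) {
        return s.replace("&nbsp;", " ")
                .replace("&lt;", "<")
                .replace("&gt;", ">")
                .replace("&quot;", "\"")
                .replace("&amp;", "&");
    }

    public static void main(String[] args) {
        String page = "<html><head><title>T</title><style>p{}</style></head>"
            + "<body><h1>Call for papers</h1><p>Submit by <b>June</b>.</p>"
            + "<ul><li>Parsing</li><li>Tagging &amp; chunking</li></ul></body></html>";
        for (String seg : segment(page)) {
            System.out.println(seg);
        }
    }
}
```

    On the sample page in main it prints one segment per line: "T",
    "Call for papers", "Submit by June.", "Parsing" and "Tagging & chunking".
    Inline tags such as <b> are removed without creating a boundary, which is
    why "Submit by June." stays in one piece.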

    --- Chris Jordan <cjordan@cs.dal.ca> wrote:

    > Hey Imen,
    >
    > Sounds like you are writing a crawler in Java. If so,
    > why reinvent the wheel? There are plenty of open-source
    > ones lying around.
    >
    > ismi.touati wrote:
    >
    > > Dear all,
    > >
    > > Does anyone know of:
    > > - a program to segment HTML documents (web pages), or
    > > - a Java command that can connect to a web page on the
    > >   internet given its URL?
    > >
    > > Thanks
    > >
    > > All the best
    > >
    > > Imen.
    > >
    > > //****************************//
    > > Imen Touati
    > > Master's student, Faculty of Economic Sciences and
    > > Management of Sfax, Tunisia
    > > LARIS laboratory
    > > Address: LARIS, FSEGS, BP 1088, 3018 Sfax, Tunisia
    > > Tel: (216) 74 27 87 77
    > > E-mail: ismi.touati@laposte.net
    > >
    > >
    >
    >
    >
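    For the second question in the original message, connecting to a web
    page from Java given its URL, the JDK's own java.net classes are enough
    to fetch a single page; a crawler library only becomes worthwhile once
    you need link following, politeness and deduplication. The sketch below
    is a minimal example under stated assumptions, not a crawler: it assumes
    an http or https URL and a UTF-8 page, and the class name PageFetcher and
    the User-Agent string are hypothetical. URL and HttpURLConnection existed
    when this thread was written; InputStream.readAllBytes needs Java 9 or
    later.

```java
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class PageFetcher {

    /** Reads an entire stream into a String using the given charset. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        return new String(in.readAllBytes(), charset);
    }

    /** Opens an HTTP(S) connection to the given URL and returns the response body. */
    static String fetch(String address) throws IOException {
        URL url = URI.create(address).toURL();
        // Assumption: http or https; other schemes would fail this cast.
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        conn.setRequestMethod("GET");
        conn.setConnectTimeout(10_000); // milliseconds
        conn.setReadTimeout(10_000);
        conn.setRequestProperty("User-Agent", "corpus-fetcher/0.1"); // hypothetical agent name
        try {
            int status = conn.getResponseCode();
            // HttpURLConnection does not follow redirects across protocols
            // (http -> https), so such pages surface here as a non-200 status.
            if (status != HttpURLConnection.HTTP_OK) {
                throw new IOException("HTTP " + status + " for " + address);
            }
            try (InputStream in = conn.getInputStream()) {
                // Assumption: the page is UTF-8; a real crawler should take the
                // charset from the Content-Type header or the <meta> tag.
                return readAll(in, StandardCharsets.UTF_8);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        String html = fetch(args.length > 0 ? args[0] : "http://example.com/");
        System.out.println(html.length() + " characters fetched");
    }
}
```

    Run it as "java PageFetcher <url>". The returned HTML string is what a
    segmenter would then split into blocks of text.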




    This archive was generated by hypermail 2b29 : Tue Oct 25 2005 - 18:47:02 MET DST