Try j-spider for crawling:
http://j-spider.sourceforge.net/
For segmenting HTML documents and extracting content from
them, you may want to look at the wrapper work by
Stephen Soderland.
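
On the second part of the question (connecting to a web page
from Java given its URL), a minimal sketch using only the
standard java.net classes might look like the following. The
class name PageFetcher, the fetch method, the 10-second
timeouts, and the assumption that the page is UTF-8 text are
mine, for illustration only. A real crawler has to deal with
much more than this (redirects, other encodings, politeness).

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.net.URI;
import java.net.URL;
import java.net.URLConnection;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not part of j-spider or any other library.
final class PageFetcher {

    /** Opens a connection to the given URL and returns the body as text. */
    static String fetch(String address) throws IOException {
        URL url = URI.create(address).toURL();
        URLConnection connection = url.openConnection();
        connection.setConnectTimeout(10_000); // milliseconds, arbitrary choice
        connection.setReadTimeout(10_000);

        // Assumption: the page is UTF-8. Real pages may declare another charset.
        try (InputStream in = connection.getInputStream();
             BufferedReader reader = new BufferedReader(
                     new InputStreamReader(in, StandardCharsets.UTF_8))) {
            StringBuilder page = new StringBuilder();
            String line;
            while ((line = reader.readLine()) != null) {
                page.append(line).append('\n');
            }
            return page.toString();
        }
    }

    public static void main(String[] args) throws IOException {
        if (args.length == 0) {
            System.err.println("usage: java PageFetcher <url>");
            return;
        }
        System.out.println(fetch(args[0]));
    }
}
```

Running it with http://j-spider.sourceforge.net/ as the argument
should print the raw HTML of that page, which could then be
handed to whatever segmentation step you choose.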
--- Chris Jordan <cjordan@cs.dal.ca> wrote:
> Hey Imen,
>
> Sounds like you are writing a crawler in Java. If so,
> why reinvent the wheel? There are plenty of open source
> ones lying around.
>
> ismi.touati wrote:
>
> > Dear all,
> >
> > Does anyone know of:
> > - a program to segment HTML documents (web pages),
> > - a Java command that can connect to a web page on the
> >   Internet given its URL?
> >
> > Thanks
> >
> > All the best
> >
> > Imen.
> >
> > //****************************//
> > Imen Touati
> > Master's student at the Faculty of Economic Science and
> > Management of Sfax, Tunisia.
> > LARIS laboratory
> > Address: LARIS, FSEGS, BP 1088, 3018 Sfax, Tunisia
> > Tel: (216) 74 27 87 77
> > e-mail: ismi.touati@laposte.net
> >
> >
This archive was generated by hypermail 2b29 : Tue Oct 25 2005 - 18:47:02 MET DST