> Can anyone point me to these corpora?
The main search widget on AltaVista now allows you to select the
language of the retrieved documents. Afrikaans is not on the menu,
but Romanian and Icelandic are. I selected Icelandic and entered a
term that's likely to appear in a lot of WWW documents, namely `www*`.
I got just under 14000 hits. It shouldn't be that hard to write a
java script to collect them. Or else, you might want to just browse
the pointers by hand to find a few sufficiently long documents.
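
For what it's worth, here is a rough sketch of the collecting step in
Python rather than Java. It assumes you have already saved the result
pages locally as HTML; it does not talk to AltaVista, and it makes no
assumption about the result-page layout beyond ordinary `<a href>` links.
The names `LinkCollector`, `extract_links`, and `long_enough`, and the
2000-word threshold, are all made up for illustration.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    """Collect absolute http(s) links from the anchors of one HTML page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        for name, value in attrs:
            # Keep only absolute links; relative ones point back into the
            # search site itself rather than at retrieved documents.
            if name == "href" and value and value.startswith(("http://", "https://")):
                self.links.append(value)


def extract_links(html):
    """Return the absolute links found in `html`, without duplicates."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    # dict.fromkeys drops duplicates while keeping first-seen order.
    return list(dict.fromkeys(parser.links))


def long_enough(text, min_words=2000):
    """Crude length filter: count whitespace-separated tokens.

    The 2000-word default is an arbitrary assumption, not a recommendation.
    """
    return len(text.split()) >= min_words
```

You would run `extract_links` over each saved result page, fetch the
links however you like, and keep the documents for which `long_enough`
returns true.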
Happy hunting.
I. Dan Melamed melamed@linc.cis.upenn.edu
University of Pennsylvania http://www.cis.upenn.edu/~melamed/