Hi Clive De Silva:
This doesn’t quite fit the bill, but if you don’t mind an
international corpus, UC Berkeley has computed the DFs of words on the
Stanford WebBase corpus. See
My group has been using it for a number of different projects that require
Min-Yen KAN
Assistant Professor
Department of Computer Science, School of Computing
National University of Singapore, Singapore 117543
Office: S15-05-05
Tel: ++ (65) 6874-1885
Fax: ++ (65) 6779-4580
-----Original Message-----
From: owner-corpora@lists.uib.no [mailto:owner-corpora@lists.uib.no] On
Behalf Of Clive De Silva
Sent: Wednesday, May 12, 2004 4:24 PM
Subject: [Corpora-List] IDF values
Hi all.
I need to get IDF values for an American corpus of at least 100 million words (100MW). I
have access to the TREC4 and TREC5 corpora but would prefer not to have to
extract the information 'manually' and was wondering if there are IDF values
out there already calculated from a large corpus. If not, are there any
tools for extracting IDFs efficiently?
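For reference, extracting IDFs yourself amounts to one streaming pass that counts, for each term t, the number of documents df(t) containing it, followed by idf(t) = log(N / df(t)) for a collection of N documents. The Python below is a minimal sketch of that classic formulation, not any particular tool. It assumes documents arrive as plain strings. Splitting TREC's SGML files into <DOC> units is left out, and the tokenizer is a hypothetical lowercase-alphanumeric rule.

```python
import math
import re
from collections import Counter
from typing import Dict, Iterable, Tuple

# Hypothetical tokenizer: lowercase runs of letters/digits. A real setup
# would likely add stemming or stopword handling; that is an assumption.
TOKEN_RE = re.compile(r"[a-z0-9]+")


def document_frequencies(docs: Iterable[str]) -> Tuple[Counter, int]:
    """Count how many documents contain each term (once per document)."""
    df: Counter = Counter()
    n_docs = 0
    for doc in docs:  # works with a generator, so the corpus is streamed
        n_docs += 1
        df.update(set(TOKEN_RE.findall(doc.lower())))
    return df, n_docs


def idf_table(df: Counter, n_docs: int) -> Dict[str, float]:
    """Classic IDF with natural log: idf(t) = log(N / df(t))."""
    return {term: math.log(n_docs / count) for term, count in df.items()}


# Tiny usage example with three toy "documents".
docs = ["The cat sat", "The dog sat", "The bird flew"]
df, n = document_frequencies(docs)
idf = idf_table(df, n)
# "the" occurs in every document, so its IDF is log(3/3) = 0.0
```

Memory grows with vocabulary size rather than corpus size, which is what keeps a single pass over 100 million words practical. Smoothed variants such as log(N / (1 + df(t))) are also common, so it is worth checking which definition any precomputed table uses.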
Clive De Silva
MPhil student at the Computing Lab
University of Cambridge, UK
This archive was generated by hypermail 2b29 : Wed May 12 2004 - 10:39:03 MET DST