Re: [Corpora-List] Inverted index implementation: Best practices

From: Andy Roberts (andyr@comp.leeds.ac.uk)
Date: Sat Oct 15 2005 - 19:09:33 MET DST


    The best practice is not to spend your valuable time and resources
    re-implementing indexing and search software. Many such packages
    already exist and have undergone years of testing and improvement.

    For this task I tend to use Lucene, a Java library for indexing and
    searching text. It is fast and is designed to cope with gigabytes of
    data.

    http://lucene.apache.org

    Because Lucene is an Apache project, it's well supported and receives
    a lot of coverage. Many sub-projects have been formed to port Lucene
    to other languages such as C, Perl, Python and C#, which is very handy
    for those for whom Java is not the language of choice.
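
    For reference, the structure asked about below boils down to a map
    from each term to a sorted list of document numbers. Here is a
    minimal in-memory sketch in plain Java. The class and method names
    (TinyInvertedIndex, add, lookup, lookupAll) are invented for
    illustration and are not Lucene's API, and the tokenisation is a
    crude assumption. It shows the idea only; it has none of the on-disk
    storage or tuning that a mature library provides.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;
import java.util.SortedSet;
import java.util.TreeSet;

// Hypothetical illustration only: a toy in-memory inverted index.
final class TinyInvertedIndex {
    // term -> sorted set of IDs of the documents containing that term
    private final Map<String, SortedSet<Integer>> postings = new HashMap<>();

    // Split the text on non-word characters (an assumed, very crude
    // tokeniser) and record docId under every lower-cased token.
    void add(int docId, String text) {
        for (String token : text.toLowerCase().split("\\W+")) {
            if (!token.isEmpty()) {
                postings.computeIfAbsent(token, t -> new TreeSet<>()).add(docId);
            }
        }
    }

    // Documents containing the term; empty set if the term is unknown.
    SortedSet<Integer> lookup(String term) {
        return postings.getOrDefault(term.toLowerCase(),
                                     Collections.emptySortedSet());
    }

    // Documents containing every one of the given terms (AND query),
    // computed by intersecting the individual posting sets.
    SortedSet<Integer> lookupAll(String... terms) {
        if (terms.length == 0) {
            return new TreeSet<>();
        }
        SortedSet<Integer> result = new TreeSet<>(lookup(terms[0]));
        for (int i = 1; i < terms.length; i++) {
            result.retainAll(lookup(terms[i]));
        }
        return result;
    }

    public static void main(String[] args) {
        TinyInvertedIndex index = new TinyInvertedIndex();
        // Documents chosen to reproduce the table in the question.
        index.add(1, "term1 term2");
        index.add(2, "term2");
        index.add(3, "term1 term3");
        index.add(4, "term2 term3");
        index.add(5, "term1");

        System.out.println(index.lookup("term1"));             // [1, 3, 5]
        System.out.println(index.lookupAll("term2", "term3")); // [4]
    }
}
```

    A hash map gives constant-time lookup per term while everything
    fits in memory. Once the index outgrows RAM you need on-disk
    posting lists and some form of compression, which is exactly the
    work an existing library saves you.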

    Andy

    On Sat, 15 Oct 2005, Helge Thomas Karset Hellerud wrote:

    > Hello,
    >
    > Does anyone have some good links where I can find best practices when
    > implementing an inverted index (inverted file index)? The index only
    > needs to store terms and in which document they occur:
    >
    > term document
    > --------------
    > term1 1;3;5
    > term2 1;2;4
    > term3 3;4
    > ...
    >
    > The goal of the implementation is to be able to do a fast search even if
    > the index will become large.
    >
    > Thanks in advance.
    >
    > Helge


