Re: [Corpora-List] Inverted index implementation: Best practices

From: Andy Roberts (andyr@comp.leeds.ac.uk)
Date: Sat Oct 15 2005 - 19:09:33 MET DST

Next message: Will Fitzgerald: "Re: [Corpora-List] Inverted index implementation: Best practices"

Previous message: Helge Thomas Karset Hellerud: "[Corpora-List] Inverted index implementation: Best practices"
In reply to: Helge Thomas Karset Hellerud: "[Corpora-List] Inverted index implementation: Best practices"
Next in thread: Will Fitzgerald: "Re: [Corpora-List] Inverted index implementation: Best practices"

The best practice is not to spend your valuable time and resources
re-implementing indexing/searching software. Many such packages already
exist and have undergone years of testing and improvement.

For this task, I tend to go for Lucene, a Java library for indexing and
searching. It's fast and is designed to cope with gigabytes of data.

http://lucene.apache.org

With Lucene being an Apache project, it's well supported and receives a
lot of coverage. Sub-projects have been formed to port Lucene to other
languages such as C, Perl, Python and C#, which is very handy for those
for whom Java is not the language of choice.
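
If it helps to see the structure from the question in concrete form, here
is a toy sketch in plain Python. It is not Lucene's API and not how Lucene
stores its index; the names build_index and search_all, the whitespace
tokenisation, and the five sample documents are all assumptions made up
for illustration. It keeps everything in memory, so it only shows the
term-to-documents mapping, not how to scale it.

```python
from collections import defaultdict


def build_index(docs):
    """Map each term to the sorted list of IDs of the documents containing it."""
    postings = defaultdict(set)
    for doc_id, text in docs.items():
        for term in text.lower().split():  # naive whitespace tokenisation (assumption)
            postings[term].add(doc_id)
    return {term: sorted(ids) for term, ids in postings.items()}


def search_all(index, terms):
    """Return the IDs of documents that contain every query term (AND query)."""
    result = None
    for term in terms:
        ids = set(index.get(term.lower(), ()))
        result = ids if result is None else result & ids
    return sorted(result or ())


# Hypothetical documents, chosen so the index matches the table in the question.
docs = {
    1: "term1 term2",
    2: "term2",
    3: "term1 term3",
    4: "term2 term3",
    5: "term1",
}
index = build_index(docs)

for term in sorted(index):
    print(term, ";".join(str(doc_id) for doc_id in index[term]))
```

A lookup is a single dictionary access per term, and a multi-term query is
an intersection of the posting lists, e.g. search_all(index, ["term1",
"term3"]). Keeping that fast once the index no longer fits in memory is
the hard part, and it is exactly what the existing libraries have already
worked through.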

Andy

On Sat, 15 Oct 2005, Helge Thomas Karset Hellerud wrote:

> Hello,
>
> Does anyone have some good links where I can find best practices when
> implementing an inverted index (inverted file index)? The index only
> needs to store terms and in which document they occur:
>
> term document
> --------------
> term1 1;3;5
> term2 1;2;4
> term3 3;4
> ...
>
> The goal of the implementation is to be able to do a fast search even if
> the index will become large.
>
> Thanks in advance.
>
> Helge


This archive was generated by hypermail 2b29 : Sat Oct 15 2005 - 19:26:44 MET DST