How to Implement Lucene.NET

In this article, we are going to learn how to implement Lucene.NET in C#. We will cover the entire process, from installing the library to indexing and searching.

To download the source code for this article, you can visit our GitHub repository.

Let’s begin!

What is Lucene.NET

Lucene.NET is a C# port of the original Java Lucene search API. It is a library that provides robust indexing and search capabilities that allow us to create our own search engine. In addition to fast indexing and searching, Lucene.NET also offers many add-on packages with convenient features (e.g. spell check, auto-suggest, etc.). Coupling this with the feature-rich .NET framework allows us to create a modern search experience that is both easy to create and use.


Lucene.NET Installation

We can use Visual Studio’s built-in NuGet Package Manager to add the latest version of Lucene.NET to our project.

Once the package manager is open, we should select the “Include prerelease” checkbox. At the time of writing, the 4.8 line of Lucene.NET that this article uses is published as pre-release packages, so it will not appear otherwise. Then let’s search for Lucene.NET, and select the Lucene.Net by The Apache Software Foundation result.

We will also do the same thing for the Lucene.Net.Analysis.Common and Lucene.Net.QueryParser packages. If you want to add any of the other extended features, you can select and add any of the other Lucene.NET packages in the same way. Now let’s use what we just installed!

Configure An Index

Before we start, we strongly recommend downloading the source code for this article. It will be easier to follow along with the article’s content.

In order to configure an index with Lucene.NET, we are going to add a new index method with some configuration code:

const LuceneVersion lv = LuceneVersion.LUCENE_48;
Analyzer a = new StandardAnalyzer(lv);
Directory = new RAMDirectory();
var config = new IndexWriterConfig(lv, a);
Writer = new IndexWriter(Directory, config);

Because of the ongoing development status of the Lucene.NET project, we store the Lucene version, a value of the LuceneVersion enum, in a constant that both our indexer and searcher will use. We can then pass this constant when initializing the rest of the indexing and searching classes.

We use the stored Lucene version when initializing a new analyzer. That analyzer will parse, tokenize, and analyze the data we want indexed, and store it in a format that lets us retrieve it quickly later. There are a variety of analyzers to choose from depending on your needs. For the sake of brevity, demonstrating all the existing analyzers is outside the scope of this article. However, the Lucene.NET documentation has more information to help choose the analyzer that fits your specific use case.
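To make the analyzer's role more concrete, here is a minimal sketch that returns the tokens StandardAnalyzer produces for a piece of text. The AnalyzerDemo class and its Tokenize helper are hypothetical names introduced only for this illustration, and the field name passed to the analyzer is arbitrary.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Analysis.TokenAttributes;
using Lucene.Net.Util;

public static class AnalyzerDemo
{
    // Hypothetical helper: returns the tokens StandardAnalyzer emits for the text.
    public static List<string> Tokenize(string text)
    {
        var tokens = new List<string>();
        using Analyzer analyzer = new StandardAnalyzer(LuceneVersion.LUCENE_48);

        // The field name only matters for per-field analyzers; it is arbitrary here.
        using TokenStream stream = analyzer.GetTokenStream("Description", text);
        ICharTermAttribute term = stream.AddAttribute<ICharTermAttribute>();

        stream.Reset();                      // must be called before the first IncrementToken
        while (stream.IncrementToken())
        {
            tokens.Add(term.ToString());     // the attribute is updated in place per token
        }
        stream.End();
        return tokens;
    }
}
```

Calling AnalyzerDemo.Tokenize("A tall thin man.") should yield tall, thin, and man: the text is lowercased, punctuation is dropped, and the default English stop word "a" is removed.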

We now must choose where to store the index: in RAM or on the local filesystem. Storing the index in RAM makes searches very fast. However, it adds the overhead of re-indexing the data and holding the whole index in memory every time we start the program. For a more permanent solution, and for larger datasets, we would need to store the index on the filesystem. Because our example dataset is small, we will store our index in RAM.

After that, we create a config to hold our IndexWriter settings, using the stored version and the analyzer.

Finally, we can initialize our IndexWriter to begin creating our index.
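For the filesystem option mentioned above, a minimal sketch could swap RAMDirectory for FSDirectory. The FileIndexDemo class, the AddOneAndCount helper, and the indexPath parameter are hypothetical names used only for illustration; the article's own example keeps using RAMDirectory.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class FileIndexDemo
{
    // Hypothetical helper: adds one document to an on-disk index at indexPath
    // and returns how many documents the index holds afterwards.
    public static int AddOneAndCount(string indexPath)
    {
        const LuceneVersion lv = LuceneVersion.LUCENE_48;

        // FSDirectory keeps the index files under indexPath instead of in memory.
        using FSDirectory directory = FSDirectory.Open(indexPath);
        using var analyzer = new StandardAnalyzer(lv);
        var config = new IndexWriterConfig(lv, analyzer)
        {
            // Reuse an existing index at this path, or create one if none exists.
            OpenMode = OpenMode.CREATE_OR_APPEND
        };

        using (var writer = new IndexWriter(directory, config))
        {
            writer.AddDocument(new Document
            {
                new TextField("Description", "A tall thin man.", Field.Store.YES)
            });
            writer.Commit();
        }

        using DirectoryReader reader = DirectoryReader.Open(directory);
        return reader.NumDocs;
    }
}
```

Because the index survives between calls, running the helper twice against the same path should report one document and then two, which is the persistence that a RAM-based index does not give us.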

Create An Index

In order to build an index, we must first provide the data, and then we can use the Lucene.NET API to index it. We can obtain the data from any source, such as a database or a file. For the purposes of this example, we will hard-code the data we want in a List containing a custom Person class:

public static void GetData()
{
    Data = new List<Person>()
    {
        new Person(Guid.NewGuid().ToString(),"Fred","George","Herb","A tall thin man."),
        ...
    };
}

For every entry that we wish to store in the index, we will make use of Lucene.NET’s built-in Document object. Within this Document, we can populate the fields that we want to be stored. So, let’s add a few more lines of code to the Index method:

var guidField = new StringField("GUID", "", Field.Store.YES);
var fNameField = new TextField("FirstName", "", Field.Store.YES);
var mNameField = new TextField("MiddleName", "", Field.Store.YES);
var lNameField = new TextField("LastName","",Field.Store.YES);
var descriptionField = new TextField("Description","",Field.Store.YES);

var d = new Document()
{
    guidField,
    fNameField,
    mNameField,
    lNameField,
    descriptionField
};

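There are many different field types and options that we can use. Here, StringField indexes its value as a single unanalyzed token, which suits identifiers such as the GUID, while TextField runs its value through the analyzer. More details can be found in the Lucene.NET documentation.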
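As a rough illustration of how two of these field types behave, the sketch below indexes the same value once as a StringField and once as a TextField, then runs exact term lookups against each. The FieldTypeDemo class, the CountTermHits helper, and the field names "Exact" and "Analyzed" are assumptions made for this example and are not part of the article's sample project.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class FieldTypeDemo
{
    // Hypothetical helper: returns the hit count of an exact, unanalyzed term lookup.
    public static int CountTermHits(string field, string value)
    {
        const LuceneVersion lv = LuceneVersion.LUCENE_48;
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(lv);

        using (var writer = new IndexWriter(directory, new IndexWriterConfig(lv, analyzer)))
        {
            writer.AddDocument(new Document
            {
                // StringField: indexed as the single token "Fred Herb", not analyzed.
                new StringField("Exact", "Fred Herb", Field.Store.YES),
                // TextField: analyzed into the lowercased tokens "fred" and "herb".
                new TextField("Analyzed", "Fred Herb", Field.Store.YES)
            });
            writer.Commit();
        }

        using DirectoryReader reader = DirectoryReader.Open(directory);
        var searcher = new IndexSearcher(reader);

        // TermQuery bypasses the analyzer, so it compares against the indexed tokens as-is.
        return searcher.Search(new TermQuery(new Term(field, value)), 10).TotalHits;
    }
}
```

With this setup, a lookup for ("Exact", "Fred Herb") or ("Analyzed", "fred") should find the document, while ("Exact", "fred") should not, because the StringField only matches its complete original value.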

The Lucene.NET Document holds a reference to each Field that has been added. As a result, we do not need to create a new Document for each entry. Instead, we can just modify the Field values while iterating through our list, which avoids allocating new objects for every entry:

foreach (Person person in Data)
{
    guidField.SetStringValue(person.GUID);
    fNameField.SetStringValue(person.FirstName);
    mNameField.SetStringValue(person.MiddleName);
    lNameField.SetStringValue(person.LastName);
    descriptionField.SetStringValue(person.Description);

    Writer.AddDocument(d);
}
Writer.Commit();

After the Field values have been updated, we can pass the Document to IndexWriter. After iterating through our list, we call IndexWriter’s Commit(), which commits all changes to the index.

Maintain An Index with Lucene.NET

Once we create the index, we can easily make additions, removals, or updates.

We will follow similar steps to add a new entry to the index as when we were creating an index:

PersonGuidToBeUpdated = Guid.NewGuid().ToString();

var d = new Document()
{
    new StringField("GUID", PersonGuidToBeUpdated, Field.Store.YES),
    new TextField("FirstName", "AddedFirstName", Field.Store.YES),
    new TextField("MiddleName", "AddedMiddleName", Field.Store.YES),
    new TextField("LastName", "AddedLastName", Field.Store.YES),
    new TextField("Description", "Added Description", Field.Store.YES)
};

Writer.AddDocument(d);
Writer.Commit();

Notice that we just create a Document with the required field info and add that Document to the index using Writer.AddDocument(d).

To update a record in the index, we need a unique identifier to know which entry to change. In our example dataset, each entry has a GUID that differentiates it from the others:

var d = new Document()
{
    new StringField("GUID", PersonGuidToBeUpdated, Field.Store.YES),
    new TextField("FirstName", "UpdatedFirstName", Field.Store.YES),
    new TextField("MiddleName", "UpdatedMiddleName", Field.Store.YES),
    new TextField("LastName", "UpdatedLastName", Field.Store.YES),
    new TextField("Description", "Updated Description", Field.Store.YES)
};

Writer.UpdateDocument(new Term("GUID", PersonGuidToBeUpdated), d);
Writer.Commit();

We supply two parameters to Writer.UpdateDocument. The first is a Term, containing the field name and value, that matches the entry we want to change. The second parameter is the Document containing our updated data. Internally, Lucene.NET deletes every entry that matches the Term and then adds the new Document.

To remove an entry from the index, we can use the API’s Writer.DeleteDocuments:

Writer.DeleteDocuments(new Term("GUID", PersonGuidToBeUpdated));
Writer.Commit();

We pass the criteria for the entries we want to delete to this method via a single parameter. In our case, we only want to delete the entry matching the GUID we provide.

Don’t forget that after we add, update, or delete entries, we need to call Writer.Commit() for our changes to actually be reflected in the index.
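The effect of Commit() can be observed with a small sketch that opens a fresh reader before and after each commit. The CommitDemo class and its helpers are hypothetical names for this illustration, and the short GUID values "a" and "b" are placeholders rather than real GUIDs.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class CommitDemo
{
    // Counts the documents visible to a reader opened on the last commit.
    private static int VisibleDocs(RAMDirectory directory)
    {
        using DirectoryReader reader = DirectoryReader.Open(directory);
        return reader.NumDocs;
    }

    private static Document Person(string guid, string firstName) => new Document
    {
        new StringField("GUID", guid, Field.Store.YES),
        new TextField("FirstName", firstName, Field.Store.YES)
    };

    // Returns the visible document counts: before commit, after commit, after delete.
    public static int[] Run()
    {
        const LuceneVersion lv = LuceneVersion.LUCENE_48;
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(lv);
        using var writer = new IndexWriter(directory, new IndexWriterConfig(lv, analyzer));

        writer.AddDocument(Person("a", "Fred"));
        writer.Commit();

        writer.AddDocument(Person("b", "Joe"));
        int beforeCommit = VisibleDocs(directory);  // second document not committed yet
        writer.Commit();
        int afterCommit = VisibleDocs(directory);   // both documents visible

        writer.DeleteDocuments(new Term("GUID", "b"));
        writer.Commit();
        int afterDelete = VisibleDocs(directory);   // deleted document no longer counted

        return new[] { beforeCommit, afterCommit, afterDelete };
    }
}
```

Under these assumptions, Run() should return 1, 2, and 1: a reader opened on the directory only sees what the last commit recorded.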

Perform a Search

Now that we have a working index, let’s run some searches against it!

We begin with the standard configuration of IndexSearcher that opens the existing RAMDirectory:

const LuceneVersion lv = LuceneVersion.LUCENE_48;
Analyzer a = new StandardAnalyzer(lv);
var dirReader = DirectoryReader.Open(Directory);
var searcher = new IndexSearcher(dirReader);

Remember that we must use the same analyzer and version of Lucene.NET that IndexWriter used to create the index!

Next, we use Lucene.NET’s MultiFieldQueryParser to turn the text we want to use for searching into a Query that Lucene.NET can use to quickly search against the index:

string[] fnames = { "GUID", "FirstName", "MiddleName", "LastName", "Description" };
var multiFieldQP = new MultiFieldQueryParser(lv, fnames, a);
Query query = multiFieldQP.Parse(input.Trim());

There are many different types of queries and query parsers that you can use or build yourself to provide a specific type of search (see the Lucene.NET documentation for more). We are using the multi-field parser. It takes the current Lucene version, an array of field names, and an analyzer to create a query. We can then use that query to search against every field that we specify. If our search input contains multiple words, the terms are combined with OR by default, but this can be changed to AND by setting the MultiFieldQueryParser’s DefaultOperator.
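The difference between the two default operators can be sketched with a tiny two-entry index. The OperatorDemo class, the CountHits helper, and its requireAllTerms flag are hypothetical names for this illustration; the sketch uses the AND_OPERATOR and OR_OPERATOR constants on QueryParserBase to set DefaultOperator.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class OperatorDemo
{
    // Hypothetical helper: counts hits for the input with OR (default) or AND semantics.
    public static int CountHits(string input, bool requireAllTerms)
    {
        const LuceneVersion lv = LuceneVersion.LUCENE_48;
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(lv);

        using (var writer = new IndexWriter(directory, new IndexWriterConfig(lv, analyzer)))
        {
            writer.AddDocument(new Document
            {
                new TextField("FirstName", "Fred", Field.Store.YES),
                new TextField("Description", "A tall thin man.", Field.Store.YES)
            });
            writer.AddDocument(new Document
            {
                new TextField("FirstName", "Joe", Field.Store.YES),
                new TextField("Description", "A very tall large man.", Field.Store.YES)
            });
            writer.Commit();
        }

        using DirectoryReader reader = DirectoryReader.Open(directory);
        var searcher = new IndexSearcher(reader);

        var parser = new MultiFieldQueryParser(lv, new[] { "FirstName", "Description" }, analyzer);
        // With AND, every term must match in at least one of the listed fields.
        parser.DefaultOperator = requireAllTerms
            ? QueryParserBase.AND_OPERATOR
            : QueryParserBase.OR_OPERATOR;

        Query query = parser.Parse(input);
        return searcher.Search(query, 10).TotalHits;
    }
}
```

For the input "tall thin", the OR variant should match both entries because each contains "tall", while the AND variant should match only the entry that also contains "thin".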

Now we are ready to search using IndexSearcher’s Search method:

ScoreDoc[] docs = searcher.Search(query, null, 1000).ScoreDocs;

The first parameter is the query we just generated using the MultiFieldQueryParser, and the last parameter is the maximum number of results to return. The middle parameter accepts an optional Filter with which we could further restrict (based on custom criteria) what results are shown. We pass null because we do not need one here.
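As an illustration of that middle parameter, the following sketch wraps a TermQuery in a QueryWrapperFilter so that only entries whose description contains a given word are considered. The FilterDemo class, the CountHits helper, and its onlyDescriptionWord parameter are hypothetical names; the four sample descriptions mirror the ones that appear in this article's example output.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class FilterDemo
{
    private static readonly string[] Descriptions =
    {
        "A tall thin man.", "A tall thin woman.",
        "A very tall large man.", "A tall large woman."
    };

    // Hypothetical helper: searches for the input, optionally restricted to entries
    // whose Description contains onlyDescriptionWord (pass null for no filter).
    public static int CountHits(string input, string onlyDescriptionWord)
    {
        const LuceneVersion lv = LuceneVersion.LUCENE_48;
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(lv);

        using (var writer = new IndexWriter(directory, new IndexWriterConfig(lv, analyzer)))
        {
            foreach (string description in Descriptions)
            {
                writer.AddDocument(new Document
                {
                    new TextField("Description", description, Field.Store.YES)
                });
            }
            writer.Commit();
        }

        using DirectoryReader reader = DirectoryReader.Open(directory);
        var searcher = new IndexSearcher(reader);
        Query query = new QueryParser(lv, "Description", analyzer).Parse(input);

        // The term must be given in its indexed (lowercased) form.
        Filter filter = onlyDescriptionWord == null
            ? null
            : new QueryWrapperFilter(new TermQuery(new Term("Description", onlyDescriptionWord)));

        return searcher.Search(query, filter, 1000).TotalHits;
    }
}
```

Searching for "tall" without a filter should return all four entries, while the same search with the filter word "woman" should return only the two entries that mention a woman.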

Additionally, the search criteria that the user provides can contain a wide range of operators that allow for very powerful searching (e.g. wildcard, proximity, range, etc.).
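A few of these operators can be tried with the sketch below, which parses whatever query text it receives against a small sample index. The QuerySyntaxDemo class and CountHits helper are hypothetical names, and the sample entries mirror the ones shown in this article's example output.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class QuerySyntaxDemo
{
    private static readonly (string FirstName, string Description)[] People =
    {
        ("Fred", "A tall thin man."),
        ("Abigal", "A tall thin woman."),
        ("Joe", "A very tall large man."),
        ("Deborah", "A tall large woman.")
    };

    // Hypothetical helper: parses the input with the classic query syntax and counts hits.
    public static int CountHits(string input)
    {
        const LuceneVersion lv = LuceneVersion.LUCENE_48;
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(lv);

        using (var writer = new IndexWriter(directory, new IndexWriterConfig(lv, analyzer)))
        {
            foreach (var person in People)
            {
                writer.AddDocument(new Document
                {
                    new TextField("FirstName", person.FirstName, Field.Store.YES),
                    new TextField("Description", person.Description, Field.Store.YES)
                });
            }
            writer.Commit();
        }

        using DirectoryReader reader = DirectoryReader.Open(directory);
        var searcher = new IndexSearcher(reader);
        var parser = new MultiFieldQueryParser(lv, new[] { "FirstName", "Description" }, analyzer);
        return searcher.Search(parser.Parse(input), 10).TotalHits;
    }
}
```

With this sample data, the prefix query th* and the single-character wildcard wom?n should each match two entries, the field-scoped query FirstName:fred should match one, and the proximity query Description:"tall man"~1 should match the two entries where "tall" and "man" are separated by a single word.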

Use the Results

Now that we have completed the search, let’s see how we can use the results.

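IndexSearcher’s Search method returns a TopDocs object whose ScoreDocs property is an array of ScoreDoc. Each ScoreDoc contains a float value representing the search score as well as the ID of the matching Document, which we pass to the searcher to load the Document itself: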

var results = new List<string>();
for (int i = 0; i < docs.Length; i++)
{
    Document d = searcher.Doc(docs[i].Doc);
    string guid = d.Get("GUID");
    string firstname = d.Get("FirstName");
    string middlename = d.Get("MiddleName");
    string lastname = d.Get("LastName");
    string description = d.Get("Description");

    results.Add($"{guid} {firstname} {middlename} {lastname} {description}");
}
dirReader.Dispose();

We can simply iterate through the array and load each Document by its document ID. In our example, we store the Document values in a List<string>. Remember that only values that have been marked for storage during the indexing process (Field.Store.YES) can be retrieved here for display:

Enter Search Criteria or Just Press Enter to End Program:
test

Enter Search Criteria or Just Press Enter to End Program:
fred
49e57f47-5050-4dc5-a40a-998e76cf04cc Fred George Herb A tall thin man.

Enter Search Criteria or Just Press Enter to End Program:
tall thin
49e57f47-5050-4dc5-a40a-998e76cf04cc Fred George Herb A tall thin man.
424cd1f5-bae6-4b7f-992b-023e7b7b304b Abigal Elizabeth Spear A tall thin woman.
706e18a2-7f2e-4736-865a-a7d354883fc5 Joe Rand Smith A very tall large man.
c0e94468-88b6-4acf-a05b-73cc620d5fd2 Deborah Jordan Davis A tall large woman.
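The score that accompanies each hit is not shown in the output above. As a minimal sketch, assuming we wanted to surface it, we could read ScoreDoc.Score alongside the stored fields. The ScoredResultsDemo class and Search helper are hypothetical names, and the sample entries are a reduced version of this article's data.

```csharp
using System.Collections.Generic;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

public static class ScoredResultsDemo
{
    private static readonly (string FirstName, string Description)[] People =
    {
        ("Fred", "A tall thin man."),
        ("Abigal", "A tall thin woman."),
        ("Joe", "A very tall large man."),
        ("Deborah", "A tall large woman.")
    };

    // Hypothetical helper: returns each hit's score and first name, best match first.
    public static List<(float Score, string FirstName)> Search(string input)
    {
        const LuceneVersion lv = LuceneVersion.LUCENE_48;
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(lv);

        using (var writer = new IndexWriter(directory, new IndexWriterConfig(lv, analyzer)))
        {
            foreach (var person in People)
            {
                writer.AddDocument(new Document
                {
                    new TextField("FirstName", person.FirstName, Field.Store.YES),
                    new TextField("Description", person.Description, Field.Store.YES)
                });
            }
            writer.Commit();
        }

        using DirectoryReader reader = DirectoryReader.Open(directory);
        var searcher = new IndexSearcher(reader);
        var parser = new MultiFieldQueryParser(lv, new[] { "FirstName", "Description" }, analyzer);
        TopDocs top = searcher.Search(parser.Parse(input), 10);

        var hits = new List<(float Score, string FirstName)>();
        foreach (ScoreDoc hit in top.ScoreDocs)
        {
            // hit.Doc is the document ID; hit.Score is the relevance score for this query.
            Document doc = searcher.Doc(hit.Doc);
            hits.Add((hit.Score, doc.Get("FirstName")));
        }
        return hits;
    }
}
```

The hits come back ordered by descending score, so for "tall thin" the entries that contain both words should appear ahead of those that contain only "tall".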

After our search is completed, we dispose of the DirectoryReader to free up resources. We must also remember to dispose of the IndexWriter and RAMDirectory when the program is done to avoid memory leaks:

Writer.Dispose();
Directory.Dispose();

In a production environment, we would want to use only one instance of IndexWriter and IndexReader per index until the application is closed.
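One possible shape for such a long-lived setup is sketched below: a single class owns one IndexWriter and one DirectoryReader, and refreshes the reader with DirectoryReader.OpenIfChanged only when the index has changed. The SearchIndex class and its members are hypothetical names, and the sketch is not thread-safe; it is an illustration under those assumptions rather than a recommended production design.

```csharp
using System;
using Lucene.Net.Analysis;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

public sealed class SearchIndex : IDisposable
{
    private const LuceneVersion Lv = LuceneVersion.LUCENE_48;
    private readonly RAMDirectory _directory = new RAMDirectory();
    private readonly Analyzer _analyzer = new StandardAnalyzer(Lv);
    private readonly IndexWriter _writer;
    private DirectoryReader _reader;

    public SearchIndex()
    {
        _writer = new IndexWriter(_directory, new IndexWriterConfig(Lv, _analyzer));
        _writer.Commit();                         // initial empty commit so a reader can open
        _reader = DirectoryReader.Open(_directory);
    }

    public void Add(string description)
    {
        _writer.AddDocument(new Document
        {
            new TextField("Description", description, Field.Store.YES)
        });
        _writer.Commit();
    }

    public int Count()
    {
        // OpenIfChanged returns null when nothing changed since _reader was opened.
        DirectoryReader newer = DirectoryReader.OpenIfChanged(_reader);
        if (newer != null)
        {
            _reader.Dispose();
            _reader = newer;
        }
        return _reader.NumDocs;
    }

    public void Dispose()
    {
        _reader.Dispose();
        _writer.Dispose();
        _analyzer.Dispose();
        _directory.Dispose();
    }
}
```

A caller would create one SearchIndex at startup, call Add and Count as needed, and dispose of it when the application shuts down.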

Conclusion

This article is only scratching the surface of Lucene.NET’s capabilities. However, we hope that this will be a sufficient starting point on your journey to creating and using a powerful search engine in your future projects!
