In this article, we are going to learn how to use Lucene.NET in C#. We will cover the entire process, from installing the library to indexing and searching.
Let’s begin!
What is Lucene.NET
Lucene.NET is a C# port of the original Java Lucene search library. It provides robust indexing and search capabilities that allow us to create our own search engine. In addition to fast indexing and searching, Lucene.NET offers a number of add-on packages with convenient features (e.g., spell check and auto-suggest). Coupling this with the feature-rich .NET platform allows us to create a modern search experience that is easy both to build and to use.
Lucene.NET Installation
We can use Visual Studio’s built-in NuGet Package Manager to add the latest version of Lucene.NET to our project.
Once the package manager is open, we should select the “Include prerelease” checkbox. At the time of writing, the Lucene.NET 4.8 packages are published only as prerelease (beta) versions, so NuGet will not offer them without this checkbox. Then let’s search for Lucene.NET, and select the Lucene.Net result by The Apache Software Foundation.
We will also do the same thing for the Lucene.Net.Analysis.Common and Lucene.Net.QueryParser packages. If you want to add any of the other extended features, you can select and add any of the other Lucene.NET packages in the same way. Now let’s use what we just installed!
Configure An Index
Before we start, we strongly recommend downloading the source code for this article. It will be easier to follow along with the article’s content.
In order to configure an index with Lucene.NET, we are going to add a new index method with some configuration code:
const LuceneVersion lv = LuceneVersion.LUCENE_48;
Analyzer a = new StandardAnalyzer(lv);
Directory = new RAMDirectory();
var config = new IndexWriterConfig(lv, a);
Writer = new IndexWriter(Directory, config);
Because the Lucene.NET project is still under active development, many of its classes need to know which Lucene version they should be compatible with. We store that version, a value of the LuceneVersion enum, in a constant that our indexer and searcher will share. We can then use this constant when initializing the rest of the indexing and searching classes.
We use the stored Lucene version when initializing a new analyzer. That analyzer parses and tokenizes the data we want indexed, storing it in a format that we can quickly retrieve later. There are a variety of analyzers to choose from, depending on your needs. Demonstrating all of them is outside the scope of this article, but the Lucene.NET documentation offers more information to help choose the analyzer that fits your specific use case.
We must now choose where to store the index: in RAM or on the local filesystem. Storing the index in RAM avoids disk access, which keeps searches very fast. However, it adds the overhead of re-indexing the data and holding the index in memory every time we start the program. For a more permanent solution, and for larger datasets, you would need to use the filesystem to store the index. Because our example dataset is small, we will be storing our index in RAM.
After that, we create a config to hold our IndexWriter settings, using our stored version and analyzer.
Finally, we can initialize our IndexWriter with the directory and the config to begin creating our index.
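For the filesystem option mentioned above, the setup is almost identical; only the directory type changes. The following is a minimal sketch, not the article's sample code: the class name `FileSystemIndexSketch`, its method names, and the `indexPath` folder are hypothetical, and it assumes the same three NuGet packages are installed.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical helper showing an index stored on disk instead of in RAM.
public static class FileSystemIndexSketch
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    // Appends one document to the on-disk index located at indexPath.
    public static void WriteOne(string indexPath, string description)
    {
        using var directory = FSDirectory.Open(indexPath);
        using var analyzer = new StandardAnalyzer(MatchVersion);
        var config = new IndexWriterConfig(MatchVersion, analyzer);
        using var writer = new IndexWriter(directory, config);

        writer.AddDocument(new Document
        {
            new TextField("Description", description, Field.Store.YES)
        });
        writer.Commit();
    }

    // Reopens the index from disk and reports how many documents it holds.
    public static int CountDocuments(string indexPath)
    {
        using var directory = FSDirectory.Open(indexPath);
        using var reader = DirectoryReader.Open(directory);
        return reader.NumDocs;
    }
}
```

Because the index lives in a folder, a later call to `CountDocuments` with the same path still sees the documents written earlier, which is exactly what the RAM-based index cannot offer across program restarts.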
Create An Index
In order to build an index, we must first provide the data, and then we can use the Lucene.NET API to index it. We can obtain the data from any source, such as a database or a file. For the purposes of this example, we will hard-code the data we want in a List containing instances of a custom Person class:
public static void GetData()
{
    Data = new List<Person>()
    {
        new Person(Guid.NewGuid().ToString(), "Fred", "George", "Herb", "A tall thin man."),
        ...
    };
}
For every entry that we wish to store in the index, we will make use of Lucene.NET’s built-in Document object. Within this Document, we can populate the fields that we want to be stored. So, let’s add a few more lines of code to the Index method:
var guidField = new StringField("GUID", "", Field.Store.YES);
var fNameField = new TextField("FirstName", "", Field.Store.YES);
var mNameField = new TextField("MiddleName", "", Field.Store.YES);
var lNameField = new TextField("LastName", "", Field.Store.YES);
var descriptionField = new TextField("Description", "", Field.Store.YES);
var d = new Document()
{
    guidField,
    fNameField,
    mNameField,
    lNameField,
    descriptionField
};
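There are many different field types and options that we can use. Here, StringField indexes its value as a single, unanalyzed token, which suits identifiers such as our GUID, while TextField runs its value through the analyzer. More details can be found in the Lucene.NET documentation.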
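To make the difference between the two field types used above concrete, here is a small sketch that indexes the same text once as a StringField and once as a TextField and then looks up raw terms. The class `FieldTypeSketch` and the field names `Exact` and `Analyzed` are made up for this illustration.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical helper comparing StringField and TextField indexing.
public static class FieldTypeSketch
{
    // Returns how many documents contain exactly this term in the given field.
    public static int CountTermHits(string field, string termText)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(version);
        using var writer = new IndexWriter(directory, new IndexWriterConfig(version, analyzer));

        writer.AddDocument(new Document
        {
            // Indexed as the single token "Tall Man", exactly as written.
            new StringField("Exact", "Tall Man", Field.Store.YES),
            // Analyzed: split into words and lowercased by StandardAnalyzer.
            new TextField("Analyzed", "Tall Man", Field.Store.YES)
        });
        writer.Commit();

        using var reader = DirectoryReader.Open(directory);
        var searcher = new IndexSearcher(reader);
        // TermQuery bypasses the analyzer, so it shows what is really in the index.
        return searcher.Search(new TermQuery(new Term(field, termText)), 10).TotalHits;
    }
}
```

With this sketch, `CountTermHits("Exact", "Tall Man")` and `CountTermHits("Analyzed", "tall")` each find the document, while `CountTermHits("Exact", "tall")` and `CountTermHits("Analyzed", "Tall Man")` find nothing.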
The Lucene.NET Document holds a reference to each Field that has been added to it. As a result, we do not need to create a new Document for each entry. Instead, we can just modify the Field values while iterating through our list, which reduces allocations and improves indexing performance:
foreach (Person person in Data)
{
    guidField.SetStringValue(person.GUID);
    fNameField.SetStringValue(person.FirstName);
    mNameField.SetStringValue(person.MiddleName);
    lNameField.SetStringValue(person.LastName);
    descriptionField.SetStringValue(person.Description);
    Writer.AddDocument(d);
}
Writer.Commit();
After the Field values have been updated, we can pass the Document to IndexWriter. After iterating through our list, we call IndexWriter’s Commit(), which commits all changes to the index.
Maintain An Index with Lucene.NET
Once we create the index, we can easily make additions, removals, or updates.
To add a new entry to the index, we follow steps similar to those we used when creating it:
PersonGuidToBeUpdated = Guid.NewGuid().ToString();
var d = new Document()
{
    new StringField("GUID", PersonGuidToBeUpdated, Field.Store.YES),
    new TextField("FirstName", "AddedFirstName", Field.Store.YES),
    new TextField("MiddleName", "AddedMiddleName", Field.Store.YES),
    new TextField("LastName", "AddedLastName", Field.Store.YES),
    new TextField("Description", "Added Description", Field.Store.YES)
};
Writer.AddDocument(d);
Writer.Commit();
Notice that we just create a Document with the required field info and add that Document to the index using Writer.AddDocument(d).
To update a record in the index, we need a unique identifier to know which entry to change. In our example dataset, each entry has a GUID that differentiates it from the others:
var d = new Document()
{
    new StringField("GUID", PersonGuidToBeUpdated, Field.Store.YES),
    new TextField("FirstName", "UpdatedFirstName", Field.Store.YES),
    new TextField("MiddleName", "UpdatedMiddleName", Field.Store.YES),
    new TextField("LastName", "UpdatedLastName", Field.Store.YES),
    new TextField("Description", "Updated Description", Field.Store.YES)
};
Writer.UpdateDocument(new Term("GUID", PersonGuidToBeUpdated), d);
Writer.Commit();
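We supply two parameters to Writer.UpdateDocument. The first is a new search Term, containing the field name and value that identify the entry we want to change. The second parameter is the Document containing our updated data. Internally, an update deletes every document that matches the Term and then adds the new Document in its place.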
To remove an entry from the index, we can use the API’s Writer.DeleteDocuments:
Writer.DeleteDocuments(new Term("GUID", PersonGuidToBeUpdated));
We pass the criteria for the entries we want to delete to this method via a single parameter. In our case, we only want to delete the entry matching the GUID we provide.
Don’t forget that after we add, update, or delete entries, we need to call Writer.Commit() to actually reflect our changes in the index.
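The effect of Commit() is easy to observe by counting documents from a freshly opened reader after each step. The sketch below is only an illustration under simplified assumptions: the class `CommitVisibilitySketch` and its helpers are hypothetical, and the documents carry just two of the article's fields.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical helper showing when add, update, and delete become visible.
public static class CommitVisibilitySketch
{
    // Counts live documents as seen from the most recent commit.
    private static int CommittedCount(RAMDirectory directory)
    {
        using var reader = DirectoryReader.Open(directory);
        return reader.NumDocs;
    }

    private static Document Build(string guid, string firstName) => new Document
    {
        new StringField("GUID", guid, Field.Store.YES),
        new TextField("FirstName", firstName, Field.Store.YES)
    };

    // Returns the counts after add, after update, before and after committing a delete.
    public static int[] Run()
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(version);
        using var writer = new IndexWriter(directory, new IndexWriterConfig(version, analyzer));

        string guid = Guid.NewGuid().ToString();

        writer.AddDocument(Build(guid, "AddedFirstName"));
        writer.Commit();
        int afterAdd = CommittedCount(directory);

        // Replaces the document carrying this GUID instead of adding a second one.
        writer.UpdateDocument(new Term("GUID", guid), Build(guid, "UpdatedFirstName"));
        writer.Commit();
        int afterUpdate = CommittedCount(directory);

        writer.DeleteDocuments(new Term("GUID", guid));
        int beforeDeleteCommit = CommittedCount(directory); // delete not committed yet
        writer.Commit();
        int afterDeleteCommit = CommittedCount(directory);

        return new[] { afterAdd, afterUpdate, beforeDeleteCommit, afterDeleteCommit };
    }
}
```

Calling `CommitVisibilitySketch.Run()` yields the counts 1, 1, 1, and 0: the update does not create a duplicate, and the delete only shows up for new readers once it has been committed.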
Perform a Search
Now that we have a working index, let’s run some searches against it!
We begin with the standard configuration of an IndexSearcher that opens the existing RAMDirectory:
const LuceneVersion lv = LuceneVersion.LUCENE_48;
Analyzer a = new StandardAnalyzer(lv);
var dirReader = DirectoryReader.Open(Directory);
var searcher = new IndexSearcher(dirReader);
Remember that we must use the same analyzer and version of Lucene.NET that IndexWriter used to create the index!
Next, we use Lucene.NET’s MultiFieldQueryParser to turn the text we want to use for searching into a Query that Lucene.NET can use to quickly search against the index:
string[] fnames = { "GUID", "FirstName", "MiddleName", "LastName", "Age", "Description" };
var multiFieldQP = new MultiFieldQueryParser(lv, fnames, a);
Query query = multiFieldQP.Parse(input.Trim());
There are many different types of queries and query parsers that you can use or build yourself to provide a specific type of search (the Lucene.NET documentation has more details). We are using the multi-field parser. It takes the current Lucene version, an array of field names, and an analyzer to create a query. We can then use that query to search against every field that we specify. If our search input contains multiple words, the parser combines them with OR by default, but this can be changed to AND by setting the MultiFieldQueryParser’s DefaultOperator.
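The default OR behavior is easiest to see by counting hits. The following sketch uses two sample descriptions in the style of the article's data and writes the AND operator directly in the query text, which the classic query syntax supports, rather than changing the parser's default. The class `OperatorSketch` and the `FirstName` value are hypothetical.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical helper comparing OR and AND matching across two fields.
public static class OperatorSketch
{
    // Indexes two sample people and returns how many match the search input.
    public static int CountHits(string input)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(version);
        using var writer = new IndexWriter(directory, new IndexWriterConfig(version, analyzer));

        foreach (var text in new[] { "A tall thin man.", "A very tall large man." })
        {
            writer.AddDocument(new Document
            {
                new TextField("FirstName", "Sample", Field.Store.YES),
                new TextField("Description", text, Field.Store.YES)
            });
        }
        writer.Commit();

        using var reader = DirectoryReader.Open(directory);
        var searcher = new IndexSearcher(reader);
        var parser = new MultiFieldQueryParser(version, new[] { "FirstName", "Description" }, analyzer);
        return searcher.Search(parser.Parse(input), 10).TotalHits;
    }
}
```

Here `CountHits("tall thin")` matches both documents, because either word is enough, while `CountHits("tall AND thin")` matches only the first one.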
Now we are ready to search using IndexSearcher’s Search method:
ScoreDoc[] docs = searcher.Search(query, null, 1000).ScoreDocs;
The first parameter is the query we just generated using the MultiFieldQueryParser, and the last parameter is the maximum number of results to return. The middle parameter accepts an optional custom Filter, which we could use to further restrict (based on custom criteria) which results are returned. We pass null because we do not need one.
Additionally, the search criteria that the user provides can contain a wide range of operators that allow for very powerful searching (e.g., wildcard, proximity, and range searches).
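As a rough illustration of those operators, the sketch below runs a few inputs in the classic query syntax against two sample descriptions. It uses the single-field QueryParser for brevity; the class `QuerySyntaxSketch` is hypothetical and not part of the article's sample project.

```csharp
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.QueryParsers.Classic;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical helper for trying out query syntax on the Description field.
public static class QuerySyntaxSketch
{
    // Returns how many of the two sample descriptions match the input.
    public static int CountHits(string input)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(version);
        using var writer = new IndexWriter(directory, new IndexWriterConfig(version, analyzer));

        foreach (var text in new[] { "A tall thin man.", "A very tall large man." })
        {
            writer.AddDocument(new Document
            {
                new TextField("Description", text, Field.Store.YES)
            });
        }
        writer.Commit();

        using var reader = DirectoryReader.Open(directory);
        var searcher = new IndexSearcher(reader);
        var parser = new QueryParser(version, "Description", analyzer);
        return searcher.Search(parser.Parse(input), 10).TotalHits;
    }
}
```

With these two documents, the wildcard input `tal*` matches both, the exact phrase `"tall thin"` matches one, the exact phrase `"tall man"` matches none because another word always sits between the two, and the proximity search `"tall man"~1` matches both by allowing that one-word gap.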
Use the Results
Now that we have completed the search, let’s see how we can use the results.
IndexSearcher’s Search method returns a TopDocs object whose ScoreDocs property is an array of ScoreDoc entries. Each entry contains a float Score representing the search score and an integer Doc, the ID of the matching Document:
var results = new List<string>();
for (int i = 0; i < docs.Length; i++)
{
    Document d = searcher.Doc(docs[i].Doc);
    string guid = d.Get("GUID");
    string firstname = d.Get("FirstName");
    string middlename = d.Get("MiddleName");
    string lastname = d.Get("LastName");
    string description = d.Get("Description");
    results.Add($"{guid} {firstname} {middlename} {lastname} {description}");
}
dirReader.Dispose();
We can simply iterate through the array and load each document by its ID using searcher.Doc. In our example, we will store the Document values in a List<string>. Remember that only values that have been marked for storage during the indexing process (Field.Store.YES) can be retrieved here for display:
Enter Search Criteria or Just Press Enter to End Program: test
Enter Search Criteria or Just Press Enter to End Program: fred
49e57f47-5050-4dc5-a40a-998e76cf04cc Fred George Herb A tall thin man.
Enter Search Criteria or Just Press Enter to End Program: tall thin
49e57f47-5050-4dc5-a40a-998e76cf04cc Fred George Herb A tall thin man.
424cd1f5-bae6-4b7f-992b-023e7b7b304b Abigal Elizabeth Spear A tall thin woman.
706e18a2-7f2e-4736-865a-a7d354883fc5 Joe Rand Smith A very tall large man.
c0e94468-88b6-4acf-a05b-73cc620d5fd2 Deborah Jordan Davis A tall large woman.
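The point about stored values can be checked with a tiny experiment: index one field with Field.Store.YES and another with Field.Store.NO, search on the unstored one, and read both back. This is a sketch only; the class `StoredFieldSketch` and the field names `Name` and `Notes` are invented for the illustration.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Search;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical helper contrasting a stored field with an index-only field.
public static class StoredFieldSketch
{
    // Searches the unstored Notes field and returns what can be read back from the hit.
    public static (string Name, string Notes, float Score) SearchNotes(string word)
    {
        const LuceneVersion version = LuceneVersion.LUCENE_48;
        using var directory = new RAMDirectory();
        using var analyzer = new StandardAnalyzer(version);
        using var writer = new IndexWriter(directory, new IndexWriterConfig(version, analyzer));

        writer.AddDocument(new Document
        {
            new TextField("Name", "Fred", Field.Store.YES),
            // Searchable, but the original text is not kept for retrieval.
            new TextField("Notes", "A tall thin man.", Field.Store.NO)
        });
        writer.Commit();

        using var reader = DirectoryReader.Open(directory);
        var searcher = new IndexSearcher(reader);
        ScoreDoc[] hits = searcher.Search(new TermQuery(new Term("Notes", word)), 1).ScoreDocs;
        if (hits.Length == 0)
        {
            throw new InvalidOperationException("No document matched the given word.");
        }

        Document doc = searcher.Doc(hits[0].Doc);
        // Get returns null for a field that was indexed but not stored.
        return (doc.Get("Name"), doc.Get("Notes"), hits[0].Score);
    }
}
```

Calling `StoredFieldSketch.SearchNotes("tall")` finds the document through the Notes field, yet only Name comes back with a value; Notes is null because it was never stored.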
After our search is completed, we dispose of the DirectoryReader that backs the IndexSearcher to free up resources. We also need to remember to dispose of the IndexWriter and RAMDirectory when the program is done to avoid memory leaks:
Writer.Dispose();
Directory.Dispose();
In a production environment, we would want to use only one instance of IndexWriter and IndexReader per index until the application is closed.
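One possible way to keep a single writer and reader alive is a small wrapper that owns both and refreshes the reader only when the index has changed. The sketch below is an assumption about how this could look, not the article's implementation: the class `SearchIndexSketch` and its members are hypothetical, and it is not thread-safe as written.

```csharp
using System;
using Lucene.Net.Analysis.Standard;
using Lucene.Net.Documents;
using Lucene.Net.Index;
using Lucene.Net.Store;
using Lucene.Net.Util;

// Hypothetical long-lived holder for one writer and one reader per index.
public sealed class SearchIndexSketch : IDisposable
{
    private const LuceneVersion MatchVersion = LuceneVersion.LUCENE_48;

    private readonly RAMDirectory _directory = new RAMDirectory();
    private readonly StandardAnalyzer _analyzer;
    private readonly IndexWriter _writer;
    private DirectoryReader _reader;

    public SearchIndexSketch()
    {
        _analyzer = new StandardAnalyzer(MatchVersion);
        _writer = new IndexWriter(_directory, new IndexWriterConfig(MatchVersion, _analyzer));
        _writer.Commit(); // create an initial commit so a reader can be opened
        _reader = DirectoryReader.Open(_directory);
    }

    // Number of documents visible to the current reader snapshot.
    public int Count => _reader.NumDocs;

    public void Add(string description)
    {
        _writer.AddDocument(new Document
        {
            new TextField("Description", description, Field.Store.YES)
        });
        _writer.Commit();
    }

    // Swaps in a new reader only if the index changed; returns true when it did.
    public bool Refresh()
    {
        DirectoryReader newer = DirectoryReader.OpenIfChanged(_reader);
        if (newer == null)
        {
            return false;
        }
        _reader.Dispose();
        _reader = newer;
        return true;
    }

    public void Dispose()
    {
        _reader.Dispose();
        _writer.Dispose();
        _analyzer.Dispose();
        _directory.Dispose();
    }
}
```

A reader is a point-in-time snapshot, so after `Add` the `Count` stays unchanged until `Refresh` is called. Reopening only on change avoids the cost of building a new reader for every search.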
Conclusion
This article is only scratching the surface of Lucene.NET’s capabilities. However, we hope that this will be a sufficient starting point on your journey to creating and using a powerful search engine in your future projects!