In software development, we constantly handle different data formats, and XML is one of the most widely used document formats. In this article, we will learn how to use XPath in C# to select data from an XML document.

To download the source code for this article, you can visit our GitHub repository.

So, let’s start.

XML Overview

XML (eXtensible Markup Language), as the name suggests, is a markup language. It uses a hierarchical organization to describe and store data.

Another characteristic of the XML language is that it doesn’t have predefined tags and the users create their own. The number of tags is also unlimited. In this way, XML is flexible and suitable for describing any kind of information.

XML Syntax

An XML document has a hierarchical model composed of a single root element, which is the top-level element, and its branches.

We can define an element as everything between (and including) an opening tag (<tagName>) and its respective closing tag (</tagName>). Each element can contain text, attributes, or other nested elements:

<?xml version="1.0" encoding="utf-8" ?>
<catalog>
  <book id="1">
    <author>King, Stephen</author>
    <title>IT</title>
    <genre>Horror</genre>
    <price>40.00</price>
  </book>
  <book id="2">
    <author>Assis, Machado De</author>
    <title>Dom Casmurro</title>
    <genre>Romance</genre>
    <price>50.00</price>
  </book>
  <book id="3">
    <author>Calaprice, Alice; Lipscombe, Trevor</author>
    <title>Albert Einstein: A Biography</title>
    <genre>Biography</genre>
    <price>30.00</price>
  </book>
  <book id="4" xmlns="urn:example-schema">
    <author>Fowler, Martin; Beck, Kent</author>
    <title>Refactoring: Improving the design of existing code</title>
    <genre>Scientific</genre>
    <price>60.00</price>
  </book>
</catalog>

In this example, we have an XML file representing a catalog of books, where the catalog is the root element and holds all the information that we will handle.

Each book’s author, title, genre, and price are represented by elements nested inside the parent book element. Each book element also carries an id attribute that identifies it.

The lower-level elements, like author, have their values represented by text, a string placed between their opening and closing tags.
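To make the relationship between elements, attributes, and text concrete, here is a minimal sketch that reads each of them with the XmlDocument API. The inline xml string is a trimmed-down copy of the catalog used only for illustration, and the variable names are assumptions, not part of the article's project:

```csharp
using System;
using System.Xml;

// Illustrative sample: a trimmed-down copy of the catalog above.
const string xml = @"<catalog>
  <book id='1'>
    <author>King, Stephen</author>
    <title>IT</title>
  </book>
</catalog>";

var doc = new XmlDocument();
doc.LoadXml(xml);

// The root element is the single top-level element of the document.
XmlElement root = doc.DocumentElement!;
Console.WriteLine(root.Name);               // catalog

// The indexer returns the first child element with the given name.
XmlElement book = root["book"]!;
Console.WriteLine(book.GetAttribute("id")); // 1

// The value of a lower-level element is its text content.
XmlElement title = book["title"]!;
Console.WriteLine(title.InnerText);         // IT
```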

XPath Language

XPath’s first version became a W3C (World Wide Web Consortium) recommendation in 1999. Since then, the XPath name has been used to refer to XML Path Language.

XPath uses path expressions, written in a non-XML syntax, to navigate through an XML document. These expressions are similar to the paths used to navigate through folders in an operating system, which makes XPath feel familiar to those who are just starting to work with it:

Example expression                   Result
/catalog/book                        Matches all book elements that are children of catalog
/catalog                             Matches the root element catalog
/catalog/book[1]                     Matches the first book element inside catalog
/catalog/book[price<20.00]           Matches all the book elements with price lower than 20.00
/catalog/book[price>10.00]/author    Matches the author of all books with a price greater than 10.00

In summary, XPath expressions allow us to combine various criteria to select a node or a set of nodes. The snippets inside square brackets are called predicates.
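As a quick check of how these expressions behave, the sketch below evaluates them against a small inline document and counts the selected nodes. The inline xml string and the Count helper are assumptions made for illustration; the prices are the ones from the first three books of the catalog:

```csharp
using System;
using System.Xml;

// Illustrative sample: the first three books, reduced to author and price.
const string xml = @"<catalog>
  <book id='1'><author>King, Stephen</author><price>40.00</price></book>
  <book id='2'><author>Assis, Machado De</author><price>50.00</price></book>
  <book id='3'><author>Calaprice, Alice; Lipscombe, Trevor</author><price>30.00</price></book>
</catalog>";

var doc = new XmlDocument();
doc.LoadXml(xml);

// Hypothetical helper: counts the nodes an XPath expression selects.
int Count(string xpath) => doc.SelectNodes(xpath)!.Count;

int roots = Count("/catalog");                           // 1
int books = Count("/catalog/book");                      // 3
int cheap = Count("/catalog/book[price<20.00]");         // 0, no book is that cheap
int authors = Count("/catalog/book[price>10.00]/author"); // 3

// A numeric predicate such as [1] selects by position (1-based).
var first = (XmlElement)doc.SelectSingleNode("/catalog/book[1]")!;
string firstId = first.GetAttribute("id");               // 1

Console.WriteLine($"{roots} {books} {cheap} {authors} {firstId}");
```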

Project Setup and Configuration

To understand how XPath works in action, let’s create a project to experience different alternatives for navigating through an XML file. 

As it is a simple example, let’s create a .NET console project.

Creating an XML File in Visual Studio

First, let’s add our XML file to the project by adding a new file called BooksCatalog.xml. Next, let’s paste the book catalog sample shown earlier into this file.

After creation, we need to configure the XML file to be copied to the output folder when building the application. To do that, let’s right-click on the BooksCatalog.xml file to access its properties and then set Copy to Output Directory to Copy always, so that Visual Studio replaces the XML file in the output folder on every build. This configuration ensures that the application always handles the most recent catalog data.

XPath Selection Methods

The XmlNode class in the System.Xml namespace offers two methods for selecting nodes with XPath expressions.

First, let’s talk about the SelectSingleNode() method. It returns only one XmlNode, the first that matches the search criteria, and ignores any other elements that match the query. With that in mind, we should use it only when we expect a single result or only care about the first match.

The second alternative for selecting nodes is the SelectNodes() method, which returns an XmlNodeList with all the elements that match the selection query.

Before we start to read specific data from the file, let’s see what we need to do to load the file into memory:

using System.Xml;

var doc = new XmlDocument();
doc.Load("BooksCatalog.xml"); // path is relative to the working directory
var root = doc.DocumentElement;

We create an instance of the XmlDocument class to represent the data in memory. Next, we pass the filename as an argument to the Load() method, which will load the specified document.

Then, we access the root element through the DocumentElement property and store it in a new variable, root.

We are now ready to perform our queries on our data. So, let’s create a method to perform this action:

public static string SelectSingleBook(XmlNode root)
{
    var node = root.SelectSingleNode("//catalog/book[position()=2]");
    
    return FormatXml(node!.OuterXml);
}

The SelectSingleBook() method receives the root element as a parameter and queries the book at the second position in the catalog. However, the OuterXml property, which holds the markup of the selected element and all of its children, returns it as a single unindented line.

To format the text with indentation, as in the example file, we can create a formatter method:

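// XElement comes from the System.Xml.Linq namespace.
// Its ToString() method returns the markup with indentation.
public static string FormatXml(string unformattedXml)
{
    return XElement.Parse(unformattedXml).ToString();
}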

We then print the string returned from the FormatXml() method to the console, with all the element information:

Selected book:
<book id="2">
  <author>Assis, Machado De</author>
  <title>Dom Casmurro</title>
  <genre>Romance</genre>
  <price>50.00</price>
</book>
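One detail worth keeping in mind: SelectSingleNode() returns null when nothing matches, so the null-forgiving operator (node!) in SelectSingleBook() would lead to a NullReferenceException for a query with no result. A minimal defensive sketch could look like the following; the TrySelectBook name, the position parameter, and the inline sample are hypothetical and not part of the article's project:

```csharp
using System;
using System.Xml;
using System.Xml.Linq;

// Illustrative sample: two books reduced to their titles.
const string xml = @"<catalog>
  <book id='1'><title>IT</title></book>
  <book id='2'><title>Dom Casmurro</title></book>
</catalog>";

var doc = new XmlDocument();
doc.LoadXml(xml);

// Hypothetical helper: returns the formatted book at the given 1-based
// position, or null when the catalog has no book at that position.
static string? TrySelectBook(XmlNode root, int position)
{
    XmlNode? node = root.SelectSingleNode($"/catalog/book[{position}]");
    return node is null ? null : XElement.Parse(node.OuterXml).ToString();
}

string? second = TrySelectBook(doc.DocumentElement!, 2);
string? ninth = TrySelectBook(doc.DocumentElement!, 9);

Console.WriteLine(second ?? "No match");
Console.WriteLine(ninth ?? "No match"); // No match
```

Returning null leaves the decision about how to handle a missing book to the caller, which is one possible design; throwing a descriptive exception would be another.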

Following this, let’s create another method to select a group of items:

public static List<string> SelectBooks(XmlNode root)
{
    var nodes = root.SelectNodes("//catalog/book[price<50.00]");

    return nodes!
        .Cast<XmlNode>()
        .Select(x => FormatXml(x.OuterXml))
        .ToList();
}

In the same way, the SelectBooks() method takes the root element as a parameter. This time, however, we are querying for all book elements with a price lower than 50.00.

Once we get the query result (an XmlNodeList object), we convert it to a string list containing the formatted OuterXml for each element.

Finally, the result is returned and printed in the console:

Selected books:
<book id="1">
  <author>King, Stephen</author>
  <title>IT</title>
  <genre>Horror</genre>
  <price>40.00</price>
</book>
<book id="3">
  <author>Calaprice, Alice; Lipscombe, Trevor</author>
  <title>Albert Einstein: A Biography</title>
  <genre>Biography</genre>
  <price>30.00</price>
</book>
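When we only need individual values rather than whole elements, the path can continue past the predicate to a child element or an attribute. The sketch below follows the same query as SelectBooks() but reads the titles and ids directly; the inline sample and variable names are assumptions for illustration:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml;

// Illustrative sample: the first three books, reduced to title and price.
const string xml = @"<catalog>
  <book id='1'><title>IT</title><price>40.00</price></book>
  <book id='2'><title>Dom Casmurro</title><price>50.00</price></book>
  <book id='3'><title>Albert Einstein: A Biography</title><price>30.00</price></book>
</catalog>";

var doc = new XmlDocument();
doc.LoadXml(xml);

// Select the title elements of the matching books and read their text.
List<string> titles = doc.SelectNodes("/catalog/book[price<50.00]/title")!
    .Cast<XmlNode>()
    .Select(n => n.InnerText)
    .ToList();

// The @ prefix selects attributes; Value holds the attribute text.
List<string> ids = doc.SelectNodes("/catalog/book[price<50.00]/@id")!
    .Cast<XmlNode>()
    .Select(n => n.Value!)
    .ToList();

Console.WriteLine(string.Join(", ", titles)); // IT, Albert Einstein: A Biography
Console.WriteLine(string.Join(", ", ids));    // 1, 3
```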

XPath Expressions Containing Namespaces

In other scenarios, we have to work with XML documents that contain namespaces. The idea behind namespaces is to enable applications to handle or validate elements differently, even if they have the same name.

Fortunately, XPath also supports namespaces in path expressions. As we noted, the last book in the catalog has an extra xmlns attribute that declares a default namespace:

<book id="4" xmlns="urn:example-schema">

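Because the declaration has no prefix, the namespace applies to this book element and all of its descendants. An unprefixed name in an XPath 1.0 expression only matches elements that are in no namespace, so a query such as /catalog/book does not select this book. Now, let’s create a selection method that maps the namespace to a prefix and queries the book: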

public static List<string> SelectBooksUsingNamespaces(XmlDocument doc)
{
    var nsmgr = new XmlNamespaceManager(doc.NameTable);
    nsmgr.AddNamespace("ex", "urn:example-schema");

    var nodes = doc.SelectNodes("descendant::ex:book", nsmgr);

    return nodes!
        .Cast<XmlNode>()
        .Select(x => FormatXml(x.OuterXml))
        .ToList();
}

As we can see, the SelectBooksUsingNamespaces() method takes an XmlDocument as a parameter. First, we create an instance of XmlNamespaceManager from the document’s NameTable.

Next, the AddNamespace() method associates the ex prefix with the urn:example-schema namespace URI. The prefix only has to be consistent between the manager and the query; it does not need to appear in the XML file. Then, we execute the SelectNodes() method, this time passing the nsmgr variable in addition to the query expression.

Finally, we convert the result to a list of formatted strings before returning it. The method outcome is then printed to the console:

Selected books:
<book id="4" xmlns="urn:example-schema">
  <author>Fowler, Martin; Beck, Kent</author>
  <title>Refactoring: Improving the design of existing code</title>
  <genre>Scientific</genre>
  <price>60.00</price>
</book>
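Since a default namespace is inherited by child elements, every step of the path inside the namespaced book needs the prefix too. The following sketch illustrates this under the same ex prefix mapping; the inline sample is a reduced copy of the catalog and the variable names are assumptions:

```csharp
using System;
using System.Xml;

// Illustrative sample: one plain book and the namespaced book.
const string xml = @"<catalog>
  <book id='1'><title>IT</title></book>
  <book id='4' xmlns='urn:example-schema'>
    <title>Refactoring: Improving the design of existing code</title>
  </book>
</catalog>";

var doc = new XmlDocument();
doc.LoadXml(xml);

var nsmgr = new XmlNamespaceManager(doc.NameTable);
nsmgr.AddNamespace("ex", "urn:example-schema");

// Unprefixed names only match elements that are in no namespace.
int plainBooks = doc.SelectNodes("/catalog/book")!.Count; // 1

// The title inherits the default namespace, so it needs the prefix as well.
XmlNode? title = doc.SelectSingleNode("/catalog/ex:book/ex:title", nsmgr);

// Without the prefix on title, nothing matches and the result is null.
XmlNode? missing = doc.SelectSingleNode("/catalog/ex:book/title", nsmgr);

Console.WriteLine(plainBooks);
Console.WriteLine(title?.InnerText ?? "No match");
Console.WriteLine(missing?.InnerText ?? "No match"); // No match
```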

Conclusion

In conclusion, XPath provides a straightforward way to navigate through an XML document, and its path expression syntax makes queries intuitive. Furthermore, the SelectSingleNode() and SelectNodes() selection methods are flexible enough to handle both simple and more complex requests.