In this article, we’ll talk about how to read XML documents in C#. In the preceding article, we covered how to create custom XML documents.
Also, we have already explored how to serialize and deserialize objects to and from XML in the articles titled Serializing Objects to XML in C# and XML Deserialization in C#. So what else can we learn?
Reading XML Documents
NOTE: The code uses the Person, People, and CreateXMLUsingXmlWriter classes developed in the preceding article, and we won’t duplicate that code here.
Similar to creating XML documents, where we could choose between LinqToXml and XmlWriter, we once again have two approaches: the XDocument class or the older XmlReader class.
The XDocument path is simpler and more user-friendly, while XmlReader provides greater control over the reading process.
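To make that trade-off concrete, here is a minimal sketch that reads the same <age> value both ways. The TwoApproaches class and its method names are hypothetical, and the input is assumed to have the shape of the person document shown later in this article:

```csharp
using System.IO;
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class used only for illustration.
public static class TwoApproaches
{
    // XDocument: parse the whole document into an in-memory tree, then navigate it.
    public static string AgeWithXDocument(string xml) =>
        XDocument.Parse(xml).Root!.Element("age")!.Value;

    // XmlReader: move forward through the nodes until the first <age> element,
    // then read its text content. Returns null if there is no <age> element.
    public static string? AgeWithXmlReader(string xml)
    {
        using var reader = XmlReader.Create(new StringReader(xml));
        return reader.ReadToFollowing("age")
            ? reader.ReadElementContentAsString()
            : null;
    }
}
```

The XDocument version builds the entire tree before we touch it, while the XmlReader version stops as soon as it reaches the first <age> element without building a tree at all.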
Using XDocument to Read XML Documents in C#
Reading XML using XDocument couldn’t be simpler. Whether we are reading XML from a file or a string, the XDocument class provides methods for both scenarios.
To read an XML document from a string, we use the Parse() method. To read XML from a file, we use the Load() method.
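As a minimal sketch, both entry points look like this. The XDocumentEntryPoints class and its method names are hypothetical; besides a file path, XDocument.Load() also has overloads that accept a Stream or a TextReader:

```csharp
using System.IO;
using System.Xml.Linq;

// Hypothetical helper class used only for illustration.
public static class XDocumentEntryPoints
{
    // Parse(): the XML is already in memory as a string.
    public static XDocument FromString(string xml) => XDocument.Parse(xml);

    // Load(): the XML lives in a file, identified here by its path.
    public static XDocument FromFile(string path) => XDocument.Load(path);

    // Writes the XML to a temporary file, loads it back with Load(),
    // and returns the root element's name; the file is always deleted afterwards.
    public static string RootNameViaTempFile(string xml)
    {
        var path = Path.GetTempFileName();
        try
        {
            File.WriteAllText(path, xml);
            return FromFile(path).Root!.Name.LocalName;
        }
        finally
        {
            File.Delete(path);
        }
    }
}
```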
If the document is not well-formed, these methods throw an XmlException:
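```csharp
using System;
using System.Xml;
using System.Xml.Linq;

public class ReadingXmlUsingXDocument
{
    public static XDocument ReadXmlAndCatchErrors(string xml)
    {
        try
        {
            // Throws XmlException if the XML is not well-formed
            return XDocument.Parse(xml);
        }
        catch (XmlException ex)
        {
            Console.WriteLine("Something went wrong:");
            Console.WriteLine(ex);
        }

        // Fall back to an empty document
        return new XDocument();
    }
}
```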
We’ve created a ReadingXmlUsingXDocument class that uses the XDocument class. Its ReadXmlAndCatchErrors() method parses the given XML string. If parsing fails, the method catches the exception, writes it to the console, and returns an empty XDocument.
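Because XDocument.Parse() reports syntax problems as an XmlException, a caller can also find out where the error occurred. The sketch below is a hypothetical variation on ReadXmlAndCatchErrors() that builds a short description from the exception’s LineNumber and LinePosition properties instead of printing the whole exception:

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class used only for illustration.
public static class XmlErrorDetails
{
    // Returns null when the XML parses, otherwise "line X, position Y: message".
    public static string? DescribeError(string xml)
    {
        try
        {
            XDocument.Parse(xml);
            return null;
        }
        catch (XmlException ex)
        {
            return $"line {ex.LineNumber}, position {ex.LinePosition}: {ex.Message}";
        }
    }
}
```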
To test ReadXmlAndCatchErrors(), we can generate XML documents using the CreateXMLUsingXmlWriter class we developed earlier. First, we create a valid XML document and then read it:
```csharp
public static string TestValidXml()
{
    var xmlDoc = ReadXmlAndCatchErrors(
        CreateXMLUsingXmlWriter.CreateSimpleXML(People.GetOne()));

    return xmlDoc.ToString();
}
```
Everything works as expected. The document is parsed successfully, and we can both read its content and write it to the console.
But what happens if we provide invalid XML?
```csharp
public static string TestInvalidXml()
{
    var xmlDoc = ReadXmlAndCatchErrors(
        CreateXMLUsingXmlWriter.CreateWrongXML(People.GetOne()));

    return xmlDoc.ToString();
}
```
XDocument.Parse() throws an exception, and our ReadXmlAndCatchErrors() method catches it and returns an empty XDocument.
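Since the fallback is an empty XDocument, callers need a way to tell it apart from a real document. One option, sketched here with hypothetical names, is to check the Root property, which is null for a document that has no root element:

```csharp
using System.Xml;
using System.Xml.Linq;

// Hypothetical helper class used only for illustration.
public static class ParseResultCheck
{
    // Same fallback strategy as ReadXmlAndCatchErrors(), without the console output.
    public static XDocument ParseOrEmpty(string xml)
    {
        try
        {
            return XDocument.Parse(xml);
        }
        catch (XmlException)
        {
            return new XDocument();
        }
    }

    // The fallback document has no root element, so a null Root signals a failed parse.
    public static bool HasRoot(XDocument doc) => doc.Root is not null;
}
```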
Getting Values From XDocument
With the XML document now loaded into an XDocument object, the question arises: how do we extract a person’s name or age from it?
As this extends beyond the scope of this article, we will briefly outline a few options using the initially created ‘people document’:
```xml
<person>
  <name>
    <firstName>Emma</firstName>
    <lastName>Brown</lastName>
  </name>
  <email>[email protected]</email>
  <age>58</age>
</person>
```
This XML document contains data about a person, including the first name and age that we want to extract.
Reading With the Use of the Element Collection
Every element inside an XDocument exposes the collection of its child elements. If an element has no children, this collection is empty.
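As a small illustration, the sketch below lists the direct children of the root element via Elements() and uses HasElements to detect a leaf element. The ChildElementsSketch class is hypothetical and assumes a document shaped like the person example above:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Hypothetical helper class used only for illustration.
public static class ChildElementsSketch
{
    // Local names of the root's direct child elements, in document order.
    public static List<string> RootChildNames(XDocument doc) =>
        doc.Root!.Elements().Select(e => e.Name.LocalName).ToList();

    // HasElements is false for a leaf such as <age>58</age>, whose Elements() sequence is empty.
    public static bool IsLeaf(XElement element) => !element.HasElements;
}
```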
We traverse the XML tree by referencing elements by name and then read their values:
```csharp
public static string TestReadWithElementCollection()
{
    var xmlDoc = ReadXmlAndCatchErrors(
        CreateXMLUsingXmlWriter.CreateSimpleXML(People.GetOne()));

    var name = xmlDoc.Root!.Element("name")!.Element("firstName")!.Value;
    var age = xmlDoc.Root!.Element("age")!.Value;

    return $"Name: {name}, Age: {age}";
}
```
We start from the root of the document, which is the <person> element. There we search for the <name> element, and inside the <name> element, we find the <firstName> element. From the <firstName> element, we take the value, which is, of course, the person’s first name.
We read the age the same way, from the <age> element directly under the root.
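Notice that we use the null-forgiving operator (!) here because we know that, in our example, the XML document contains all the elements we are searching for. If an element were missing, Element() would return null, and accessing Value would throw a NullReferenceException.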
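When we cannot be sure that every element is present, a more defensive variant replaces ! with the null-conditional operator ?. and uses XElement’s explicit conversion to int?, which yields null for a missing element. The class and method names below are hypothetical:

```csharp
using System.Xml.Linq;

// Hypothetical helper class used only for illustration.
public static class NullSafeReading
{
    // Any missing element along the path yields null instead of throwing.
    public static string? FirstName(XDocument doc) =>
        doc.Root?.Element("name")?.Element("firstName")?.Value;

    // XElement's explicit conversion to int? returns null when the element is null.
    public static int? Age(XDocument doc) => (int?)doc.Root?.Element("age");
}
```

The int? conversion still throws a FormatException if <age> exists but does not contain a valid integer.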
Using XPath
To retrieve data from an XML document, we can also use XPath. The XPath language is quite extensive, but in this article, we will provide a simple example of reading the name and age fields:
```csharp
// XPathSelectElement() is an extension method from the System.Xml.XPath namespace
public static string TestReadUsingXPath()
{
    var xmlDoc = ReadXmlAndCatchErrors(
        CreateXMLUsingXmlWriter.CreateSimpleXML(People.GetOne()));

    var name = xmlDoc.XPathSelectElement("/person/name/firstName")!.Value;
    var age = xmlDoc.XPathSelectElement("/person/age")!.Value;

    return $"Name: {name}, Age: {age}";
}
```
In this example, we use XPath to locate elements much like navigating a file system. For instance, to access the <firstName> element, we use the path “/person/name/firstName”.
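XPathSelectElement() returns only the first match. To collect every match, we can use XPathSelectElements(), and XPathEvaluate() handles expressions that return values rather than elements, such as count(). In this sketch, the <people> root element wrapping several <person> elements is an assumption made for illustration, as are the class and method names:

```csharp
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper class; assumes a <people> root with several <person> children.
public static class XPathManySketch
{
    // XPathSelectElements() returns every matching element, not just the first one.
    public static List<string> AllFirstNames(string xml) =>
        XDocument.Parse(xml)
            .XPathSelectElements("/people/person/name/firstName")
            .Select(e => e.Value)
            .ToList();

    // XPathEvaluate() returns object; a numeric result such as count() is a double.
    public static double CountPeople(string xml) =>
        (double)XDocument.Parse(xml).XPathEvaluate("count(//person)");
}
```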
There are various ways to express XPath queries, such as:
- //firstName
- /person/name/*[1]
- /person/name/*[local-name()='firstName']
The first XPath expression selects all <firstName> elements anywhere in the XML document. The second selects the first child element of the <name> element inside the <person> element. The last one selects every child element of <name> whose local name is firstName, regardless of its namespace. Since XPathSelectElement() returns only the first match, all three expressions give us the same element in our sample document.
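To check that these expressions really point at the same node in our sample, we can evaluate each of them with XPathSelectElement() and compare the results. This is a minimal sketch with a hypothetical class name:

```csharp
using System.Linq;
using System.Xml.Linq;
using System.Xml.XPath;

// Hypothetical helper class used only for illustration.
public static class XPathVariants
{
    // Evaluates each expression and returns the value of the first matching element, or null.
    public static string?[] FirstNames(string xml)
    {
        var doc = XDocument.Parse(xml);
        string[] expressions =
        {
            "/person/name/firstName",
            "//firstName",
            "/person/name/*[1]",
            "/person/name/*[local-name()='firstName']",
        };
        return expressions.Select(x => doc.XPathSelectElement(x)?.Value).ToArray();
    }
}
```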
XPath is an extensive language, and the examples here offer only a brief overview. For a deeper exploration of this topic, check out our article: Selecting Xml Nodes With XPath.
Using XmlReader to Read XML Documents in C#
The second option for reading XML is the XmlReader class.
The XmlReader class parses XML one node at a time. Here, we are not talking only about elements such as <name>, but also about other node types such as Attribute, Whitespace, Text, CDATA, and others.
All these node types are defined in the XmlNodeType enum. We use the reader to move through the nodes sequentially, and for each node it gives us its type, name, value, and so on:
```csharp
public static IEnumerable<string> ReadXml(string xml)
{
    using var reader = XmlReader.Create(new StringReader(xml));
    List<string> result = [];

    while (reader.Read())
    {
        result.Add($"> {reader.NodeType} | {reader.Name} | {reader.Value}");
    }

    return result;
}
```
Running this method produces a long list of node types, but Whitespace nodes clutter the output, because every line break and indentation between elements is reported as a Whitespace node.
Let’s skip the Whitespace nodes and see what we get:
```csharp
public static IEnumerable<string> ReadXmlWithoutWhiteSpace(string xml)
{
    using var reader = XmlReader.Create(new StringReader(xml));
    List<string> result = [];

    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Whitespace)
            continue;

        result.Add($"> {reader.NodeType} | {reader.Name} | {reader.Value}");
    }

    return result;
}
```
This method is nearly identical, except that it skips all Whitespace nodes. For our sample XML document, the output looks like this:
```text
> XmlDeclaration | xml | version="1.0" encoding="utf-16"
> Element | person |
> Element | name |
> Element | firstName |
> Text | | William
> EndElement | firstName |
> Element | lastName |
> Text | | Taylor
> EndElement | lastName |
> EndElement | name |
> Element | email |
> Text | | [email protected]
> EndElement | email |
> Element | age |
> Text | | 20
> EndElement | age |
> EndElement | person |
```
We can learn a lot by examining the output of the method.
Most notably, even a simple element such as <lastName>Taylor</lastName> is split into three nodes: Element, Text, and EndElement.
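One node type from the enum never shows up in this output: Attribute. XmlReader.Read() does not stop on attributes; instead, we visit them from their owning element with MoveToNextAttribute() and then return with MoveToElement(). The AttributeReading class is hypothetical, and because our sample document has no attributes, it only produces output for input that contains them, such as <person id="7">:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class used only for illustration.
public static class AttributeReading
{
    // Lists every attribute in the document as "NodeType | Name | Value".
    public static List<string> ListAttributes(string xml)
    {
        var result = new List<string>();
        using var reader = XmlReader.Create(new StringReader(xml));

        while (reader.Read())
        {
            if (reader.NodeType != XmlNodeType.Element)
                continue;

            // Walk the attributes of the current element, if any.
            while (reader.MoveToNextAttribute())
                result.Add($"{reader.NodeType} | {reader.Name} | {reader.Value}");

            // Return to the owning element before the next Read().
            reader.MoveToElement();
        }

        return result;
    }
}
```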
How to Read XML Documents in Practice
Now that we understand the intricacies of reading custom XML documents, let’s review a practical example.
We’ll create a method that reads through a lengthy XML document filled with first and last names. The objective is to extract both efficiently.
Reading Names and Surnames
We’ve already set up an XML file containing personal data with <firstName> and <lastName> elements. We aim to extract this information and display it in a tabular format.
First, we’ll define a private class, PersonData, with FirstName and LastName properties to store individual names:
```csharp
private class PersonData
{
    public string? FirstName { get; set; }
    public string? LastName { get; set; }

    public void Init() => FirstName = LastName = "";
}
```
Now, the implementation is straightforward:
```csharp
public static void ReadNamesAndAges(string xml)
{
    // Let the reader drop Whitespace nodes for us
    var settings = new XmlReaderSettings { IgnoreWhitespace = true };
    using var reader = XmlReader.Create(new StringReader(xml), settings);

    var personData = new PersonData();
    var numberOfPersons = 0;

    while (reader.Read())
    {
        if (reader.NodeType == XmlNodeType.Element)
        {
            if (reader.Name == "person")
                personData.Init();

            // ReadElementContentAsString() also moves the reader past the end tag
            if (reader.Name == "firstName")
                personData.FirstName = reader.ReadElementContentAsString();

            if (reader.Name == "lastName")
                personData.LastName = reader.ReadElementContentAsString();
        }

        // </person> means we have everything for the current person
        if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "person")
            Console.WriteLine($"#{++numberOfPersons,3} | " +
                $"{personData.FirstName,-15} | {personData.LastName,-15}");
    }
}
```
First, we configure the XmlReader to ignore whitespace, which simplifies processing. This time, instead of skipping Whitespace nodes manually, we set IgnoreWhitespace to true in an XmlReaderSettings instance.
As we iterate through the nodes one by one, when we come across the <person> element, we initialize the PersonData object. For <firstName> or <lastName> elements, we update the respective properties.
Upon encountering the </person> end element, we know that we have all the data for a person, so we print it out. For our XML file, this produces the following output:
```text
#  1 | Olivia          | Davis
#  2 | John            | Davis
#  3 | Sarah           | Moore
#  4 | Sarah           | Smith
```
Each row shows the person’s sequence number along with their first and last names.
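One detail of ReadNamesAndAges() deserves attention: ReadElementContentAsString() consumes the whole element and leaves the reader on the node after the end tag. The loop works for our data because <firstName> always precedes <lastName> and the checks run in that order within one iteration. If the order were reversed, the following Read() would move past the start of <firstName>, and the first name would be missed. A more order-independent sketch, with hypothetical names, only calls Read() when nothing was consumed:

```csharp
using System.Collections.Generic;
using System.IO;
using System.Xml;

// Hypothetical helper class used only for illustration.
public static class SafeNameReader
{
    // Returns "FirstName LastName" for each <person>, whatever the order of the two elements.
    public static List<string> ReadNames(string xml)
    {
        var result = new List<string>();
        var settings = new XmlReaderSettings { IgnoreWhitespace = true };
        using var reader = XmlReader.Create(new StringReader(xml), settings);
        string first = "", last = "";

        reader.Read(); // position on the first node
        while (!reader.EOF)
        {
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "firstName")
            {
                first = reader.ReadElementContentAsString();
                continue; // already on the next node, so skip Read()
            }
            if (reader.NodeType == XmlNodeType.Element && reader.Name == "lastName")
            {
                last = reader.ReadElementContentAsString();
                continue;
            }

            if (reader.NodeType == XmlNodeType.Element && reader.Name == "person")
                first = last = "";
            else if (reader.NodeType == XmlNodeType.EndElement && reader.Name == "person")
                result.Add($"{first} {last}");

            reader.Read();
        }

        return result;
    }
}
```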
Conclusion
Despite JSON becoming the de facto standard for API communication, XML remains integral to many industries due to its longstanding presence. As we inevitably encounter XML standards, we must familiarize ourselves with XML handling in .NET, using classes such as XDocument, XmlDocument, and XmlReader.
XDocument is a versatile tool for most tasks and simplifies reading a wide variety of XML documents. For very large documents, or when we need fine-grained control over the reading process, the forward-only, node-by-node approach of XmlReader is the better fit.
Regardless of your content or its complexity, mastering these tools is key to effective XML handling in .NET.