LINQ to XML is an in-memory XML programming interface that brings LINQ functionality to XML processing. As with the Document Object Model (DOM), we load XML documents into memory to work with them. Unlike the DOM, however, we can then query and transform them with the standard LINQ operators, which usually results in shorter and more readable code.
Let’s dive in.
Using LINQ to XML
Let’s start by creating an XML document that we will use for the first group of examples. It contains information about three students and the courses they have taken:
<Students>
  <Student ID="111">
    <FirstName>John</FirstName>
    <LastName>Doe</LastName>
    <DateOfBirth>2000-10-2</DateOfBirth>
    <Semester>2</Semester>
    <Major>Computer Science</Major>
    <Courses>
      <Course ID="CS103">
        <Grade>7.3</Grade>
      </Course>
      <Course ID="CS202">
        <Grade>6.9</Grade>
      </Course>
    </Courses>
  </Student>
  <Student ID="222">
    <FirstName>Jane</FirstName>
    <LastName>Doe</LastName>
    <DateOfBirth>2001-2-22</DateOfBirth>
    <Semester>1</Semester>
    <Major>Electrical Engineering</Major>
    <Courses>
      <Course ID="EE111">
        <Grade>5.6</Grade>
      </Course>
      <Course ID="EE303">
        <Grade>8.8</Grade>
      </Course>
    </Courses>
  </Student>
  <Student ID="333">
    <FirstName>Jim</FirstName>
    <LastName>Doe</LastName>
    <DateOfBirth>2000-3-12</DateOfBirth>
    <Semester>2</Semester>
    <Major>Computer Science</Major>
    <Courses>
      <Course ID="CS103">
        <Grade>7.6</Grade>
      </Course>
      <Course ID="CS202">
        <Grade>8.2</Grade>
      </Course>
    </Courses>
  </Student>
</Students>
LINQ to XML supports two types of syntax forms:
- Query syntax
- Method syntax
Let’s see a simple query that uses the query syntax form:
var studentsXML = XElement.Load("Students.xml");

var students = from student in studentsXML.Elements("Student")
               select student.Element("FirstName").Value + " " + student.Element("LastName").Value;

foreach (var student in students)
{
    Console.WriteLine(student);
}
First, we load the XML from a file using the XElement.Load() method. Then, we select all students and return the concatenation of each student’s first and last name.
Let’s see the same query in method syntax form:
var studentsXML = XElement.Load("Students.xml");

var students = studentsXML.Elements("Student")
    .Select(student => student.Element("FirstName").Value + " " + student.Element("LastName").Value);

foreach (var student in students)
{
    Console.WriteLine(student);
}
Going forward, we’re going to use the method syntax form most of the time. However, we’re going to show both forms in some interesting cases.
LINQ to XML Main Classes
In the introductory examples, we used the XElement class to load the XML document into memory. XElement is one of the most important classes in LINQ to XML and represents an XML element. Apart from querying an XML tree, we can also use XElement to create a new XML tree or modify an existing one.
Other important classes are XAttribute and XDocument. XAttribute represents an XML attribute, while XDocument represents the whole XML document (along with the XML declaration, processing instructions, and comments). We only need XDocument instead of XElement when we require this extra functionality; otherwise, XElement is much simpler to use.
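To make the difference concrete, here is a minimal sketch of an XDocument that carries a declaration and a comment around its root element. The element names are borrowed from the sample file, the comment text is made up for illustration, and the snippet assumes a C# version that supports top-level statements (.NET 6 or later):

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// An XDocument can hold nodes that live outside the root element,
// such as the XML declaration and top-level comments.
var document = new XDocument(
    new XDeclaration("1.0", "utf-8", "yes"),
    new XComment("Illustrative comment, not part of the sample file"),
    new XElement("Students",
        new XElement("Student", new XAttribute("ID", "111"))));

// Root exposes the top-level element as a regular XElement.
Console.WriteLine(document.Root?.Name);

// Count the comment nodes that sit next to the root element.
int topLevelComments = document.Nodes().OfType<XComment>().Count();
Console.WriteLine(topLevelComments);
```

If none of these extra nodes matter for the task at hand, working with the root XElement directly keeps the code shorter.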
Create XML Tree
First, let’s see how to create a new XML tree using LINQ to XML:
XElement students =
    new XElement("Students",
        new XElement("Student",
            new XAttribute("ID", "111"),
            new XElement("FirstName", "John"),
            new XElement("LastName", "Doe"),
            new XElement("DateOfBirth", "2000-10-2"),
            new XElement("Semester", "2"),
            new XElement("Major", "Computer Science"),
            new XElement("Courses",
                new XElement("Course",
                    new XAttribute("ID", "CS103"),
                    new XElement("Grade", "7.3")
                ),
                new XElement("Course",
                    new XAttribute("ID", "CS202"),
                    new XElement("Grade", "6.9")
                )
            )
        )
    );
Here, we use nested XElement (and XAttribute) objects in order to create the XML tree. We can see that one or more child elements can be passed as parameters to the constructor of the parent element.
Of course, another way to create an XML tree is to build it from the results of a LINQ query, as we’ll see in the join examples later on.
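As a minimal sketch of building a tree from a query, the snippet below projects an in-memory collection into Student elements. The people array and its tuple field names are hypothetical stand-ins for any LINQ-queryable source, and top-level statements are assumed:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

// Hypothetical in-memory data standing in for any LINQ-queryable source.
var people = new[]
{
    (Id: "111", First: "John", Last: "Doe"),
    (Id: "222", First: "Jane", Last: "Doe")
};

// Passing a sequence of XElement objects to the constructor
// adds each item of the sequence as a child element.
var students = new XElement("Students",
    people.Select(p => new XElement("Student",
        new XAttribute("ID", p.Id),
        new XElement("FirstName", p.First),
        new XElement("LastName", p.Last))));

Console.WriteLine(students);
```

The same pattern works when the source sequence is itself the result of a query over another XML tree.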
Queries with LINQ to XML
Now, let’s learn how to make more complex queries with LINQ to XML. There are various ways to search for specific elements in an XML tree.
We can filter by:
- Element value
- Attribute value
- Elements and/or attributes in nested elements
Query by Element Value
We can use the value of an XML element to filter an XML tree:
var students = studentsXML.Elements("Student")
    .Where(student => ((DateTime)student.Element("DateOfBirth")).Year == 2000)
    .Select(student => (string)student.Element("FirstName") + " "
        + (string)student.Element("LastName")
        + " (" + ((DateTime)student.Element("DateOfBirth")).ToShortDateString() + ")");
We get the students that were born in the year 2000. The returned sequence (an IEnumerable<string>) contains the first name, last name, and birth date of each selected student.
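The query above uses explicit casts such as (string) and (DateTime) instead of the Value property. One practical difference, sketched below, is how the two approaches behave when an element is missing. The MiddleName and Credits element names are hypothetical and chosen only because they do not exist in the sample:

```csharp
using System;
using System.Xml.Linq;

var student = XElement.Parse(
    "<Student ID=\"111\"><FirstName>John</FirstName><Semester>2</Semester></Student>");

// Casting a missing element to string or to a nullable type yields null.
var middleName = (string)student.Element("MiddleName");
var credits = (int?)student.Element("Credits");

// Casting an existing element converts its text content to the target type.
var semester = (int)student.Element("Semester");

// Reading .Value on a missing element fails, because Element() returned null.
var valueThrew = false;
try
{
    Console.WriteLine(student.Element("MiddleName").Value);
}
catch (NullReferenceException)
{
    valueThrew = true;
}

Console.WriteLine(middleName == null);
Console.WriteLine(credits.HasValue);
Console.WriteLine(semester);
Console.WriteLine(valueThrew);
```

Casting to a nullable type is therefore a convenient way to treat optional elements without extra null checks.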
Query by Attribute Value
We can also filter an XML tree by using the attribute value of an XML element:
// The lambda parameter is named s because it cannot reuse the name of the local variable student.
var student = studentsXML.Elements("Student")
    .Where(s => (int)s.Attribute("ID") == 222)
    .FirstOrDefault();
The query returns the Student element whose ID attribute equals 222, or null if no such element exists.
Query in Nested Elements
In this case, we can filter an XML tree based on nested elements with specific characteristics:
var students = studentsXML.Elements("Student")
    .Where(student => student.Elements("Courses").Elements("Course")
        .Any(course => (string)course.Attribute("ID") == "CS103"));
Here, we query all students that have taken the course with an ID equal to CS103. In general, we can use any combination of elements and attributes in our queries in order to obtain the desired results.
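As a sketch of such a combination, the query below filters on an element value (Semester), a nested attribute (the course ID), and a nested element value (Grade) at the same time. The inline XML is a trimmed copy of the sample data, and the 7.5 threshold is an arbitrary value picked for the example:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var studentsXML = XElement.Parse(
    "<Students>" +
    "<Student ID=\"111\"><FirstName>John</FirstName><Semester>2</Semester>" +
    "<Courses><Course ID=\"CS103\"><Grade>7.3</Grade></Course></Courses></Student>" +
    "<Student ID=\"222\"><FirstName>Jane</FirstName><Semester>1</Semester>" +
    "<Courses><Course ID=\"EE303\"><Grade>8.8</Grade></Course></Courses></Student>" +
    "<Student ID=\"333\"><FirstName>Jim</FirstName><Semester>2</Semester>" +
    "<Courses><Course ID=\"CS103\"><Grade>7.6</Grade></Course></Courses></Student>" +
    "</Students>");

// Element filter (Semester) combined with a nested attribute and element filter.
var names = studentsXML.Elements("Student")
    .Where(s => (int)s.Element("Semester") == 2)
    .Where(s => s.Elements("Courses").Elements("Course")
        .Any(c => (string)c.Attribute("ID") == "CS103"
               && (decimal)c.Element("Grade") > 7.5m))
    .Select(s => (string)s.Element("FirstName"))
    .ToList();

Console.WriteLine(string.Join(", ", names)); // prints "Jim"
```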
Updates with LINQ to XML
We can also use LINQ to XML in order to modify an XML tree in various ways.
More specifically, we can:
- Insert a new element/attribute
- Update an existing element/attribute
- Delete an existing element/attribute
Insert a New Element/Attribute
We can modify an existing XML tree by inserting one or more elements or attributes:
var students = studentsXML.Elements("Student")
    .Where(student => (int)student.Element("Semester") == 2
        && (string)student.Element("Major") == "Computer Science");

foreach (var student in students)
{
    student.Element("Courses").Add(
        new XElement("Course",
            new XAttribute("ID", "CS204"),
            new XElement("Grade")));
}

studentsXML.Save("student_out.xml");
Here, we add a new course (CS204) to all Computer Science students in the 2nd semester. Note that we’ve chosen to leave the Grade element empty. We can also see that the Where() method accepts two conditions combined with the “AND” (&&) operator.
Finally, we save the updated XML tree to a new file by using the XElement.Save() method.
Update an Existing Element/Attribute
By updating one or more elements and/or attributes, we can change an XML tree:
// The lambda parameter is named s because it cannot reuse the name of the local variable student.
var student = studentsXML.Elements("Student")
    .Where(s => (int)s.Attribute("ID") == 333)
    .FirstOrDefault();

student.Element("FirstName").Value = "Jimmy";
Here, we change the first name of the student with ID equal to 333.
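Attributes can be updated in a similar way. The following sketch uses the SetAttributeValue() and SetElementValue() methods of XElement, which overwrite a value when the name already exists and add it otherwise. The Status attribute is a hypothetical name that does not appear in the sample data:

```csharp
using System;
using System.Xml.Linq;

var student = XElement.Parse(
    "<Student ID=\"333\"><FirstName>Jim</FirstName></Student>");

// Overwrite the value of an attribute that already exists.
student.SetAttributeValue("ID", "334");

// Status (hypothetical) does not exist yet, so it is added.
student.SetAttributeValue("Status", "active");

// Overwrite the value of an existing child element.
student.SetElementValue("FirstName", "Jimmy");

// Semester does not exist yet, so it is appended as a new child element.
student.SetElementValue("Semester", 2);

Console.WriteLine(student);
```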
Delete an Existing Element/Attribute
Finally, we can delete one or more elements and attributes:
studentsXML.Elements("Student")
    .Where(student => (int)student.Attribute("ID") == 333)
    .Remove();
Here, we remove the student with an ID equal to 333.
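The same idea applies to attributes and nested elements. Below is a small sketch, assuming a trimmed Student element with a hypothetical Status attribute, that removes an attribute, a filtered set of nested elements, and a single child element:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var student = XElement.Parse(
    "<Student ID=\"333\" Status=\"active\">" +
    "<FirstName>Jim</FirstName>" +
    "<Courses><Course ID=\"CS102\" /><Course ID=\"CS103\" /></Courses>" +
    "</Student>");

// Remove a single attribute; nothing happens if it is absent.
student.Attribute("Status")?.Remove();

// Remove only the nested elements that match a filter.
student.Elements("Courses").Elements("Course")
    .Where(course => (string)course.Attribute("ID") == "CS102")
    .Remove();

// Setting the value of a child element to null removes that element.
student.SetElementValue("FirstName", null);

Console.WriteLine(student);
```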
Advanced Queries with LINQ to XML
So far, we have seen simple queries that operate on a single XML tree. However, there are cases where we need to join two or more XML trees in order to get more complex results. Moreover, we can group the results or sort them based on the value of a specific element or attribute.
For the advanced query examples we are going to use the following XML:
<?xml version="1.0" encoding="utf-8" ?>
<University>
  <Students>
    <Student ID="111">
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
      <DateOfBirth>2000-10-2</DateOfBirth>
      <Semester>1</Semester>
      <Major>Computer Science</Major>
      <Courses>
        <Course ID="CS101">
          <Grade>7.3</Grade>
        </Course>
        <Course ID="CS102">
          <Grade>6.9</Grade>
        </Course>
      </Courses>
    </Student>
    <Student ID="222">
      <FirstName>Jane</FirstName>
      <LastName>Doe</LastName>
      <DateOfBirth>2001-2-22</DateOfBirth>
      <Semester>1</Semester>
      <Major>Electrical Engineering</Major>
      <Courses>
        <Course ID="EE101">
          <Grade>5.6</Grade>
        </Course>
        <Course ID="EE102">
          <Grade>8.8</Grade>
        </Course>
      </Courses>
    </Student>
    <Student ID="333">
      <FirstName>Jim</FirstName>
      <LastName>Doe</LastName>
      <DateOfBirth>2000-3-12</DateOfBirth>
      <Semester>2</Semester>
      <Major>Computer Science</Major>
      <Courses>
        <Course ID="CS102">
          <Grade>7.6</Grade>
        </Course>
        <Course ID="CS103">
          <Grade>8.2</Grade>
        </Course>
      </Courses>
    </Student>
  </Students>
  <Courses>
    <Course ID="CS101">
      <Title>Intro to programming</Title>
      <Credits>5</Credits>
    </Course>
    <Course ID="CS102">
      <Title>Discrete Mathematics</Title>
      <Credits>4</Credits>
    </Course>
    <Course ID="CS103">
      <Title>Data structures</Title>
      <Credits>6</Credits>
    </Course>
    <Course ID="EE101">
      <Title>Electric fields</Title>
      <Credits>5</Credits>
    </Course>
    <Course ID="EE102">
      <Title>Electronics</Title>
      <Credits>6</Credits>
    </Course>
  </Courses>
</University>
We’ve created a root element with the name University. The root element contains two sequences, one with students and one with courses. The Course elements in the second sequence provide the title of each course and the credits it offers.
Join two XML Trees (and Create Anonymous Objects)
Let’s see a query that joins the two sequences:
var students = UniversityXML.Elements("Students").Elements("Student")
    .Where(student => (string)student.Attribute("ID") == "111")
    .Select(student => new
    {
        Name = (string)student.Element("FirstName") + " " + (string)student.Element("LastName"),
        Courses = student.Elements("Courses").Elements("Course")
            .Join(UniversityXML.Elements("Courses").Elements("Course"),
                studentCourse => (string)studentCourse.Attribute("ID"),
                course => (string)course.Attribute("ID"),
                (studentCourse, course) => new
                {
                    Id = (string)course.Attribute("ID"),
                    Title = (string)course.Element("Title"),
                    Grade = (decimal)studentCourse.Element("Grade"),
                    Credits = (int)course.Element("Credits")
                })
    });
This query first selects the student with an ID equal to 111. Then, it creates a new anonymous object that consists of two properties: Name and Courses. The Courses property is a sequence that also contains anonymous objects.
Those objects correspond to the courses taken by the student and combine information from the two sequences: Id, Title, and Credits are taken from the Course elements of the course catalog, and Grade comes from the Course elements of the Student element.
Let’s see this join operation with the query syntax form:
var students =
    from student in UniversityXML.Elements("Students").Elements("Student")
    where (string)student.Attribute("ID") == "111"
    select new
    {
        Name = (string)student.Element("FirstName") + " " + (string)student.Element("LastName"),
        Courses = (
            from studentCourses in student.Elements("Courses").Elements("Course")
            join course in UniversityXML.Elements("Courses").Elements("Course")
                on (string)(studentCourses.Attribute("ID")) equals (string)course.Attribute("ID")
            select new
            {
                Id = (string)course.Attribute("ID"),
                Title = (string)course.Element("Title"),
                Grade = (decimal)studentCourses.Element("Grade"),
                Credits = (int)course.Element("Credits")
            }
        )
    };
Join two XML Trees (and Create Named Class Objects)
Here, we will modify the previous example, so that it produces an object (or a list of objects) based on a Student class (not anonymous objects):
var students = UniversityXML.Elements("Students").Elements("Student")
    .Where(student => (string)student.Attribute("ID") == "111")
    .Select(student => new Student()
    {
        Name = (string)student.Element("FirstName") + " " + (string)student.Element("LastName"),
        Courses = student.Elements("Courses").Elements("Course")
            .Join(UniversityXML.Elements("Courses").Elements("Course"),
                studentCourse => (string)studentCourse.Attribute("ID"),
                course => (string)course.Attribute("ID"),
                (studentCourse, course) => new Course()
                {
                    Id = (string)course.Attribute("ID"),
                    Title = (string)course.Element("Title"),
                    Grade = (decimal)studentCourse.Element("Grade"),
                    Credits = (int)course.Element("Credits")
                })
    });
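The Student and Course classes are not shown in the example. A minimal sketch of what they might look like follows. The property names and types are inferred from the object initializers in the query, so treat them as assumptions rather than the original definitions. Courses is declared as IEnumerable<Course> because Join() returns a lazily evaluated sequence, not a list:

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Xml.Linq;

// Quick check that the classes accept the same initializers as the query.
var course = new Course { Id = "CS101", Title = "Intro to programming", Grade = 7.3m, Credits = 5 };
var student = new Student { Name = "John Doe", Courses = new[] { course } };
Console.WriteLine($"{student.Name}: {student.Courses.Count()} course(s)");

// Assumed shape of a course that combines catalog data with the student's grade.
public class Course
{
    public string Id { get; set; } = "";
    public string Title { get; set; } = "";
    public decimal Grade { get; set; }
    public int Credits { get; set; }
}

// Assumed shape of a student with the joined course sequence.
public class Student
{
    public string Name { get; set; } = "";
    public IEnumerable<Course> Courses { get; set; } = Enumerable.Empty<Course>();
}
```

If a materialized list is preferred, the query can end the join with ToList() and the property type can change to List<Course>.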
Join two XML Trees (and Create a New XElement)
Instead of creating C# objects (anonymous or not), we can create a new XElement:
var newElement = new XElement("Students",
    UniversityXML.Elements("Students").Elements("Student")
        .Where(student => (string)student.Attribute("ID") == "111")
        .Select(student => new XElement("Student",
            new XElement("Name",
                (string)student.Element("FirstName") + " " + (string)student.Element("LastName")),
            new XElement("Courses",
                student.Elements("Courses").Elements("Course")
                    .Join(UniversityXML.Elements("Courses").Elements("Course"),
                        studentCourse => (string)studentCourse.Attribute("ID"),
                        course => (string)course.Attribute("ID"),
                        (studentCourse, course) => new XElement("Course",
                            // The attribute is named ID so that it matches the output shown below.
                            new XAttribute("ID", (string)course.Attribute("ID")),
                            new XElement("Title", (string)course.Element("Title")),
                            new XElement("Grade", (decimal)studentCourse.Element("Grade")),
                            new XElement("Credits", (int)course.Element("Credits"))
                        )
                    )
            )
        ))
);
The result of this query is a new XElement that contains the combined information from the join:
<Students>
  <Student>
    <Name>John Doe</Name>
    <Courses>
      <Course ID="CS101">
        <Title>Intro to programming</Title>
        <Grade>7.3</Grade>
        <Credits>5</Credits>
      </Course>
      <Course ID="CS102">
        <Title>Discrete Mathematics</Title>
        <Grade>6.9</Grade>
        <Credits>4</Credits>
      </Course>
    </Courses>
  </Student>
</Students>
Aggregate Functions
Let’s see how we can use LINQ’s aggregate functions to compute sums and counts:
var students = UniversityXML.Elements("Students").Elements("Student")
    .Select(student => new
    {
        Name = (string)student.Element("FirstName") + " " + (string)student.Element("LastName"),
        TotalCredits = student.Elements("Courses").Elements("Course")
            .Join(UniversityXML.Elements("Courses").Elements("Course"),
                studentCourse => (string)studentCourse.Attribute("ID"),
                course => (string)course.Attribute("ID"),
                (studentCourse, course) => (int)course.Element("Credits")
            )
            .Sum(),
        CoursesCount = student.Elements("Courses").Elements("Course")
            .Join(UniversityXML.Elements("Courses").Elements("Course"),
                studentCourse => (string)studentCourse.Attribute("ID"),
                course => (string)course.Attribute("ID"),
                (studentCourse, course) => (string)course.Attribute("ID")
            )
            .Count()
    });
This query returns the total credits of the courses each student has taken. It also returns the number of those courses.
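Other aggregates follow the same pattern. As a small sketch, the snippet below computes the average, highest, and lowest grade for a single student. It uses an inline copy of John Doe's courses from the sample, and the variable names are chosen for this illustration only:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var student = XElement.Parse(
    "<Student ID=\"111\"><Courses>" +
    "<Course ID=\"CS101\"><Grade>7.3</Grade></Course>" +
    "<Course ID=\"CS102\"><Grade>6.9</Grade></Course>" +
    "</Courses></Student>");

// Convert each Grade element to a decimal once, then aggregate.
var grades = student.Elements("Courses").Elements("Course")
    .Select(course => (decimal)course.Element("Grade"))
    .ToList();

decimal average = grades.Average();
decimal best = grades.Max();
decimal worst = grades.Min();

Console.WriteLine(average);
Console.WriteLine(best);
Console.WriteLine(worst);
```

Materializing the grades with ToList() avoids re-reading the XML tree for every aggregate.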
Grouping of Results
By applying the GroupBy() method to our query, we can group the results based on a specific element or attribute:
var students = new XElement("Semesters",
    UniversityXML.Elements("Students").Elements("Student")
        .GroupBy(student => (int)student.Element("Semester"))
        .Select(group => new XElement("Semester",
            new XAttribute("ID", (int)group.Key),
            group.Select(s => new XElement("Student",
                new XElement("FirstName", (string)s.Element("FirstName")),
                new XElement("LastName", (string)s.Element("LastName"))
            ))
        )));
Here, we group by the Semester element and we get sequences of students according to their semester:
<Semesters>
  <Semester ID="1">
    <Student>
      <FirstName>John</FirstName>
      <LastName>Doe</LastName>
    </Student>
    <Student>
      <FirstName>Jane</FirstName>
      <LastName>Doe</LastName>
    </Student>
  </Semester>
  <Semester ID="2">
    <Student>
      <FirstName>Jim</FirstName>
      <LastName>Doe</LastName>
    </Student>
  </Semester>
</Semesters>
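Sorting, which was mentioned at the start of this section, works the same way as in any other LINQ query. The sketch below orders students by semester (descending) and then by first name. It uses a trimmed inline copy of the student data, and the ordinal string comparer is a choice made for this example to keep the ordering culture-independent:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

var students = XElement.Parse(
    "<Students>" +
    "<Student ID=\"111\"><FirstName>John</FirstName><Semester>1</Semester></Student>" +
    "<Student ID=\"222\"><FirstName>Jane</FirstName><Semester>1</Semester></Student>" +
    "<Student ID=\"333\"><FirstName>Jim</FirstName><Semester>2</Semester></Student>" +
    "</Students>");

// Sort by the numeric Semester value first, then by FirstName within each semester.
var names = students.Elements("Student")
    .OrderByDescending(s => (int)s.Element("Semester"))
    .ThenBy(s => (string)s.Element("FirstName"), StringComparer.Ordinal)
    .Select(s => (string)s.Element("FirstName"))
    .ToList();

Console.WriteLine(string.Join(", ", names)); // prints "Jim, Jane, John"
```

Casting Semester to int before sorting matters: sorting the raw strings would place "10" before "2".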
Using Namespaces in LINQ to XML
So far, we have dealt with XML documents that did not contain any namespaces. We can create a new XML tree with a default namespace like this:
XNamespace st = "http://www.testuni.edu/def";

XElement students =
    new XElement(st + "Students",
        new XElement(st + "Student",
            new XAttribute("ID", "111"),
            new XElement(st + "FirstName", "John"),
            new XElement(st + "LastName", "Doe"),
            new XElement(st + "DateOfBirth", "2000-10-2"),
            new XElement(st + "Semester", "2"),
            new XElement(st + "Major", "Computer Science"),
            new XElement(st + "Courses",
                new XElement(st + "Course",
                    new XAttribute("ID", "CS103"),
                    new XElement(st + "Grade", "7.3")
                ),
                new XElement(st + "Course",
                    new XAttribute("ID", "CS202"),
                    new XElement(st + "Grade", "6.9")
                )
            )
        )
    );
After we create a new XNamespace object, we need to prepend it to the name of every element in the XML tree to get the result:
<Students xmlns="http://www.testuni.edu/def">
  <Student ID="111">
    <FirstName>John</FirstName>
    <LastName>Doe</LastName>
    <DateOfBirth>2000-10-2</DateOfBirth>
    <Semester>2</Semester>
    <Major>Computer Science</Major>
    <Courses>
      <Course ID="CS103">
        <Grade>7.3</Grade>
      </Course>
      <Course ID="CS202">
        <Grade>6.9</Grade>
      </Course>
    </Courses>
  </Student>
</Students>
We can also provide a namespace prefix, instead of having a default namespace:
XNamespace st = "http://www.testuni.edu/def";

XElement students =
    new XElement(st + "Students",
        new XAttribute(XNamespace.Xmlns + "st", "http://www.testuni.edu/def"),
        new XElement(st + "Student",
            new XAttribute("ID", "111"),
            new XElement(st + "FirstName", "John"),
            new XElement(st + "LastName", "Doe"),
            new XElement(st + "DateOfBirth", "2000-10-2"),
            new XElement(st + "Semester", "2"),
            new XElement(st + "Major", "Computer Science"),
            new XElement(st + "Courses",
                new XElement(st + "Course",
                    new XAttribute("ID", "CS103"),
                    new XElement(st + "Grade", "7.3")
                ),
                new XElement(st + "Course",
                    new XAttribute("ID", "CS202"),
                    new XElement(st + "Grade", "6.9")
                )
            )
        )
    );
The difference here is that we add an XAttribute object to the Students element that declares the prefix ‘st’ for the namespace:
<st:Students xmlns:st="http://www.testuni.edu/def">
  <st:Student ID="111">
    <st:FirstName>John</st:FirstName>
    <st:LastName>Doe</st:LastName>
    <st:DateOfBirth>2000-10-2</st:DateOfBirth>
    <st:Semester>2</st:Semester>
    <st:Major>Computer Science</st:Major>
    <st:Courses>
      <st:Course ID="CS103">
        <st:Grade>7.3</st:Grade>
      </st:Course>
      <st:Course ID="CS202">
        <st:Grade>6.9</st:Grade>
      </st:Course>
    </st:Courses>
  </st:Student>
</st:Students>
Of course, we can do various things with XML namespaces. For example, we can have more than one namespace, each with its own prefix. Or, we can use the default namespace in addition to one or more namespace prefixes.
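Namespaces also affect queries: element names must be qualified with the same XNamespace that was used to create them. A minimal sketch, reusing the namespace URI from the examples above on a trimmed tree:

```csharp
using System;
using System.Linq;
using System.Xml.Linq;

XNamespace st = "http://www.testuni.edu/def";

var students = XElement.Parse(
    "<st:Students xmlns:st=\"http://www.testuni.edu/def\">" +
    "<st:Student ID=\"111\"><st:FirstName>John</st:FirstName></st:Student>" +
    "</st:Students>");

// Qualified names match the namespaced elements.
var qualified = students.Elements(st + "Student")
    .Select(s => (string)s.Element(st + "FirstName"))
    .ToList();

// Unqualified names refer to elements in no namespace, so nothing matches.
var unqualifiedCount = students.Elements("Student").Count();

// The ID attribute was written without a prefix, so it is in no namespace.
var id = (string)students.Elements(st + "Student").First().Attribute("ID");

Console.WriteLine(string.Join(", ", qualified));
Console.WriteLine(unqualifiedCount);
Console.WriteLine(id);
```

The prefix used in the document does not matter for the query. Only the namespace URI has to match.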
Conclusion
LINQ to XML is a great way to handle XML documents. It is a large topic, so we have focused on its most important aspects: creating, querying, and modifying XML documents. We’ve also touched on some advanced cases, such as joining, aggregating, and grouping data from more than one XML tree, and on working with namespaces.