Recently I've been reviewing the most efficient way of retrieving values from an XML document in .NET. As Scott Hanselman mentions, the best way to find this out is to write some code and measure the performance, so I wrote some code (available for download) to test the performance of three different approaches:
- an XmlDocument using XPath queries with SelectSingleNode
- an XPathDocument with an XPathNavigator
- using the XmlSerializer to deserialize into a custom class
The results show that the XmlSerializer is fastest once the initial cost of creating temporary assemblies has been overcome. In situations where first-use performance matters most, an XPathNavigator over an XPathDocument is the fastest.
Approach
Here's more detail about each approach:
| Object Type | XmlDocument | XPathDocument | XmlSerializer |
| --- | --- | --- | --- |
| Retrieval Method | XPath query using XmlDocument.SelectSingleNode | XPath queries using an XPathDocument | Object properties |
| Advantages | Familiar to many developers. XPath queries allow for quick evaluation of complex expressions. | Optimized for XPath and XSLT transformations. XPath queries allow for quick evaluation of complex expressions. Likely to become more important in future. | Turns XML into objects. |
| Disadvantages | Slow, requires the whole document to be in memory. | Slightly more complex for developers to write than XmlDocument. | Requires familiarity with XSD and is more complex to set up. Can't match XPath's complex expressions. Slow due to generation of dynamic assemblies on first use. |
| Example | XmlDocument doc = new XmlDocument(); doc.Load(filePath); XmlNodeList selection = doc.SelectNodes(XPath); result = selection.Item(0).InnerText; | XPathDocument doc = new XPathDocument(filePath); XPathNavigator nav = doc.CreateNavigator(); XPathNodeIterator it = nav.Select(XPath); it.MoveNext(); result = it.Current.Value; | XmlTextReader reader = new XmlTextReader(filePath); XmlSerializer ser = new XmlSerializer(typeof(message)); message myMsg = (message)ser.Deserialize(reader); result = myMsg.MessageID; |
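To make the three approaches easier to compare side by side, here is a minimal, self-contained sketch that retrieves the same value with each of them. It is not the downloadable sample: the `Message` class, the `<message>`/`<MessageID>` element names, the in-memory `SampleXml` string and the method names are all assumptions made for illustration.

```csharp
using System;
using System.IO;
using System.Xml;
using System.Xml.Serialization;
using System.Xml.XPath;

// Hypothetical message type; the element names are assumptions for this
// sketch, not the schema used in the downloadable sample.
[XmlRoot("message")]
public class Message
{
    [XmlElement("MessageID")]
    public string MessageID;
}

public static class XmlRetrievalSketch
{
    // Small in-memory document with no namespace (assumed shape).
    public const string SampleXml =
        "<message><MessageID>42</MessageID></message>";

    public const string Query = "/message/MessageID";

    // Approach 1: XmlDocument with an XPath query via SelectSingleNode.
    public static string ViaXmlDocument(string xml)
    {
        XmlDocument doc = new XmlDocument();
        doc.LoadXml(xml);
        XmlNode node = doc.SelectSingleNode(Query);
        return node == null ? null : node.InnerText;
    }

    // Approach 2: read-only XPathDocument queried through an XPathNavigator.
    public static string ViaXPathNavigator(string xml)
    {
        using (StringReader reader = new StringReader(xml))
        {
            XPathDocument doc = new XPathDocument(reader);
            XPathNavigator nav = doc.CreateNavigator();
            XPathNodeIterator it = nav.Select(Query);
            return it.MoveNext() ? it.Current.Value : null;
        }
    }

    // Approach 3: deserialize into a class and read a member.
    public static string ViaXmlSerializer(string xml)
    {
        XmlSerializer ser = new XmlSerializer(typeof(Message));
        using (StringReader reader = new StringReader(xml))
        {
            Message msg = (Message)ser.Deserialize(reader);
            return msg.MessageID;
        }
    }

    public static void Main()
    {
        Console.WriteLine(ViaXmlDocument(SampleXml));
        Console.WriteLine(ViaXPathNavigator(SampleXml));
        Console.WriteLine(ViaXmlSerializer(SampleXml));
    }
}
```

Each method returns the text of the `MessageID` element, so all three calls in `Main` print the same value. The first two return null when the query matches nothing, which the one-line table examples do not guard against.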
Aaron Skonnard provides some excellent background on the benefits of the XPathDocument over the XmlDocument in his article .NET XML Best Practices: Part I: Choosing an XML API.
For the timing tests I used the code available in the EggHead Cafe article "High-Precision Code Timing in .NET". A high-precision timer is needed because the Ticks property of the System.DateTime class is, unfortunately, only accurate to around 16ms, even though it reports values in units of 100 nanoseconds.
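As an illustration of high-resolution timing, here is a rough sketch that uses System.Diagnostics.Stopwatch. This is a substitute for, not a copy of, the timer code from the article cited above: Stopwatch was added in .NET 2.0, and the `TimeSeconds` helper, the `Action` delegate and the lambda syntax assume a newer framework and compiler than the ones these tests were written against.

```csharp
using System;
using System.Diagnostics;
using System.Threading;

public static class TimingSketch
{
    // Hypothetical helper: runs the action once and returns the elapsed
    // wall-clock time in seconds, as measured by Stopwatch.
    public static double TimeSeconds(Action action)
    {
        Stopwatch sw = Stopwatch.StartNew();
        action();
        sw.Stop();
        return sw.Elapsed.TotalSeconds;
    }

    public static void Main()
    {
        // Reports whether a high-resolution performance counter backs the timer.
        Console.WriteLine("High resolution: {0}", Stopwatch.IsHighResolution);
        Console.WriteLine("Ticks per second: {0}", Stopwatch.Frequency);

        double seconds = TimeSeconds(() => Thread.Sleep(5));
        Console.WriteLine("Elapsed: {0:F5} s", seconds);
    }
}
```

Whether the timer is truly high-resolution depends on the machine, which is why the sketch prints `Stopwatch.IsHighResolution` rather than assuming it.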
Results
In this app I take a small XML document with no namespace and retrieve four values from it. Here are the figures I got when running the console application for the first run and then a repeat run:
| Runs | XmlDocument | XPathDocument/Navigator | XML Serialization |
| --- | --- | --- | --- |
| 1 (first run) | 0.00543 | 0.00129 | 0.09020 |
| 1 (repeat run) | 0.00051 | 0.00035 | 0.00028 |
The first time the application is run, the XmlSerializer has to create temporary dynamic assemblies, so it performs the worst. On subsequent runs within the application, however, it is the fastest, since it can use a cached copy of the temporary assemblies. (As Scott found out, these assemblies are not cached in .NET 1.0 if you specify a namespace. Daniel Cazzulino has some good background on how the XmlSerializer works.) In both the first and the second runs the XPathDocument/XPathNavigator approach is faster than the XmlDocument.
In the sample code I've also presented an ASP.NET web application that can host the same tests. ASP.NET creates a new AppDomain for each website it hosts, and that AppDomain is maintained across requests to the server until it is shut down or recycled. Since the XmlSerializer caches its temporary assemblies per AppDomain, only the first request pays the start-up cost, so using the XmlSerializer in an ASP.NET web application or web service is actually the fastest technique for retrieving values from an XML document.
In situations where the AppDomain is created per request, using the XPathDocument/XPathNavigator will be more efficient than either the XmlDocument or the XmlSerializer.
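The first-use cost of the XmlSerializer can be observed with a sketch along these lines, which times two deserializations in the same process (and therefore the same AppDomain). The `CachedMessage` type, the sample XML and the `Deserialize` helper are hypothetical, and System.Diagnostics.Stopwatch is used for timing on the assumption that .NET 2.0 or later is available. No figures are shown because they vary by machine and framework version.

```csharp
using System;
using System.Diagnostics;
using System.IO;
using System.Xml.Serialization;

// Hypothetical type for this sketch; not the class from the sample code.
[XmlRoot("message")]
public class CachedMessage
{
    [XmlElement("MessageID")]
    public string MessageID;
}

public static class SerializerWarmupSketch
{
    public const string SampleXml =
        "<message><MessageID>42</MessageID></message>";

    // Creates a serializer, deserializes the XML, and reports the time taken
    // for both steps together (in seconds) through the out parameter.
    public static string Deserialize(string xml, out double seconds)
    {
        Stopwatch sw = Stopwatch.StartNew();
        XmlSerializer ser = new XmlSerializer(typeof(CachedMessage));
        string id;
        using (StringReader reader = new StringReader(xml))
        {
            id = ((CachedMessage)ser.Deserialize(reader)).MessageID;
        }
        sw.Stop();
        seconds = sw.Elapsed.TotalSeconds;
        return id;
    }

    public static void Main()
    {
        double first, repeat;
        Deserialize(SampleXml, out first);   // includes serializer generation
        Deserialize(SampleXml, out repeat);  // can reuse the cached serializer
        Console.WriteLine("first:  {0:F5} s", first);
        Console.WriteLine("repeat: {0:F5} s", repeat);
    }
}
```

The sketch deliberately constructs a new XmlSerializer on each call, so any difference between the two timings comes from the framework's own caching rather than from reusing the instance.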
Discussion
The XmlSerializer is the fastest way to retrieve values from a small XML file if it is possible to overcome the cost of creating the temporary assemblies. In situations where the first retrieval is the most important, or where more complex XPath queries are used, an XPathNavigator over an XPathDocument provides better performance than the XmlDocument.
Thankfully, with .NET 2.0/Whidbey it will be possible to use a tool (sgen.exe) to pre-create and compile serializers. Doug Purdy covered this in his PDC talk (see Scott's notes). I believe this will make the XmlSerializer the fastest approach to retrieving values from XML, but testing will tell.
Click here to download the sample code.