XML Serialization with C#

Today, JSON has taken over the role of XML in many scenarios. However, XML is still very widely used.

XML stands for eXtensible Markup Language and has been a stalwart of data interchange since 1998, when XML 1.0 became a W3C Recommendation. It has been adapted over time to solve many different problems related to programming.

In this article, I briefly compare the JSON and XML data formats. I then discuss how to serialize and deserialize objects to and from XML using C#.

Note that for the sake of consistency with the .NET Framework library code I am using the American spelling for ‘serialize’ and other related words 😊

JSON vs XML

It’s only really been since .NET Core 3.0 (see the System.Text.Json namespace) that Microsoft has baked in proper support for JSON serialization.

Sure, we had the JavaScriptSerializer class in the System.Web.Script.Serialization namespace (.NET Framework v3.5+), but it had limitations and wasn’t nearly as performant as popular third-party libraries such as JSON.NET.

XML, on the other hand, has always been an important part of the .NET Framework class libraries and extensive support is offered within the System.Xml.Serialization namespace.

XmlSerializer is the key Framework class which is used to serialize and deserialize XML.

One of the advantages of JSON over XML is brevity. XML is sometimes perceived as being overly prescriptive: a document typically starts with an XML declaration, must have exactly one root element, and needs an opening and closing tag (or a self-closing tag) for every element.

Below is a simple example of the first few lines of an XML file containing a representation of a serialized ‘Todo’ object.

<?xml version="1.0" encoding="utf-8"?>
<Todo>
  <Id>1</Id>
  <UserId>27</UserId>
  ...

The equivalent contents of the above XML file can be expressed much more succinctly by JSON.

{
  "Id": 1,
  "UserId": 27,
  ...

Simply put, it takes less text to capture the same amount of data.
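
To make the size difference concrete, here is a minimal sketch that serializes the same object with both XmlSerializer and System.Text.Json (so it assumes .NET Core 3.0 or later). The TodoItem and FormatComparison names are made up for this illustration and are not part of the article's code.

```csharp
using System.IO;
using System.Text.Json;
using System.Xml.Serialization;

// Hypothetical model used only for this comparison.
public class TodoItem
{
    public int Id { get; set; }
    public int UserId { get; set; }
}

public static class FormatComparison
{
    // Serializes with XmlSerializer using its default settings.
    public static string ToXml(TodoItem item)
    {
        using (var writer = new StringWriter())
        {
            new XmlSerializer(typeof(TodoItem)).Serialize(writer, item);
            return writer.ToString();
        }
    }

    // Serializes with System.Text.Json using its default options.
    public static string ToJson(TodoItem item) => JsonSerializer.Serialize(item);
}
```

For new TodoItem { Id = 1, UserId = 27 }, ToJson returns {"Id":1,"UserId":27}, whereas ToXml returns a noticeably longer string for the same two values.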

XML does have its advantages though and has rich support for things like XPath, XSL and XSD. In many cases, features such as these are either not available or are not as mature for JSON at this point in time.

In addition to JSON and XML, there are of course other formats for encoding data, such as YAML (YAML Ain’t Markup Language). However, these formats haven’t gained the same popularity as JSON and XML at the time of writing.

I believe it’s safe to say that XML will be with us for a long time to come, despite the increasing prevalence of JSON.

Now, let’s get stuck into some code and see how we can serialize and deserialize XML data with C#.

Serialization

My favourite way to perform XML serialization with C# is via a custom extension method.

I can reuse the extension method across projects to create an XML string from any object. I can then do whatever I please with the returned string: save it to a file, save it to a database or include it in the body of an API request.

var todo = new Todo { Id = 1, Title = "Buy milk", UserId = 1 };
 
string xml = todo.ToXmlString();
 
File.WriteAllText("todo.xml", xml);

The ToXmlString method is convenient to use and provides options for overriding the default serialization behaviour (see the method definition further below).

The Todo class is a very simple model. The output of the above code looks as follows.

<Todo>
  <Id>1</Id>
  <UserId>1</UserId>
  <Title>Buy milk</Title>
  <Completed>false</Completed>
</Todo>
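
The Todo model itself isn't listed here, so the following is only a sketch reconstructed from the XML output above. The property names and types are inferred from that output, and the real class in the accompanying repository may well differ.

```csharp
// Hypothetical reconstruction of the model, inferred from the XML output.
// XmlSerializer needs a public type with a public parameterless constructor
// and writes public read/write properties, typically in declaration order.
public class Todo
{
    public int Id { get; set; }
    public int UserId { get; set; }
    public string Title { get; set; }
    public bool Completed { get; set; } // defaults to false when not set
}
```

Because Completed is never assigned in the earlier snippet, it keeps its default value, which is why the output contains false for that element.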

The code to implement the extension method is shown below.

/// <summary>
/// Converts an object to its serialized XML format.
/// </summary>
/// <typeparam name="T">The type of object we are operating on</typeparam>
/// <param name="value">The object we are operating on</param>
/// <param name="removeDefaultXmlNamespaces">Whether or not to remove the default XML namespaces from the output</param>
/// <param name="omitXmlDeclaration">Whether or not to omit the XML declaration from the output</param>
/// <param name="encoding">The character encoding to use</param>
/// <returns>The XML string representation of the object</returns>
public static string ToXmlString<T>(this T value, bool removeDefaultXmlNamespaces = true, bool omitXmlDeclaration = true, Encoding encoding = null) where T : class
{
    XmlSerializerNamespaces namespaces = removeDefaultXmlNamespaces ? new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty }) : null;
 
    var settings                = new XmlWriterSettings();
    settings.Indent             = true;
    settings.OmitXmlDeclaration = omitXmlDeclaration;
    settings.CheckCharacters    = false;
 
    using (var stream = new StringWriterWithEncoding(encoding))
    using (var writer = XmlWriter.Create(stream, settings))
    {
        var serializer = new XmlSerializer(value.GetType());
        serializer.Serialize(writer, value, namespaces);
        return stream.ToString();
    }
}

One of the slightly annoying things about the built-in serialization is that it adds superfluous XML namespaces to the output, as per the example below.

<Todo xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance" xmlns:xsd="http://www.w3.org/2001/XMLSchema">

In my opinion, this is both ugly and unnecessary (in the majority of scenarios).

To avoid this, by default we set the namespaces to new[] { XmlQualifiedName.Empty }.
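
The effect of that setting is easy to check in isolation. The sketch below is a stripped-down illustration rather than the extension method itself; the Note and NamespaceDemo names are assumptions made up for the example.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type used only for this demonstration.
public class Note
{
    public string Text { get; set; }
}

public static class NamespaceDemo
{
    // Serializes a Note, optionally suppressing the default xsi/xsd declarations.
    public static string Serialize(Note note, bool removeDefaultXmlNamespaces)
    {
        // A single empty qualified name tells the serializer not to emit its defaults.
        XmlSerializerNamespaces namespaces = removeDefaultXmlNamespaces
            ? new XmlSerializerNamespaces(new[] { XmlQualifiedName.Empty })
            : null;

        using (var writer = new StringWriter())
        {
            new XmlSerializer(typeof(Note)).Serialize(writer, note, namespaces);
            return writer.ToString();
        }
    }
}
```

With removeDefaultXmlNamespaces set to true the root element carries no xmlns attributes; with false the default declarations are written on the root element.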

The other problem encountered when serializing is that the Encoding property of the built-in StringWriter class always reports UTF-16, so the XML declaration (when it is included) is written with encoding="utf-16". To resolve this we can sub-class the StringWriter class and pass a specific character encoding via the constructor to override the default encoding.
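
To see the UTF-16 behaviour for yourself, the following sketch serializes a small object with a plain StringWriter and returns the result. The Memo and EncodingDemo names are hypothetical and exist only for this illustration.

```csharp
using System.IO;
using System.Xml.Serialization;

// Hypothetical type used only for this demonstration.
public class Memo
{
    public string Text { get; set; }
}

public static class EncodingDemo
{
    // Serializes with an unmodified StringWriter, whose Encoding is UTF-16.
    public static string SerializeWithPlainStringWriter(Memo memo)
    {
        using (var writer = new StringWriter())
        {
            new XmlSerializer(typeof(Memo)).Serialize(writer, memo);
            return writer.ToString();
        }
    }
}
```

The returned string begins with <?xml version="1.0" encoding="utf-16"?>, which becomes misleading as soon as the text is saved as UTF-8 (the default for File.WriteAllText).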

The sub-class is shown below. The StringWriterWithEncoding class is used in the previous code sample when creating the output writer.

/// <summary>
/// Overrides the base 'StringWriter' class to accept a different character encoding type.
/// </summary>
public class StringWriterWithEncoding : StringWriter
{
    #region Properties
 
    /// <summary>
    /// Overrides the default encoding type (UTF-16).
    /// </summary>
    public override Encoding Encoding => _encoding ?? base.Encoding;
 
    #endregion
 
    #region Readonlys
 
    private readonly Encoding _encoding;
 
    #endregion
 
    #region Constructor
 
    /// <summary>
    /// Default constructor.
    /// </summary>
    public StringWriterWithEncoding() { }
 
    /// <summary>
    /// Constructor which accepts a character encoding type.
    /// </summary>
    /// <param name="encoding">The character encoding type</param>
    public StringWriterWithEncoding(Encoding encoding)
    {
        _encoding = encoding;
    }
 
    #endregion
}

The ToXmlString extension method encapsulates all of the above default behaviours nicely, without us having to remember to include the same settings and overrides in our code every time we want to serialize something.

Deserialization

When it comes to deserialization, we can also use an extension method, as per the sample code below.

string xml = File.ReadAllText("todo.xml");
 
var todo = new Todo().FromXmlString(xml);

However, this isn’t the cleanest solution.

Since extension methods require an object instance to operate on, we have to new up an object before we can call the method.

As you can imagine, depending on the nature of the object you are newing up, this isn’t going to be the most efficient option.

Regardless, if you wanted to implement an extension method for deserialization it would look something like the code below.

/// <summary>
/// Creates an object instance from the specified XML string.
/// </summary>
/// <typeparam name="T">The Type of the object we are operating on</typeparam>
/// <param name="value">The object we are operating on</param>
/// <param name="xml">The XML string to deserialize from</param>
/// <returns>An object instance</returns>
public static T FromXmlString<T>(this T value, string xml) where T : class
{
    using (var reader = new StringReader(xml))
    {
        var serializer = new XmlSerializer(typeof(T));
 
        return (T)serializer.Deserialize(reader);
    }
}

The problem is that the value parameter is never actually used inside the method: it exists only so that the method can be called with extension syntax and so that the compiler can infer T. The Deserialize method of the XmlSerializer returns a brand new object, which means the empty instance we newed up is simply discarded once the deserialized object has been assigned to our variable.

Due to the above problem, for deserialization, I prefer to use a static method, as per the code below.

var todo = XmlHelper.DeserializeFromString<Todo>(xml);

The signature of the static method would look like this.

public static T DeserializeFromString<T>(string xml) where T : class
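
Only the signature is given here, so the body below is a minimal sketch of how such a method might be implemented, mirroring the extension method shown earlier. The null check is an addition of mine, and the XmlHelper class in the accompanying repository may look different.

```csharp
using System;
using System.IO;
using System.Xml.Serialization;

public static class XmlHelper
{
    // Creates an object of type T from the specified XML string.
    public static T DeserializeFromString<T>(string xml) where T : class
    {
        // Assumption: fail fast on null input instead of letting StringReader throw.
        if (xml == null) throw new ArgumentNullException(nameof(xml));

        using (var reader = new StringReader(xml))
        {
            var serializer = new XmlSerializer(typeof(T));

            return (T)serializer.Deserialize(reader);
        }
    }
}
```

Since T is supplied explicitly at the call site, no throwaway instance is needed.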

As with many things related to programming, however, it very much comes down to personal preference.

Files

If you only ever want to read and write XML files, I definitely recommend implementing static methods to do the work.

The code below demonstrates how to implement a static method to serialize an object to an XML file.

/// <summary>
/// Serializes an object of the specified Type to a file.
/// </summary>
/// <typeparam name="T">The Type of the object to serialize</typeparam>
/// <param name="xmlFilePath">The path to save the XML file to</param>
/// <param name="objectToSerialize">The object instance to serialize</param>
public static void SerializeToFile<T>(string xmlFilePath, T objectToSerialize) where T : class
{
    using (var writer = new StreamWriter(xmlFilePath))
    {
        // Do this to avoid the serializer inserting default XML namespaces.
        var namespaces = new XmlSerializerNamespaces();
        namespaces.Add(string.Empty, string.Empty);
 
        var serializer = new XmlSerializer(objectToSerialize.GetType());
        serializer.Serialize(writer, objectToSerialize, namespaces);
    }
}

And here is the code for deserializing an object from an XML file.

/// <summary>
/// Deserializes an XML file to the specified type of object.
/// </summary>
/// <typeparam name="T">The Type of object to deserialize</typeparam>
/// <param name="xmlFilePath">The path to the XML file to deserialize</param>
/// <returns>An object instance</returns>
public static T DeserializeFromFile<T>(string xmlFilePath) where T : class
{
    using (var reader = XmlReader.Create(xmlFilePath))
    {
        var serializer = new XmlSerializer(typeof(T));
 
        return (T)serializer.Deserialize(reader);
    }
}

An instance of the XmlReader class is used when deserializing to read the contents of the XML file. Other than that the code is very similar to deserializing from an XML string.
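
A quick round trip through a temporary file is a handy sanity check for this kind of code. The sketch below inlines the same XmlSerializer calls rather than calling the two methods above, because the class that contains them isn't named here. The Message and RoundTripDemo names are hypothetical.

```csharp
using System.IO;
using System.Xml;
using System.Xml.Serialization;

// Hypothetical type used only for this demonstration.
public class Message
{
    public string Text { get; set; }
}

public static class RoundTripDemo
{
    // Writes the object to a temporary XML file and reads it back again.
    public static Message RoundTrip(Message original)
    {
        string path = Path.GetTempFileName();
        try
        {
            var serializer = new XmlSerializer(typeof(Message));

            using (var writer = new StreamWriter(path))
            {
                serializer.Serialize(writer, original);
            }

            using (var reader = XmlReader.Create(path))
            {
                return (Message)serializer.Deserialize(reader);
            }
        }
        finally
        {
            // Clean up the temporary file whether or not the round trip succeeded.
            File.Delete(path);
        }
    }
}
```

If the object that comes back has the same property values as the one that went in, the serialize and deserialize halves agree with each other.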

Closing tag

It is quite straightforward to work with XML using C# and the .NET Framework.

Creating wrapper code around the built-in XML functionality can help to simplify your code and decouple it from the underlying Framework methods and types.

Feel free to use the code in this article and inject your own default behaviours and improvements as you see fit. By having your own set of custom methods you have the freedom to swap out the serialization and deserialization logic to call a third-party library in the future if the need arises.

Please take a look at the accompanying GitHub repository for the full code.

Until next time, take care!
