Parsing (X)HTML into a DOM tree in C#

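As a quick alternative to the Html Agility Pack, HTML can be parsed into a DOM tree with the built-in System.Xml.XmlDocument class, provided the markup is well-formed XML (that is, XHTML). For arbitrary real-world HTML, which often contains unclosed tags, the Html Agility Pack remains the better choice. The following program loads a small page, prints it, replaces the contents of the body element and prints the modified document.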

The main program

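using System;
using System.Xml;

namespace HtmlDomTree
{
	class MainClass
	{
		public static void Main (string[] args)
		{
			// The markup must be well-formed XML (XHTML); otherwise LoadXml throws an XmlException.
			string htmlString = @"<html>
									<head>
										<title>HTML sample page</title>
									</head>
									<body>Hi</body>
								</html>";

			// Parse the markup into a DOM tree. Whitespace-only text nodes are dropped
			// because XmlDocument.PreserveWhitespace defaults to false.
			XmlDocument htmlDocument = new XmlDocument();
			htmlDocument.LoadXml(htmlString);

			Console.WriteLine(htmlDocument.InnerXml);

			// Select the first <body> element (null if the document has none)
			XmlNode bodyNode = htmlDocument.GetElementsByTagName("body").Item(0);
			if (bodyNode == null)
			{
				Console.WriteLine("No <body> element found.");
				return;
			}

			// Replace the contents of the body with a single text node
			bodyNode.InnerText = "Hello";

			Console.WriteLine(htmlDocument.InnerXml);

			// Keep the console window open until Enter is pressed
			Console.ReadLine();
		}
	}
}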

Output

<html><head><title>HTML sample page</title></head><body>Hi</body></html>
<html><head><title>HTML sample page</title></head><body>Hello</body></html>
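Because XmlDocument is an XML parser, the program above only works on well-formed markup. The sketch below illustrates the difference using a hypothetical TryParseXhtml helper (not part of the original solution): an XHTML-style self-closed `<br />` parses, while an HTML-style unclosed `<br>` makes LoadXml throw an XmlException, whose LineNumber and LinePosition properties locate the failure.

```csharp
using System;
using System.Xml;

public static class WellFormednessCheck
{
    // Hypothetical helper: returns the parsed document, or null when the
    // markup is not well-formed XML.
    public static XmlDocument TryParseXhtml(string markup)
    {
        var doc = new XmlDocument();
        try
        {
            doc.LoadXml(markup);
            return doc;
        }
        catch (XmlException ex)
        {
            // LineNumber and LinePosition report where the parser gave up.
            Console.WriteLine($"Not well-formed (line {ex.LineNumber}, position {ex.LinePosition}): {ex.Message}");
            return null;
        }
    }

    public static void Main()
    {
        // XHTML style: <br /> is self-closed, so this parses.
        var xhtml = TryParseXhtml("<html><body>Line one<br />Line two</body></html>");
        Console.WriteLine(xhtml != null ? "parsed" : "rejected");

        // Typical HTML: <br> is never closed, so XmlDocument rejects it.
        var html = TryParseXhtml("<html><body>Line one<br>Line two</body></html>");
        Console.WriteLine(html != null ? "parsed" : "rejected");
    }
}
```

The first call prints "parsed"; the second prints the parser's error message followed by "rejected". Catching XmlException is a simple way to decide whether a given page can be handled by XmlDocument or needs a lenient HTML parser instead.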

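The main program finds the body with GetElementsByTagName, which matches against each element's qualified Name. XPath through SelectSingleNode is a common alternative, but real XHTML pages usually declare the default namespace `http://www.w3.org/1999/xhtml` on the root element, and unprefixed XPath names only match elements in no namespace. The sketch below (the SelectBody helper and the "x" prefix are illustrative choices, not from the original post) registers the namespace with an XmlNamespaceManager so the same lookup works for both kinds of document.

```csharp
using System;
using System.Xml;

public static class XPathSelection
{
    public const string XhtmlNamespace = "http://www.w3.org/1999/xhtml";

    // Hypothetical helper: returns the <body> element whether or not the
    // root element declares the XHTML default namespace.
    public static XmlNode SelectBody(XmlDocument doc)
    {
        var nsManager = new XmlNamespaceManager(doc.NameTable);
        // "x" is an arbitrary prefix used only inside the XPath expression;
        // it does not need to appear in the markup.
        nsManager.AddNamespace("x", XhtmlNamespace);

        // Unprefixed names match only elements in no namespace...
        XmlNode body = doc.SelectSingleNode("/html/body");
        // ...so fall back to the prefixed form for namespaced XHTML.
        return body ?? doc.SelectSingleNode("/x:html/x:body", nsManager);
    }

    public static void Main()
    {
        var plain = new XmlDocument();
        plain.LoadXml("<html><head><title>t</title></head><body>Hi</body></html>");

        var namespaced = new XmlDocument();
        namespaced.LoadXml(
            "<html xmlns=\"http://www.w3.org/1999/xhtml\"><head><title>t</title></head><body>Hi</body></html>");

        // Without a namespace manager, "/html/body" finds nothing here.
        Console.WriteLine(namespaced.SelectSingleNode("/html/body") == null); // True

        foreach (var doc in new[] { plain, namespaced })
        {
            XmlNode body = SelectBody(doc);
            body.InnerText = "Hello";
            Console.WriteLine(doc.DocumentElement.OuterXml);
        }
    }
}
```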
