How to Parse HTML in C#?


Parsing HTML in C# is a common requirement for tasks like web scraping, data extraction, and working with website content. Real-world HTML is often not well-formed XML (for example, some tags are legitimately left unclosed), so instead of an XML parser you need an HTML parser that builds a navigable, DOM-like tree. Libraries such as HtmlAgilityPack, AngleSharp, and CsQuery do this. In this blog, we will discuss how to parse HTML in C# with each of these third-party libraries.
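An XML parser alone is not enough for HTML. The sketch below feeds a short but valid HTML5 fragment (the void elements `<br>` and `<img>` have no closing tags) to `XmlDocument` from `System.Xml`, which only accepts well-formed XML, and compares it with an XHTML-style version where those tags are self-closed. The helper `MarkupCheck.IsWellFormedXml` is hypothetical, written just for this illustration.

```csharp
using System;
using System.Xml;

static class MarkupCheck
{
    // Hypothetical helper: returns true only if the markup is well-formed XML.
    public static bool IsWellFormedXml(string markup)
    {
        try
        {
            new XmlDocument().LoadXml(markup);
            return true;
        }
        catch (XmlException)
        {
            // Unclosed void elements such as <br> make the XML parser throw.
            return false;
        }
    }
}

class Program
{
    static void Main()
    {
        // Valid HTML5, but not valid XML: <br> and <img> are never closed.
        string html = "<html><body><p>Line one<br>Line two</p><img src='a.png'></body></html>";

        // XHTML-style markup with self-closed tags is accepted by the XML parser.
        string xhtml = "<html><body><p>Line one<br/>Line two</p><img src='a.png'/></body></html>";

        Console.WriteLine(MarkupCheck.IsWellFormedXml(html));   // False
        Console.WriteLine(MarkupCheck.IsWellFormedXml(xhtml));  // True
    }
}
```

Since most pages on the web look like the first string rather than the second, a dedicated HTML parser is the practical choice.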


Why Parse HTML in C#?

Before looking at the methods, it helps to know why you would parse HTML in C#:

  • Web Scraping: Extract data from web pages for analysis, automation, or indexing.
  • Content Modification: Edit an HTML document programmatically, for example by rewriting links or removing elements.
  • Web Automation: Locate page elements and forms so you can work with them in automated workflows.
  • Data Extraction: Turn information on a web page, such as tables or lists, into structured data.

Methods to Parse HTML in C#

This section covers three popular libraries for parsing HTML in C#: HtmlAgilityPack, AngleSharp, and CsQuery.


Method 1: Using HtmlAgilityPack (HAP)

HtmlAgilityPack is one of the most widely used libraries for parsing and working with HTML in C#. It builds a DOM similar to an XML DOM, which you can navigate and manipulate, and it lets you use XPath queries to find specific elements or data.

Installation:

You can install HtmlAgilityPack via NuGet:

Install-Package HtmlAgilityPack

Example:

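using HtmlAgilityPack;
using System;

class Program
{
    static void Main()
    {
        string url = "https://example.com";
        var web = new HtmlWeb();
        var doc = web.Load(url); // downloads and parses the page

        // Select every <a> element that has an href attribute
        var links = doc.DocumentNode.SelectNodes("//a[@href]");

        // SelectNodes returns null (not an empty collection) when nothing matches
        if (links == null)
        {
            Console.WriteLine("No links found.");
            return;
        }

        foreach (var link in links)
        {
            Console.WriteLine(link.GetAttributeValue("href", string.Empty));
        }
    }
}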
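The example above downloads a live page. When the HTML is already in memory, `HtmlDocument.LoadHtml` parses a string directly, with no network request. The sketch below, assuming a simple table whose data rows use `<td>` cells, applies XPath to turn each row into a list of strings. The `TableExtractor.ExtractRows` helper and the sample table are hypothetical, made up for illustration.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using HtmlAgilityPack;

static class TableExtractor
{
    // Hypothetical helper: returns the trimmed text of every <td> cell, row by row.
    public static List<List<string>> ExtractRows(string html)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // parse an in-memory string

        var rows = new List<List<string>>();

        // XPath: every <tr> inside a <table>; SelectNodes returns null when nothing matches
        var rowNodes = doc.DocumentNode.SelectNodes("//table//tr");
        if (rowNodes == null)
            return rows;

        foreach (var row in rowNodes)
        {
            // Only direct <td> children; header rows made of <th> yield null and are skipped
            var cells = row.SelectNodes("td");
            if (cells == null)
                continue;

            // DeEntitize decodes entities such as &amp; in the cell text
            rows.Add(cells.Select(c => HtmlEntity.DeEntitize(c.InnerText).Trim()).ToList());
        }
        return rows;
    }
}

class Program
{
    static void Main()
    {
        string html = @"<table>
            <tr><th>Name</th><th>Price</th></tr>
            <tr><td>Apple</td><td>1.20</td></tr>
            <tr><td>Pear</td><td>0.90</td></tr>
        </table>";

        foreach (var row in TableExtractor.ExtractRows(html))
            Console.WriteLine(string.Join(" | ", row));
    }
}
```

Using `//table//tr` rather than `//table/tr` keeps the query working whether or not the source wraps its rows in `<tbody>`.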

Advantages:

  • Handles malformed HTML gracefully.
  • Supports XPath queries.
  • Easy to use and well documented.
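To see how HtmlAgilityPack deals with malformed HTML, the sketch below parses markup with unclosed `<li>` and `<p>` tags and an unquoted attribute value. Instead of throwing, HAP builds a usable tree and records any problems it notices in `HtmlDocument.ParseErrors`. The `LenientParse.Count` helper and the sample markup are hypothetical.

```csharp
using System;
using HtmlAgilityPack;

static class LenientParse
{
    // Hypothetical helper: counts the nodes matched by an XPath expression (0 if none).
    public static int Count(string html, string xpath)
    {
        var doc = new HtmlDocument();
        doc.LoadHtml(html); // HtmlAgilityPack does not throw on malformed markup
        return doc.DocumentNode.SelectNodes(xpath)?.Count ?? 0;
    }
}

class Program
{
    static void Main()
    {
        // Unclosed <li> and <p> tags, and an unquoted attribute value
        string broken = "<ul><li>One<li>Two</ul><p class=intro>Unclosed paragraph";

        // Parsing still yields a usable tree that XPath can query
        Console.WriteLine(LenientParse.Count(broken, "//li"));

        var doc = new HtmlDocument();
        doc.LoadHtml(broken);
        var p = doc.DocumentNode.SelectSingleNode("//p[@class='intro']");
        Console.WriteLine(p?.InnerText);

        // Problems HAP detected are recorded rather than thrown
        foreach (var error in doc.ParseErrors)
            Console.WriteLine($"{error.Code}: {error.Reason}");
    }
}
```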

Method 2: Using AngleSharp

AngleSharp lets you parse, query, and edit HTML documents. It implements the HTML5 parsing rules from the official specification, so it builds a document tree the way a modern web browser does. This makes it a good fit for tasks that need precise control over web content.

Installation:

Install-Package AngleSharp

Example:

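using AngleSharp;
using System;
using System.Threading.Tasks;

class Program
{
    // An async Main method requires C# 7.1 or later
    static async Task Main()
    {
        // WithDefaultLoader() lets AngleSharp download documents over HTTP
        var config = Configuration.Default.WithDefaultLoader();
        var context = BrowsingContext.New(config);
        var document = await context.OpenAsync("https://example.com");

        // CSS selector: only <a> elements that actually have an href attribute
        foreach (var link in document.QuerySelectorAll("a[href]"))
        {
            Console.WriteLine(link.GetAttribute("href"));
        }
    }
}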

Advantages:

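  • It supports modern HTML5 parsing.
  • It parses documents the way a standards-compliant browser does, so the resulting DOM matches what browsers produce more closely than HtmlAgilityPack's more lenient, non-standard parser.
  • It supports CSS selector queries such as QuerySelector and QuerySelectorAll.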
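AngleSharp can also parse a string without a browsing context through `HtmlParser.ParseDocument` (in the `AngleSharp.Html.Parser` namespace of recent AngleSharp versions; older releases used a different method name). The sketch below shows the browser-like behavior: a table written without `<tbody>` gets one inserted, as the HTML5 parsing rules require, and CSS selectors then see that structure. The `SpecParse.CountMatches` helper is hypothetical.

```csharp
using System;
using AngleSharp.Html.Parser;

static class SpecParse
{
    // Hypothetical helper: parses an HTML string and returns the number of
    // elements matched by a CSS selector.
    public static int CountMatches(string html, string selector)
    {
        var parser = new HtmlParser();
        var document = parser.ParseDocument(html); // follows the HTML5 tree-construction rules
        return document.QuerySelectorAll(selector).Length;
    }
}

class Program
{
    static void Main()
    {
        // No <html>, <head>, <body>, or <tbody> tags in the source
        string html = "<table><tr><td>Cell</td></tr></table>";

        var document = new HtmlParser().ParseDocument(html);

        // Like a browser, the parser adds the implied elements, including <tbody>
        Console.WriteLine(document.Body.InnerHtml);

        // CSS selectors can rely on that browser-like structure
        Console.WriteLine(SpecParse.CountMatches(html, "body > table > tbody > tr > td"));
    }
}
```

This is also why a selector copied from a browser's developer tools (which often includes `tbody`) tends to work with AngleSharp as-is.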

Method 3: Using CsQuery

CsQuery is a .NET port of jQuery. It lets you query and manipulate HTML in C# with jQuery-style CSS selectors and methods. Note that CsQuery is no longer actively maintained, so for new projects AngleSharp is usually the safer choice.

Installation:

Install-Package CsQuery

Example:

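using CsQuery;
using System;

class Program
{
    static void Main()
    {
        string html = "<html><body><p>Hello World!</p></body></html>";

        // Parse the markup into a jQuery-like CQ object
        CQ dom = CQ.Create(html);

        // dom["p"] selects elements with a CSS selector, like $("p") in jQuery
        foreach (var p in dom["p"])
        {
            Console.WriteLine(p.InnerText);
        }
    }
}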

Advantages: 

  • Familiar jQuery-style syntax for selecting elements with CSS selectors.
  • jQuery-style methods for manipulating the DOM.
  • Convenient for quick HTML transformations.
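As a rough illustration of CsQuery's jQuery-style manipulation, the sketch below removes `<script>` elements, adds a CSS class to every link, and renders the result back to HTML with `Render()`. The helper `Cleanup.StripScriptsAndTagLinks` and the `external` class name are hypothetical; this is a formatting example, not a security sanitizer.

```csharp
using System;
using CsQuery;

static class Cleanup
{
    // Hypothetical helper: strips <script> elements and tags every link with
    // an "external" CSS class, then returns the updated HTML.
    public static string StripScriptsAndTagLinks(string html)
    {
        CQ dom = CQ.Create(html);

        dom["script"].Remove();        // jQuery-style removal of matched elements
        dom["a"].AddClass("external"); // jQuery-style class manipulation

        return dom.Render();           // serialize the modified DOM back to HTML
    }
}

class Program
{
    static void Main()
    {
        string html = "<div><script>alert(1)</script><a href='https://example.com'>Link</a></div>";
        Console.WriteLine(Cleanup.StripScriptsAndTagLinks(html));
    }
}
```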


Conclusion

You can parse HTML in C# for tasks like scraping, data extraction, and content modification. Use HtmlAgilityPack for its DOM-like structure and XPath support, AngleSharp for standards-compliant HTML5 parsing and CSS selector queries, and CsQuery if you prefer jQuery-style syntax. Choose the library that best fits your requirements.

How to Parse HTML in C#? – FAQs      

Q1. What are the popular libraries for parsing HTML in C#?

HtmlAgilityPack, AngleSharp, and CsQuery are popular choices for parsing HTML in C#.

Q2. Why should I use HtmlAgilityPack?

HtmlAgilityPack provides a DOM-like structure and supports XPath queries, which make navigating and manipulating HTML easy. It also copes well with malformed HTML.

Q3. When should I use AngleSharp over HtmlAgilityPack?

Choose AngleSharp when you need modern HTML5 support and CSS selector-based queries.

Q4. Is CsQuery still a good option for HTML parsing?

CsQuery still works for querying and manipulating HTML with a jQuery-like syntax, but it is no longer actively maintained. For new projects, AngleSharp is generally the better-supported option.

Q5. Which method is best for beginners?

HtmlAgilityPack is often the most beginner-friendly option due to its ease of use and documentation.

About the Author

Technical Research Analyst - Full Stack Development

Kislay is a Technical Research Analyst and Full Stack Developer with expertise in crafting mobile applications from inception to deployment. Proficient in Android development, iOS development, HTML, CSS, JavaScript, React, Angular, MySQL, and MongoDB, he’s committed to enhancing user experiences through intuitive websites and advanced mobile applications.
