You can use the replace() function with a regular expression to remove HTML tags from a string using JavaScript.
When working with web applications, you may need to extract plain text from an HTML string by removing all of its tags. There are several ways to do this, depending on the complexity of the input and your requirements. We will explore these methods in this blog.
To remove HTML tags from a string using JavaScript, you can use the replace() function, DOMParser, or innerText/textContent with document.createElement. Let’s discuss these methods below.
Method 1: Using replace() Function
You can use this function to quickly remove HTML tags from a string. It matches everything from an opening angle bracket (<) to the next closing angle bracket (>) and replaces each match with an empty string. It is suitable only for simple cases: it can give wrong results on malformed HTML or on attribute values that contain a > character. Since regex is not a full-fledged HTML parser, it also cannot decode HTML entities, such as &amp; to &.
Example:
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Remove HTML Tags</title>
  <script>
    // Replace every <...> sequence with an empty string.
    function removeHTMLTags(str) {
      return str.replace(/<[^>]*>/g, '');
    }

    // Read the textarea value and show the stripped text.
    function processText() {
      const input = document.getElementById("htmlInput").value;
      document.getElementById("output").innerText = removeHTMLTags(input);
    }
  </script>
</head>
<body>
  <h2>Remove HTML Tags Example</h2>
  <textarea id="htmlInput" rows="4" cols="50"><p>Intelli<strong>paat</strong>!</p></textarea>
  <br>
  <button onclick="processText()">Remove Tags</button>
  <h3>Output:</h3>
  <p id="output"></p>
</body>
</html>
Output (with the default input): Intellipaat!
Explanation: When you click the “Remove Tags” button, the code replaces every HTML tag with an empty string using a regular expression and displays the remaining plain text in the output section.
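The limits of the regex approach are easier to see with a few inputs. The following is a minimal sketch that reuses the same pattern as the example above; the sample strings are made up for illustration.

```javascript
// Same pattern as in the example above.
function removeHTMLTags(str) {
  return str.replace(/<[^>]*>/g, '');
}

// Simple, well-formed input works as expected.
console.log(removeHTMLTags('<p>Intelli<strong>paat</strong>!</p>')); // Intellipaat!

// Entities are not decoded, because the regex only deletes tags.
console.log(removeHTMLTags('<p>Tom &amp; Jerry</p>')); // Tom &amp; Jerry

// A ">" inside an attribute value ends the match too early,
// so part of the attribute leaks into the result.
console.log(removeHTMLTags('<a title="a > b">link</a>')); //  b">link
```

If your input can contain cases like the last two, one of the DOM-based methods is usually a better fit.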
Method 2: Using the DOMParser API
This method converts the string into a temporary HTML document. You can use this method to correctly parse all tags and remove them while retaining text content. It also handles malformed HTML.
Example:
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Remove HTML Tags</title>
  <script>
    // Parse the string into a separate document and read its text.
    function removeHTMLTagsUsingDOMParser(str) {
      const parser = new DOMParser();
      const doc = parser.parseFromString(str, 'text/html');
      return doc.body.textContent || "";
    }

    // Read the textarea value and show the stripped text.
    function processText() {
      const input = document.getElementById("htmlInput").value;
      document.getElementById("output").innerText = removeHTMLTagsUsingDOMParser(input);
    }
  </script>
</head>
<body>
  <h2>Remove HTML Tags Example</h2>
  <textarea id="htmlInput" rows="4" cols="50"><p>Intelli<strong>paat</strong>!</p></textarea>
  <br>
  <button onclick="processText()">Remove Tags</button>
  <h3>Output:</h3>
  <p id="output"></p>
</body>
</html>
Output (with the default input): Intellipaat!
Explanation: The code uses DOMParser to parse the HTML string, reads the plain text from the resulting document, and displays it in the output section when you click the Remove Tags button.
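DOMParser exists in browsers but not in every JavaScript runtime, so shared code may want to check for it first. The sketch below is one possible way to do that; the helper name stripTags and the regex fallback are assumptions made for this illustration, not part of the example above.

```javascript
// Hypothetical helper: prefers DOMParser and falls back to a regex
// when DOMParser is not defined (for example, outside the browser).
function stripTags(html) {
  if (typeof DOMParser !== 'undefined') {
    // Browser path: the parser repairs unclosed tags and decodes entities.
    const doc = new DOMParser().parseFromString(html, 'text/html');
    return doc.body.textContent || '';
  }
  // Fallback (assumption for this sketch): plain regex, entities stay encoded.
  return html.replace(/<[^>]*>/g, '');
}

console.log(stripTags('<p>Intelli<strong>paat</strong>!</p>')); // Intellipaat!

// In a browser, the unclosed tags and the entity are both handled,
// so this logs "Tom & Jerry"; the regex fallback would leave "&amp;" as is.
console.log(stripTags('<p>Tom &amp; <b>Jerry'));
```

The fallback keeps the helper usable everywhere, at the cost of the regex limitations on that path.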
Method 3: Using innerText or textContent with document.createElement
You can also remove HTML tags with a temporary div element. Set the element's innerHTML to the string and then read its textContent (or innerText as a fallback). This method works in all major browsers. Because assigning to innerHTML can still trigger inline event handlers (for example, onerror on an img element), use it only with trusted input.
Example:
<!DOCTYPE html>
<html lang="en">
<head>
  <meta charset="UTF-8">
  <meta name="viewport" content="width=device-width, initial-scale=1.0">
  <title>Remove HTML Tags</title>
  <script>
    // Let the browser parse the string inside a detached div, then read its text.
    function removeHTMLTagsUsingElement(str) {
      const tempElement = document.createElement("div");
      tempElement.innerHTML = str;
      return tempElement.textContent || tempElement.innerText || "";
    }

    // Read the textarea value and show the stripped text.
    function processText() {
      const input = document.getElementById("htmlInput").value;
      document.getElementById("output").innerText = removeHTMLTagsUsingElement(input);
    }
  </script>
</head>
<body>
  <h2>Remove HTML Tags Example</h2>
  <textarea id="htmlInput" rows="4" cols="50"><p>Intelli <strong>paat</strong>!</p></textarea>
  <br>
  <button onclick="processText()">Remove Tags</button>
  <h3>Output:</h3>
  <p id="output"></p>
</body>
</html>
Output (with the default input): Intelli paat!
Explanation: The code creates a temporary div element, assigns the HTML string to it, and displays the element's plain text when you click the Remove Tags button.
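One detail worth knowing is that textContent also returns the text inside script and style elements. If that is not wanted, those elements can be removed before reading the text. The following is a rough sketch for trusted input only; the function name removeTagsAndScripts and its optional doc parameter are hypothetical and exist only for this illustration.

```javascript
// Hypothetical variant of the temporary-element approach that drops
// <script> and <style> elements before reading the text.
// The doc parameter defaults to the page's document (browser only).
function removeTagsAndScripts(str, doc = globalThis.document) {
  if (!doc) {
    throw new Error('This sketch needs a browser document');
  }
  const tempElement = doc.createElement('div');
  tempElement.innerHTML = str; // trusted input only
  // Remove elements whose text should not appear in the result.
  tempElement.querySelectorAll('script, style').forEach((el) => el.remove());
  return tempElement.textContent || '';
}

// Browser usage:
// removeTagsAndScripts('<p>Hi<script>alert(1)</script></p>') returns "Hi"
```

Without the querySelectorAll step, the same input would return "Hialert(1)".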
Conclusion
The methods above are common ways to remove HTML tags from a string using JavaScript: the replace() function, DOMParser, and innerText or textContent with document.createElement. Removing HTML tags lets you display plain content, keeps the website user-friendly, and can reduce exposure to cross-site scripting (XSS) attacks, although tag stripping alone is not a complete defense for untrusted input.
1. What is the best way to remove HTML tags in JavaScript?
Parsing the string with the DOM (DOMParser or a temporary element) is the more reliable approach, while a regular expression is quicker to write for simple input. The best method therefore depends on your needs.
2. How can I remove HTML tags using a regular expression?
You can pass a regular expression such as /<[^>]*>/g to replace() with an empty string as the replacement. The pattern matches each tag, and replace() removes it from the string.
3. Is it safe to use regular expressions to remove HTML tags?
Not always. A regex does not parse HTML, so it can miss or mangle malformed markup, and it should not be relied on as a security measure when the HTML comes from an untrusted source.
4. How can I remove HTML tags using DOMParser?
DOMParser is a built-in browser API that parses a string into a document. Call parseFromString(str, 'text/html') and then read doc.body.textContent to get the text without the tags.
5. Are there any libraries available for removing HTML tags?
Yes. For example, the sanitize-html library can be configured to remove HTML tags and return the text content.