0 votes
1 view
in SQL by (22.4k points)

I've got data in SQL Server 2005 that contains HTML tags and I'd like to strip all that out, leaving just the text between the tags. Ideally also replacing things like &lt; with <, etc.

Is there an easy way to do this or has someone already got some sample t-SQL code?

I don't have the ability to add extended stored procs and the like, so I would prefer a pure t-SQL approach (preferably one backward compatible with SQL 2000).

I just want to retrieve the data with stripped out HTML, not update it, so ideally it would be written as a user-defined function, to make for easy reuse.

So, for example, converting this:

<B>Some useful text</B>&nbsp;

<A onclick="return openInfo(this)"

   href="http://there.com/3ce984e88d0531bac5349"

   target=globalhelp>

   <IMG title="Source Description" height=15 alt="Source Description" 

        src="/ri/new_info.gif" width=15 align=top border=0>

</A>&gt;&nbsp;<b>more text</b></TD></TR>

to this:

Some useful text > more text

1 Answer

0 votes
by (40.3k points)

User-Defined Function to Strip HTML is as follows:

CREATE FUNCTION [dbo].[udf_StripHTML] (@HTMLText VARCHAR(MAX))

RETURNS VARCHAR(MAX) AS

BEGIN

    DECLARE @Start INT

    DECLARE @End INT

    DECLARE @Length INT

    SET @Start = CHARINDEX('<',@HTMLText)

    SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))

    SET @Length = (@End - @Start) + 1

    WHILE @Start > 0 AND @End > 0 AND @Length > 0

    BEGIN

        SET @HTMLText = STUFF(@HTMLText,@Start,@Length,'')

        SET @Start = CHARINDEX('<',@HTMLText)

        SET @End = CHARINDEX('>',@HTMLText,CHARINDEX('<',@HTMLText))

        SET @Length = (@End - @Start) + 1

    END

    RETURN LTRIM(RTRIM(@HTMLText))

END

GO

Note: For SQL Server 2005, but if you are trying to change the keyword MAX to something like 4000, then it will work in SQL Server 2000 as well.

Welcome to Intellipaat Community. Get your technical queries answered by top developers !


Categories

...