How to Split a Comma-separated Value (CSV) into Columns in SQL?

When a large number of values with different attributes are stored in a single string, splitting them into columns is helpful. This keeps the data organized and makes finding values in a column much easier. In this blog, let’s explore multiple approaches to splitting comma-separated values into columns in SQL, with an example for each.

Methods to Split a Comma-separated Value (CSV) into Columns in SQL

There are several approaches, including STRING_SPLIT, ROW_NUMBER, and OPENJSON with the WITH clause. Each of these can be used to convert comma-separated values into columns.

Method 1: STRING_AGG with CTE in SQL 

This method uses STRING_SPLIT to break the CSV into rows, a CTE with ROW_NUMBER() to assign positions, and STRING_AGG to recombine the values.

Example:

DECLARE @csv NVARCHAR(MAX) = 'Pebble,Stone,Rock';
WITH CTE AS (
    SELECT value, ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS rn
    FROM STRING_SPLIT(@csv, ',')
)
SELECT STRING_AGG(value, ' | ') AS Result
FROM CTE;

Output:

STRING_AGG with CTE in SQL 

Explanation: STRING_SPLIT broke the CSV into rows, and STRING_AGG joined the values back into a single pipe-delimited string.

Method 2: Using OPENJSON() in SQL

OPENJSON is a built-in function in SQL Server that converts JSON-formatted strings into rows and columns. It transforms JSON objects and arrays into a tabular format, which makes the data easier to process.

Example:

DECLARE @json NVARCHAR(MAX) = '[{"Col1":"Apple", "Col2":"Red", "Col3":"Sweet"},
                               {"Col1":"Banana", "Col2":"Yellow", "Col3":"Sweet"},
                               {"Col1":"Lemon", "Col2":"Yellow", "Col3":"Sour"}]';
SELECT *
FROM OPENJSON(@json)
WITH (
    Col1 NVARCHAR(50) '$.Col1',
    Col2 NVARCHAR(50) '$.Col2',
    Col3 NVARCHAR(50) '$.Col3'
);

Output:

Using OPENJSON() in SQL

Explanation: The input was given as a JSON array of objects and converted into a table with one column per attribute. Here, the row for Apple shows that it is red and sweet; each object’s values land in the matching columns.

Method 3: Using Dynamic SQL with STRING_SPLIT & PIVOT

This method uses STRING_AGG to collect the records into a CSV, STRING_SPLIT to convert the CSV back into rows, and a dynamically built PIVOT query to turn those rows into columns.

Example:

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    EmployeeName NVARCHAR(100),
    ManagerID INT,
    FOREIGN KEY (ManagerID) REFERENCES Employees(EmployeeID)
);
INSERT INTO Employees (EmployeeID, EmployeeName, ManagerID)
VALUES 
    (1, 'Alice', NULL), 
    (2, 'Bob', 1),     
    (3, 'Charlie', 1),   
    (4, 'David', 2),       
    (5, 'Eve', 2);
DECLARE @csv NVARCHAR(MAX);

-- Get CSV of Employee Names
SELECT @csv = STRING_AGG(EmployeeName, ',') FROM Employees;

-- Create Temporary Table for Splitting
IF OBJECT_ID('tempdb..#TempTable') IS NOT NULL DROP TABLE #TempTable;
CREATE TABLE #TempTable (
    ColIndex INT,
    Value NVARCHAR(100)
);

-- Insert Data into Temporary Table
INSERT INTO #TempTable (ColIndex, Value)
SELECT ROW_NUMBER() OVER (ORDER BY (SELECT NULL)) AS ColIndex, value
FROM STRING_SPLIT(@csv, ',');

-- Generate Dynamic Column Names
DECLARE @cols NVARCHAR(MAX) = '';
SELECT @cols += '[Col' + CAST(ColIndex AS NVARCHAR) + '], ' FROM #TempTable;
SET @cols = LEFT(@cols, LEN(@cols) - 1); -- Remove trailing comma (LEN ignores the trailing space)

-- Generate Dynamic Pivot Query
DECLARE @query NVARCHAR(MAX) = '
SELECT ' + @cols + ' FROM 
(SELECT Value, ''Col'' + CAST(ColIndex AS NVARCHAR) AS ColName FROM #TempTable) AS SourceData
PIVOT (MAX(Value) FOR ColName IN (' + @cols + ')) AS PivotTable;';

-- Execute Dynamic SQL
EXEC sp_executesql @query;

-- Cleanup
DROP TABLE #TempTable;

Output:

Using Dynamic SQL with STRING_SPLIT & PIVOT

Explanation: We created an Employees table, aggregated the employee names into a single CSV with STRING_AGG, split it back into rows with STRING_SPLIT, and used a dynamic PIVOT query to turn the names into columns.

Method 4: Using SUBSTRING_INDEX() in SQL

SUBSTRING_INDEX() is a MySQL string function that extracts parts of a delimited string into separate columns. The drawback of this method is that everything is manual: you must know how many columns you need and write a separate SUBSTRING_INDEX() call for each part of the string.

Example:

SELECT
    SUBSTRING_INDEX(csv, ',', 1) AS Col1,                             -- Extracts 'Apple'
    SUBSTRING_INDEX(SUBSTRING_INDEX(csv, ',', 2), ',', -1) AS Col2,   -- Extracts 'Banana'
    SUBSTRING_INDEX(SUBSTRING_INDEX(csv, ',', 3), ',', -1) AS Col3    -- Extracts 'Cherry'
FROM (SELECT 'Apple,Banana,Cherry' AS csv) AS t;

Output:

Using SUBSTRING_INDEX () in SQL

Explanation: Each SUBSTRING_INDEX() call extracted one element of the string and returned it as its own column.

Method 5: Using UNNEST() with STRING_TO_ARRAY() in SQL

STRING_TO_ARRAY() is a PostgreSQL function that splits a string into an array, and UNNEST() expands the array elements into rows. When each position is wanted as its own column, the array can also be indexed directly, as in the example below.

Example:

CREATE TABLE AnimalOffspring (
    Animal_Offspring TEXT
);

INSERT INTO AnimalOffspring (Animal_Offspring) VALUES
('Lion,Cub'),
('Elephant,Calf'),
('Dog,Puppy'),
('Cat,Kitten'),
('Cow,Calf');
SELECT 
    (STRING_TO_ARRAY(Animal_Offspring, ','))[1] AS Animal,
    (STRING_TO_ARRAY(Animal_Offspring, ','))[2] AS Offspring 
FROM AnimalOffspring;

Output:

Using UNNEST() with STRING_TO_ARRAY() in SQL

Explanation: STRING_TO_ARRAY converted each string into an array, and indexing the array ([1] and [2]) placed each element into its own column.

Method 6: Using REGEXP_MATCHES() in SQL

REGEXP_MATCHES() is a PostgreSQL function that extracts substrings matching a regular expression, which makes it possible to pull the parts of a comma-separated value (CSV) into separate columns.

Example:

CREATE TABLE StudentSubjects (
    Student_Info VARCHAR(100)
);
INSERT INTO StudentSubjects (Student_Info) VALUES ('John,Math,85');
INSERT INTO StudentSubjects (Student_Info) VALUES ('Alice,English,90');
INSERT INTO StudentSubjects (Student_Info) VALUES ('Bob,Science,75');
SELECT 
    (REGEXP_MATCHES(Student_Info, '^([^,]+)', 'g'))[1] AS Student_Name,
    (REGEXP_MATCHES(Student_Info, ',([^,]+),', 'g'))[1] AS Subject,
    (REGEXP_MATCHES(Student_Info, '([^,]+)$', 'g'))[1] AS Marks
FROM StudentSubjects;

Output:

Using REGEXP_MATCHES() in SQL

Explanation: REGEXP_MATCHES extracted each substring, and each match became its own column.

Method 7: Using STRING_SPLIT() with CROSS APPLY in SQL Server

The STRING_SPLIT() function in SQL Server splits a comma-separated string into rows, and CROSS APPLY applies it to each row of a table.

Example:

CREATE TABLE Employees (
    EmployeeID INT PRIMARY KEY,
    EmployeeName NVARCHAR(50),
    Skills NVARCHAR(MAX) -- CSV formatted skills
);
INSERT INTO Employees (EmployeeID, EmployeeName, Skills) VALUES
(1, 'Karan', 'SQL,Python,Excel'),
(2, 'Sara', 'Java,SQL,C#'),
(3, 'Charan', 'Python,PowerBI');
SELECT EmployeeID, EmployeeName, value AS Skill
FROM Employees
CROSS APPLY STRING_SPLIT(Skills, ',');

Output:

Using STRING_SPLIT() with CROSS APPLY in SQL Server

Explanation: STRING_SPLIT() with CROSS APPLY retrieved each employee’s CSV skill list and expanded it into one row per skill.

Performance Consideration

| Method | Performance | Best Use Case | Limitations |
|---|---|---|---|
| STRING_AGG with CTE | Performs well for small and medium datasets. | Aggregating and re-splitting data. | Not suited to large datasets. |
| OPENJSON() | Performs well, but only on JSON input. | Structured or nested data. | Requires the input to be in JSON format. |
| Dynamic SQL with STRING_SPLIT & PIVOT | Works well for small and medium datasets. | Converting CSV values into columns for dynamic reporting. | Lacks efficient indexing, and STRING_SPLIT does not guarantee ordered output. |
| SUBSTRING_INDEX() | Fast for a fixed number of elements. | When the splitting positions are known in advance. | Only works when the element count is known. |
| UNNEST() with STRING_TO_ARRAY() | Performs efficiently on most datasets. | Simple CSV-to-row conversion. | No built-in ordering. |
| REGEXP_MATCHES() | Slower due to regex overhead. | Complex patterns. | Does not scale well to large datasets. |
| STRING_SPLIT() with CROSS APPLY | Fast for simple splits. | Easy CSV-to-row conversion. | No guaranteed ordering without extra conditions. |

Real-world Use cases 

Case 1: Using the string_to_array and split_part methods to fetch the order details:

Example:

CREATE TABLE Orders (
    OrderID INT PRIMARY KEY,
    OrderDetails TEXT -- Storing CSV as text
);
INSERT INTO Orders (OrderID, OrderDetails) VALUES
(101, 'Laptop, Mouse, Bag'),
(102, 'Phone, Charger, Gun'),
(103, 'Keyboard, Piano');
SELECT OrderID,
       unnest(string_to_array(OrderDetails, ',')) AS Item,
       trim(split_part(OrderDetails, ',', 1)) AS Item1,
       trim(split_part(OrderDetails, ',', 2)) AS Item2,
       trim(split_part(OrderDetails, ',', 3)) AS Item3
FROM Orders;

Output:

Real-world Use cases 1

Explanation: string_to_array with unnest expanded each order’s list into rows, while split_part pulled out the first, second, and third items as separate columns. Orders with fewer than three items, like order 103, return an empty string for the missing position.

Case 2: Using cross-apply to get the timestamp of events 

Example:

CREATE TABLE Logs (
    LogID INT PRIMARY KEY,
    Timestamps NVARCHAR(MAX) -- Storing CSV timestamps as text
);

INSERT INTO Logs (LogID, Timestamps) VALUES
(1, '2024-02-19,2024-02-20,2024-02-21'),
(2, '2024-02-22,2024-02-23');
SELECT LogID, value AS EventTimestamp
FROM Logs
CROSS APPLY STRING_SPLIT(Timestamps, ',');

Output:

Real-world Use cases 2

Explanation: CROSS APPLY with STRING_SPLIT expanded each log’s CSV string of timestamps into one row per timestamp.

Conclusion

Splitting a comma-separated value (CSV) into columns in SQL can be done in different ways depending on the database server. Functions like STRING_SPLIT, ROW_NUMBER, and OPENJSON convert CSV into columns in SQL Server, while string functions like SUBSTRING_INDEX (MySQL) and STRING_TO_ARRAY (PostgreSQL) handle splitting elsewhere. The method should be chosen based on performance needs and the complexity of the data.

FAQs

1. When should I use STRING_AGG with a CTE?

When you need to split and re-aggregate data or combine rows into a single CSV.
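As a runnable illustration of this split-then-reaggregate pattern, here is a sketch in SQLite (via Python’s sqlite3), where json_each() stands in for SQL Server’s STRING_SPLIT() and group_concat() for STRING_AGG():

```python
import sqlite3

# Illustrative sketch only: SQLite's json_each() plays the role of
# STRING_SPLIT(), and group_concat() plays the role of STRING_AGG().
conn = sqlite3.connect(":memory:")
csv = "Pebble,Stone,Rock"

# Turn 'a,b,c' into the JSON array '["a","b","c"]' so json_each() can split it,
# then re-aggregate the pieces with a ' | ' separator.
row = conn.execute(
    """
    SELECT group_concat(value, ' | ')
    FROM json_each('["' || replace(?, ',', '","') || '"]')
    """,
    (csv,),
).fetchone()
print(row[0])  # Pebble | Stone | Rock
```

The same shape works in SQL Server by swapping in STRING_SPLIT() and STRING_AGG(), as in Method 1 above.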

2. Why use recursive CTEs for splitting data?

To handle dynamic or unknown-length CSVs while maintaining element positions.
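A minimal sketch of the recursive-CTE pattern, run against SQLite through Python’s sqlite3 so it is self-contained; the CTE peels off one element per iteration and records its position, and the same pattern ports to SQL Server or PostgreSQL with minor syntax changes:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
rows = conn.execute(
    """
    WITH RECURSIVE split(pos, value, rest) AS (
        -- Seed row: no value yet; append a trailing comma so every
        -- element, including the last, is followed by a delimiter.
        SELECT 0, NULL, 'Apple,Banana,Cherry' || ','
        UNION ALL
        -- Each step takes the text before the next comma as the value
        -- and carries the remainder forward, incrementing the position.
        SELECT pos + 1,
               substr(rest, 1, instr(rest, ',') - 1),
               substr(rest, instr(rest, ',') + 1)
        FROM split
        WHERE rest <> ''
    )
    SELECT pos, value FROM split WHERE value IS NOT NULL
    """
).fetchall()
print(rows)  # [(1, 'Apple'), (2, 'Banana'), (3, 'Cherry')]
```

Because the position is tracked explicitly, the result can be pivoted into columns without relying on the engine’s row order.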

3. What is the advantage of using UNNEST with STRING_TO_ARRAY in PostgreSQL?

It efficiently converts CSV strings into rows using simple syntax for analysis.

About the Author

Data Engineer

As a skilled Data Engineer, Sahil excels in SQL, NoSQL databases, Business Intelligence, and database management. He has contributed immensely to projects at companies like Bajaj and Tata. With a strong expertise in data engineering, he has architected numerous solutions for data pipelines, analytics, and software integration, driving insights and innovation.
