Using the STRING_AGG function in T-SQL to concatenate row data

If you’ve ever worked with SQL Server and found the need to concatenate data from table rows into a string value, you may remember having to resort to using features like FOR XML PATH in combination with STUFF to achieve the desired result.

While these approaches are functional, they tend to result in code that is both clunky and hard to read.

Thankfully, starting with SQL Server 2017, Microsoft introduced the STRING_AGG function to the T-SQL language, providing a simpler and much more elegant way to aggregate row data into a string.

In this article, I will explore the basics of how to use STRING_AGG and look at a few practical ways you can use it in your SQL queries, as well as consider some nuances of the function.

What is STRING_AGG?

STRING_AGG is a T-SQL function, introduced in SQL Server 2017, that concatenates values from multiple table rows into a single string. It allows a delimiter to be specified, which is perfect when you want to create comma-separated lists or a custom format according to some specific requirements.

The STRING_AGG function provides similar functionality to the GROUP_CONCAT function which you’ll find in other SQL database engines such as MySQL.

Syntax

Here is the official representation of the STRING_AGG syntax, as per the Microsoft Docs.

STRING_AGG ( expression, separator ) [ <order_clause> ]
 
<order_clause> ::=   
    WITHIN GROUP ( ORDER BY <order_by_expression_list> [ ASC | DESC ] )

Let’s look at the required function parameters first and then circle back to an example of the order clause later.

expression

The column or expression whose values you want to aggregate.

delimiter

The character or string that should be placed between the aggregated values.

Example

Let’s consider a very basic example of using the STRING_AGG function.

Imagine we have a ‘Customers’ table in our database containing an ‘AccountName’ column and we run a simple SELECT statement such as the following.

SELECT AccountName
FROM Customers;

The results of the above query would look something like the following.

NexaCore Innovations
QuantumEdge Systems
HyperNova Technologies
CircuitSphere Solutions
CodePulse Dynamics

Next, imagine we were to run the following SQL on the same data set.

SELECT STRING_AGG(AccountName, ', ')
FROM Customers;

This would result in the following output.

NexaCore Innovations, QuantumEdge Systems, HyperNova Technologies, CircuitSphere Solutions, CodePulse Dynamics

As you can see, the STRING_AGG function essentially aggregates the vertical list of separate row column values into a horizontal list where the values are combined into a single string.

STRING_AGG examples

Let’s look at some more examples of using the STRING_AGG function.

Before doing so, if you want to follow along, I recommend that you install the latest version of SQL Server or LocalDB (usually you will already have access to LocalDB if you have Visual Studio installed) and the latest version of SQL Server Management Studio (SSMS) or Azure Data Studio for interacting with the SQL Server database.

I’ll be working with the sample AdventureWorks Data Warehouse database that is frequently used for demos.

Below is an example of running a similar query to the one we looked at in the previous section, however, this time I’m running the query against the AdventureWorks database which has a ‘DimCustomer’ table.

STRING_AGG AdventureWorks query
STRING_AGG AdventureWorks query

You can see that the query returns a comma-separated string containing Customer First Names.

The SQL query has been included below if you want to copy and paste it.

SELECT STRING_AGG(FirstName, ', ') AS [Customer First Names]
FROM dbo.DimCustomer
WHERE CustomerKey BETWEEN 11000 AND 11005;

A column alias has been specified to provide a clear heading for the result set.

Overcoming limitations

In the SQL query example included just before this subsection, you’ll notice that a simple WHERE clause condition has been included to limit the results to 5 customers. Without this condition, the following error will be returned.

STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation.

As the message text indicates, this error occurs when the concatenated string result is too long. By default, STRING_AGG produces a VARCHAR result which is limited to 8,000 bytes.

To get around this limitation, we can CAST to a LOB (Large Object Type) to accommodate longer results.

SELECT STRING_AGG(CAST(FirstName AS NVARCHAR(MAX)), ', ') AS [All Customer First Names]
FROM dbo.DimCustomer;

This allows us to combine all of the customer first names into a single string, with practically no limitations aside from the huge 2 GB upper limit of NVARCHAR(MAX).

STRING_AGG with cast to NVARCHAR(MAX)
STRING_AGG with cast to NVARCHAR(MAX)

As you can see from the above screenshot, the Results pane now displays a long list of all customer first names in a single string within the row that is returned by the query.

Ordering

By default, the STRING_AGG function doesn’t guarantee that any particular order is applied to the concatenated values. However, it is possible to enforce an order using the WITHIN GROUP clause.

SELECT STRING_AGG(FirstName, ', ') WITHIN GROUP (ORDER BY FirstName) AS [Ordered Customer First Names]
FROM dbo.DimCustomer
WHERE CustomerKey BETWEEN 11000 AND 11005;

One or more criteria can be specified to order the results. In this case, we are simply ordering by the ‘FirstName’ column.

STRING_AGG with order clause
STRING_AGG with order clause

As you can see from the screenshot, the results are now ordered alphabetically in ascending order based on the first name of the customers.

Grouped queries

The STRING_AGG function is particularly useful in grouped queries.

Consider a more complex query such as the following which displays a list of a list of orders grouped by customer, sorted by the highest customer sales in descending order.

SELECT
    MIN(CONCAT(dc.FirstName, ' ', dc.LastName)) AS [Customer Name],
    STRING_AGG(CONCAT(fis.SalesOrderNumber, ' ($', fis.SalesAmount, ')'), ', ') 
	WITHIN GROUP (ORDER BY fis.DueDate DESC, fis.SalesAmount DESC)
    AS [Orders]
FROM dbo.FactInternetSales fis
JOIN dbo.DimCustomer dc ON fis.CustomerKey = dc.CustomerKey
GROUP BY dc.CustomerKey
ORDER BY SUM(fis.SalesAmount) DESC;

In the above query, the results are selected from the ‘FactInternetSales’ table, which is inner-joined to the ‘DimCustomer’ table via the ‘CustomerKey’ columns. The results are grouped by the Customer Key and are ordered by the sum of the ‘SalesAmount’ column from the sales table in descending order, such that customers with the highest total sales amount are displayed at the top of the results.

STRING_AGG in grouped query
STRING_AGG in grouped query

The STRING_AGG function aggregates the row data for each group i.e. each customer in this case. The STRING_AGG order clause sorts the data by the Due Date and Sales Amount in descending order, such that orders which are due soonest and are most valuable appear first.

Note that this query has been designed to demonstrate STRING_AGG concepts and is not intended to reflect an accurate picture of customer orders that would necessarily be useful in real-world scenarios.

The above concept can be extended further e.g. in cases where you are joining multiple sub-tables, the STRING_AGG order clause can be updated to sort data consistently based on values from each joined table.

Handling NULL values

By default, STRING_AGG ignores NULL values when aggregating row data. If you want to consider the NULL values, you can use the ISNULL or COALESCE functions.

SELECT STRING_AGG(COALESCE(MiddleName, '?'), ', ') AS [Customer Middle Names]
FROM dbo.DimCustomer
WHERE CustomerKey BETWEEN 11000 AND 11005;

In the above query, the COALESCE function will return the value of the ‘MiddleName’ column if it is not NULL, otherwise, it will return a question mark character.

STRING_AGG handling NULL
STRING_AGG handling NULL

In the Results pane, you can see that the question mark characters are appearing in place of NULL. Without the use of the COALESCE function, the resulting string would simply be: V, L

Summary

The STRING_AGG function is a very useful addition to the T-SQL language, making the process of concatenating row data into a string much more intuitive. STRING_AGG allows you to write cleaner code that is usually more performant than other solutions such as FOR XML PATH.

Whether you’re building reports, extracting data, or generating dynamic content, STRING_AGG is a great tool to have at your disposal when you are working with SQL Server databases.


I hope you enjoyed this post! Comments are always welcome and I respond to all questions.

If you like my content and it helped you out, please check out the button below 🙂

Comments