If you’ve ever worked with SQL Server and found the need to concatenate data from table rows into a string value, you may remember having to resort to using features like FOR XML PATH
in combination with STUFF
to achieve the desired result.
While these approaches are functional, they tend to result in code that is both clunky and hard to read.
Thankfully, starting with SQL Server 2017, Microsoft introduced the STRING_AGG
function to the T-SQL language, providing a simpler and much more elegant way to aggregate row data into a string.
In this article, I will explore the basics of how to use STRING_AGG
and look at a few practical ways you can use it in your SQL queries, as well as consider some nuances of the function.
What is STRING_AGG?
STRING_AGG
is a T-SQL function, introduced in SQL Server 2017, that concatenates values from multiple table rows into a single string. It allows a delimiter to be specified, which is perfect when you want to create comma-separated lists or a custom format according to some specific requirements.
The STRING_AGG
function provides similar functionality to the GROUP_CONCAT
function which you’ll find in other SQL database engines such as MySQL.
Syntax
Here is the official representation of the STRING_AGG
syntax, as per the Microsoft Docs.
STRING_AGG ( expression, separator ) [ <order_clause> ] <order_clause> ::= WITHIN GROUP ( ORDER BY <order_by_expression_list> [ ASC | DESC ] )
Let’s look at the required function parameters first and then circle back to an example of the order clause later.
expression
The column or expression whose values you want to aggregate.
delimiter
The character or string that should be placed between the aggregated values.
Example
Let’s consider a very basic example of using the STRING_AGG
function.
Imagine we have a ‘Customers’ table in our database containing an ‘AccountName’ column and we run a simple SELECT
statement such as the following.
SELECT AccountName FROM Customers;
The results of the above query would look something like the following.
NexaCore Innovations |
QuantumEdge Systems |
HyperNova Technologies |
CircuitSphere Solutions |
CodePulse Dynamics |
Next, imagine we were to run the following SQL on the same data set.
SELECT STRING_AGG(AccountName, ', ') FROM Customers;
This would result in the following output.
NexaCore Innovations, QuantumEdge Systems, HyperNova Technologies, CircuitSphere Solutions, CodePulse Dynamics |
As you can see, the STRING_AGG
function essentially aggregates the vertical list of separate row column values into a horizontal list where the values are combined into a single string.
STRING_AGG examples
Let’s look at some more examples of using the STRING_AGG
function.
Before doing so, if you want to follow along, I recommend that you install the latest version of SQL Server or LocalDB (usually you will already have access to LocalDB if you have Visual Studio installed) and the latest version of SQL Server Management Studio (SSMS) or Azure Data Studio for interacting with the SQL Server database.
I’ll be working with the sample AdventureWorks Data Warehouse database that is frequently used for demos.
Below is an example of running a similar query to the one we looked at in the previous section, however, this time I’m running the query against the AdventureWorks database which has a ‘DimCustomer’ table.
You can see that the query returns a comma-separated string containing Customer First Names.
The SQL query has been included below if you want to copy and paste it.
SELECT STRING_AGG(FirstName, ', ') AS [Customer First Names] FROM dbo.DimCustomer WHERE CustomerKey BETWEEN 11000 AND 11005;
A column alias has been specified to provide a clear heading for the result set.
Overcoming limitations
In the SQL query example included just before this subsection, you’ll notice that a simple WHERE
clause condition has been included to limit the results to 5 customers. Without this condition, the following error will be returned.
STRING_AGG aggregation result exceeded the limit of 8000 bytes. Use LOB types to avoid result truncation.
As the message text indicates, this error occurs when the concatenated string result is too long. By default, STRING_AGG
produces a VARCHAR
result which is limited to 8,000 bytes.
To get around this limitation, we can CAST
to a LOB (Large Object Type) to accommodate longer results.
SELECT STRING_AGG(CAST(FirstName AS NVARCHAR(MAX)), ', ') AS [All Customer First Names] FROM dbo.DimCustomer;
This allows us to combine all of the customer first names into a single string, with practically no limitations aside from the huge 2 GB upper limit of NVARCHAR(MAX)
.
As you can see from the above screenshot, the Results pane now displays a long list of all customer first names in a single string within the row that is returned by the query.
Ordering
By default, the STRING_AGG
function doesn’t guarantee that any particular order is applied to the concatenated values. However, it is possible to enforce an order using the WITHIN GROUP
clause.
SELECT STRING_AGG(FirstName, ', ') WITHIN GROUP (ORDER BY FirstName) AS [Ordered Customer First Names] FROM dbo.DimCustomer WHERE CustomerKey BETWEEN 11000 AND 11005;
One or more criteria can be specified to order the results. In this case, we are simply ordering by the ‘FirstName’ column.
As you can see from the screenshot, the results are now ordered alphabetically in ascending order based on the first name of the customers.
Grouped queries
The STRING_AGG
function is particularly useful in grouped queries.
Consider a more complex query such as the following which displays a list of a list of orders grouped by customer, sorted by the highest customer sales in descending order.
SELECT MIN(CONCAT(dc.FirstName, ' ', dc.LastName)) AS [Customer Name], STRING_AGG(CONCAT(fis.SalesOrderNumber, ' ($', fis.SalesAmount, ')'), ', ') WITHIN GROUP (ORDER BY fis.DueDate DESC, fis.SalesAmount DESC) AS [Orders] FROM dbo.FactInternetSales fis JOIN dbo.DimCustomer dc ON fis.CustomerKey = dc.CustomerKey GROUP BY dc.CustomerKey ORDER BY SUM(fis.SalesAmount) DESC;
In the above query, the results are selected from the ‘FactInternetSales’ table, which is inner-joined to the ‘DimCustomer’ table via the ‘CustomerKey’ columns. The results are grouped by the Customer Key and are ordered by the sum of the ‘SalesAmount’ column from the sales table in descending order, such that customers with the highest total sales amount are displayed at the top of the results.
The STRING_AGG
function aggregates the row data for each group i.e. each customer in this case. The STRING_AGG
order clause sorts the data by the Due Date and Sales Amount in descending order, such that orders which are due soonest and are most valuable appear first.
Note that this query has been designed to demonstrate STRING_AGG
concepts and is not intended to reflect an accurate picture of customer orders that would necessarily be useful in real-world scenarios.
The above concept can be extended further e.g. in cases where you are joining multiple sub-tables, the STRING_AGG
order clause can be updated to sort data consistently based on values from each joined table.
Handling NULL values
By default, STRING_AGG
ignores NULL
values when aggregating row data. If you want to consider the NULL
values, you can use the ISNULL
or COALESCE
functions.
SELECT STRING_AGG(COALESCE(MiddleName, '?'), ', ') AS [Customer Middle Names] FROM dbo.DimCustomer WHERE CustomerKey BETWEEN 11000 AND 11005;
In the above query, the COALESCE
function will return the value of the ‘MiddleName’ column if it is not NULL
, otherwise, it will return a question mark character.
In the Results pane, you can see that the question mark characters are appearing in place of NULL
. Without the use of the COALESCE
function, the resulting string would simply be: V, L
Summary
The STRING_AGG
function is a very useful addition to the T-SQL language, making the process of concatenating row data into a string much more intuitive. STRING_AGG
allows you to write cleaner code that is usually more performant than other solutions such as FOR XML PATH
.
Whether you’re building reports, extracting data, or generating dynamic content, STRING_AGG
is a great tool to have at your disposal when you are working with SQL Server databases.
Comments