Sometimes I come across situations where I need to extract the individual parts of a full name from a database record that relates to a person e.g. a customer or an employee.
This may be part of general application logic or perhaps something that needs to be carried out during a database migration.
For example, perhaps the initial software requirements were met by a simple full name or account name field, but now the individual parts of the name need to be stored separately to make life easier when mail merging or producing marketing material.
To achieve this we can turn to SQL to split up a full name into its constituent parts.
There are a few SQL string functions we can avail of to get the job done, as follows.
Returns the length of the specified string. Requires a single
SELECT LEN('Jonathan Crozier')
Returns the index of the first occurrence of a specified character. Requires
SELECT CHARINDEX('C', 'Jonathan Crozier', 0)
Returns part of a string based on a start index and length parameter. Requires
SELECT SUBSTRING('Jonathan Crozier', 10, 7)
Removes whitespace from the left side of the specified string. Requires a single
SELECT LTRIM(' Jonathan Crozier')
Result: Jonathan Crozier
i.e. the 3 space characters at the start of the string have been removed.
Removes whitespace from the right side of the specified string. Requires a single
SELECT RTRIM('Jonathan Crozier ')
Result: Jonathan Crozier
i.e. the 3 space characters at the end of the string have been removed.
By using a combination of the above SQL functions in a script we can figure out how to split up a full name into Title, First Name and Last Name fields with a fairly high degree of accuracy.
The SQL for achieving a name split into the aforementioned fields is included below.
-- Parse the individual name parts from the full name of each customer record. -- Update customer records for which a Title, First Name and Last Name have not already been set. SET NOCOUNT ON GO -- Customer table column values. DECLARE @CustomerId INT; DECLARE @CustomerAccountName NVARCHAR(108); DECLARE @CustomerTitle NVARCHAR(8); DECLARE @CustomerFirstName NVARCHAR(50); DECLARE @CustomerLastName NVARCHAR(50); -- Initialise cursor. DECLARE CustomersCursor CURSOR LOCAL STATIC READ_ONLY FORWARD_ONLY FOR SELECT DISTINCT CustomerKey, AccountName, Title, FirstName, LastName FROM DimCustomer OPEN CustomersCursor FETCH NEXT FROM CustomersCursor INTO @CustomerId, @CustomerAccountName, @CustomerTitle, @CustomerFirstName, @CustomerLastName WHILE @@FETCH_STATUS = 0 BEGIN -- Temporary variables. DECLARE @Title NVARCHAR(8); DECLARE @FirstName NVARCHAR(50); DECLARE @LastName NVARCHAR(50); -- Index and word variables. DECLARE @AccountNameEndIndex INT; DECLARE @FirstWordEndIndex INT; DECLARE @SecondWordEndIndex INT; DECLARE @ThirdWordEndIndex INT; DECLARE @FirstWord NVARCHAR(8); DECLARE @SecondWord NVARCHAR(50); DECLARE @ThirdWord NVARCHAR(50); -- Get the last index. SET @AccountNameEndIndex = LEN(@CustomerAccountName) - 1; -- Get the first word. SET @FirstWordEndIndex = (SELECT CHARINDEX(' ', @CustomerAccountName, 0)); SET @FirstWord = SUBSTRING(@CustomerAccountName, 0, @FirstWordEndIndex); -- Get the second word. SET @SecondWordEndIndex = (SELECT CHARINDEX(' ', @CustomerAccountName, @FirstWordEndIndex + 1)); IF @SecondWordEndIndex = 0 BEGIN SELECT @SecondWordEndIndex = @AccountNameEndIndex; SET @SecondWord = SUBSTRING(@CustomerAccountName, @FirstWordEndIndex + 1, @SecondWordEndIndex - (@FirstWordEndIndex - 1)); END ELSE BEGIN SET @SecondWord = SUBSTRING(@CustomerAccountName, @FirstWordEndIndex + 1, @SecondWordEndIndex - (@FirstWordEndIndex + 1)); END -- Get the third word. SET @ThirdWordEndIndex = @AccountNameEndIndex; IF @ThirdWordEndIndex <> @SecondWordEndIndex BEGIN SET @ThirdWord = SUBSTRING(@CustomerAccountName, @SecondWordEndIndex + 1, @ThirdWordEndIndex - (@SecondWordEndIndex - 1)); END -- Check if the first word is a 'Title'. SELECT @Title = CASE LTRIM(RTRIM(@FirstWord)) WHEN 'Dr' THEN 'Dr' WHEN 'Dr.' THEN 'Dr.' WHEN 'Miss' THEN 'Miss' WHEN 'Miss.' THEN 'Miss.' WHEN 'Mr' THEN 'Mr' WHEN 'Mr.' THEN 'Mr.' WHEN 'Mrs' THEN 'Mrs' WHEN 'Mrs.' THEN 'Mrs.' WHEN 'Ms' THEN 'Ms' WHEN 'Ms.' THEN 'Ms.' WHEN 'Rev' THEN 'Rev' WHEN 'Rev.' THEN 'Rev.' ELSE '' END IF @Title = '' BEGIN IF LTRIM(RTRIM(@FirstWord)) = '' BEGIN -- There is only one word, use this as the First Name. SELECT @FirstName = LTRIM(RTRIM(@SecondWord)); SELECT @LastName = ''; END ELSE BEGIN -- There are two words, set First Name and Last Name. SELECT @FirstName = LTRIM(RTRIM(@FirstWord)); SELECT @LastName = LTRIM(RTRIM(@SecondWord)); END END ELSE BEGIN -- There are three words, set First Name and Last Name. -- Title has already been set. SELECT @FirstName = LTRIM(RTRIM(@SecondWord)); SELECT @LastName = LTRIM(RTRIM(@ThirdWord)); END -- Output the combination of the name parts for reference. PRINT @Title + ' ' + @FirstName + ' ' + @LastName; -- Update the current Customer values. IF (LEN(ISNULL(@CustomerTitle , '')) = 0) AND (LEN(ISNULL(@CustomerFirstName, '')) = 0) AND (LEN(ISNULL(@CustomerLastName , '')) = 0) BEGIN UPDATE DimCustomer SET Title = @Title, FirstName = @FirstName, LastName = @LastName WHERE CustomerKey = @CustomerId; END FETCH NEXT FROM CustomersCursor INTO @CustomerId, @CustomerAccountName, @CustomerTitle, @CustomerFirstName, @CustomerLastName END CLOSE CustomersCursor DEALLOCATE CustomersCursor
Note that you must adjust the script to suit your database schema before running it. Always back up your database before running a script for the first time and always test it on a non-production database first.
The SQL uses a
CURSOR to temporarily store and iterate through the customer set. While iterating through the results the first, second and third words are parsed from the full name. The first word is then checked to see if it is a title. The individual name part variables are then set and the customer record is updated if required.
Note that the title check, of course, isn’t a perfect solution and depends on the titles being input in a consistent manner.
Before running the script, I carried out some reverse engineering in the standard AdventureWorks database.
I added an ‘AccountName’ column and populated it based on a combination of the ‘Title’, ‘FirstName’ and ‘LastName’ columns.
After running the script and executing a select query against the ‘DimCustomer’ table, the sample output looks like the screenshot below.
As you can see, the script has accurately parsed the individual name parts from the ‘AccountName’ column.
In cases where there are more than three words to split, the additional words will be stored in the ‘LastName’ column.
Adapting the Script
It’s always useful to have a script like this available when the need for it arises.
Aside from column types and lengths, the initial SELECT and closing UPDATE statements are the main parts of the script that need to be changed on a case by case basis.
The script could also be extended to accommodate slightly more complex requirements.
For example, if a ‘MiddleNames’ column needed to be populated. In this case, the logic could be reworked to store any additional name parts in the ‘MiddleNames’ column instead of the ‘LastName’ column which would be reserved for the last word.
The script from this article is part of my SQL Scripts GitHub Repository where you can find other useful SQL scripts which will help you to solve other problems and save time!