How to parse the name parts from a full name using SQL

Sometimes I come across situations where I need to extract the individual parts of a full name from a database record which relates to a person e.g. a customer or an employee.

This may be part of general application logic or perhaps something that needs to be carried out during a database migration.

For example, perhaps the initial software requirements were met by a simple full name or account name field, but now the individual parts of the name need to be stored separately to make life easier when mail merging or producing marketing material.

To achieve this we can turn to SQL to split up a full name into its constituent parts.

SQL Functions

There are a few SQL string functions we can avail of to get the job done, as follows.

LEN

Returns the length of the specified string. Requires a single expression argument.

e.g. SELECT LEN('Jonathan Crozier')

Result: 16

CHARINDEX

Returns the index of the first occurrence of a specified character. Requires search_expression, expression_to_be_searched and start_location arguments.

e.g. SELECT CHARINDEX('C', 'Jonathan Crozier', 0)

Result: 10

SUBSTRING

Returns part of a string based on a start index and length parameter. Requires expression, starting_position and length arguments.

e.g. SELECT SUBSTRING('Jonathan Crozier', 10, 7)

Result: Crozier

LTRIM

Removes whitespace from the left side of the specified string. Requires a single expression argument.

e.g. SELECT LTRIM('   Jonathan Crozier')

Result: Jonathan Crozier

i.e. the 3 space characters at the start of the string have been removed.

RTRIM

Removes whitespace from the right side of the specified string. Requires a single expression argument.

e.g. SELECT RTRIM('Jonathan Crozier   ')

Result: Jonathan Crozier

i.e. the 3 space characters at the end of the string have been removed.

By using a combination of the above SQL functions in a script we can figure out how to split up a full name into Title, First Name and Last Name fields with a fairly high degree of accuracy.

The Script

The SQL for achieving a name split into the aforementioned fields is included below.

-- Parse the individual name parts from the full name of each customer record.
-- Update customer records for which a Title, First Name and Last Name have not already been set.
 
SET NOCOUNT ON
GO
 
-- Customer table column values.
DECLARE @CustomerId	     INT;
DECLARE @CustomerAccountName NVARCHAR(108);
DECLARE @CustomerTitle       NVARCHAR(8);
DECLARE @CustomerFirstName   NVARCHAR(50);
DECLARE @CustomerLastName    NVARCHAR(50);
 
-- Initialise cursor.
DECLARE CustomersCursor CURSOR 
    LOCAL STATIC READ_ONLY FORWARD_ONLY
FOR 
SELECT DISTINCT CustomerKey, AccountName, Title, FirstName, LastName FROM DimCustomer
 
OPEN CustomersCursor
FETCH NEXT FROM CustomersCursor 
    INTO @CustomerId, @CustomerAccountName, @CustomerTitle, @CustomerFirstName, @CustomerLastName
WHILE @@FETCH_STATUS = 0
BEGIN 
    -- Temporary variables.
    DECLARE @Title     NVARCHAR(8);
    DECLARE @FirstName NVARCHAR(50);
    DECLARE @LastName  NVARCHAR(50);
 
    -- Index and word variables.
    DECLARE @AccountNameEndIndex INT;
    DECLARE @FirstWordEndIndex   INT;
    DECLARE @SecondWordEndIndex  INT;
    DECLARE @ThirdWordEndIndex   INT;
    DECLARE @FirstWord		 NVARCHAR(8);
    DECLARE @SecondWord		 NVARCHAR(50);
    DECLARE @ThirdWord		 NVARCHAR(50);
 
    -- Get the last index.
    SET @AccountNameEndIndex = LEN(@CustomerAccountName) - 1;
    
    -- Get the first word.
    SET @FirstWordEndIndex = (SELECT CHARINDEX(' ', @CustomerAccountName, 0));
    SET @FirstWord         = SUBSTRING(@CustomerAccountName, 0, @FirstWordEndIndex);
    
    -- Get the second word.
    SET @SecondWordEndIndex = (SELECT CHARINDEX(' ', @CustomerAccountName, @FirstWordEndIndex + 1));
    
    IF @SecondWordEndIndex = 0
    	BEGIN
    	    SELECT @SecondWordEndIndex = @AccountNameEndIndex;
    
    	    SET @SecondWord = SUBSTRING(@CustomerAccountName, @FirstWordEndIndex + 1, @SecondWordEndIndex - (@FirstWordEndIndex - 1));
    	END
    ELSE
    	BEGIN
    	    SET @SecondWord = SUBSTRING(@CustomerAccountName, @FirstWordEndIndex + 1, @SecondWordEndIndex - (@FirstWordEndIndex + 1));
    	END
    
    -- Get the third word.
    SET @ThirdWordEndIndex = @AccountNameEndIndex;
    
    IF @ThirdWordEndIndex <> @SecondWordEndIndex
    	BEGIN
    	    SET @ThirdWord = SUBSTRING(@CustomerAccountName, @SecondWordEndIndex + 1, @ThirdWordEndIndex - (@SecondWordEndIndex - 1));
    	END
    
    -- Check if the first word is a 'Title'.
    SELECT @Title = CASE LTRIM(RTRIM(@FirstWord)) WHEN 'Dr'    THEN 'Dr'
    						  WHEN 'Dr.'   THEN 'Dr.'
    						  WHEN 'Miss'  THEN 'Miss'
    						  WHEN 'Miss.' THEN 'Miss.'
    						  WHEN 'Mr'    THEN 'Mr'
    						  WHEN 'Mr.'   THEN 'Mr.'
    						  WHEN 'Mrs'   THEN 'Mrs'
    						  WHEN 'Mrs.'  THEN 'Mrs.'
    						  WHEN 'Ms'    THEN 'Ms'
    						  WHEN 'Ms.'   THEN 'Ms.'
    						  WHEN 'Rev'   THEN 'Rev'
    						  WHEN 'Rev.'  THEN 'Rev.'
    						  ELSE '' 
    						  END
    
    IF @Title = ''
        BEGIN
    	    IF LTRIM(RTRIM(@FirstWord)) = ''
    	        BEGIN
    		    -- There is only one word, use this as the First Name.
    		    SELECT @FirstName = LTRIM(RTRIM(@SecondWord));
    		    SELECT @LastName  = '';
    		END
    	    ELSE
    	        BEGIN
    		    -- There are two words, set First Name and Last Name.
    		    SELECT @FirstName = LTRIM(RTRIM(@FirstWord));
    		    SELECT @LastName  = LTRIM(RTRIM(@SecondWord));
    		END
    	END
    ELSE 
        BEGIN
    	    -- There are three words, set First Name and Last Name.
    	    -- Title has already been set.
    	    SELECT @FirstName = LTRIM(RTRIM(@SecondWord));
    	    SELECT @LastName  = LTRIM(RTRIM(@ThirdWord));
    	END
    
    -- Output the combination of the name parts for reference.
    PRINT @Title + ' ' + @FirstName + ' ' + @LastName;	
    
    -- Update the current Customer values.
    IF (LEN(ISNULL(@CustomerTitle    , '')) = 0) AND 
       (LEN(ISNULL(@CustomerFirstName, '')) = 0) AND 
       (LEN(ISNULL(@CustomerLastName , '')) = 0)
       BEGIN
           UPDATE DimCustomer 
    	   SET    Title = @Title, FirstName = @FirstName, LastName = @LastName 
    	   WHERE  CustomerKey = @CustomerId;
       END
    
    FETCH NEXT FROM CustomersCursor INTO @CustomerId, @CustomerAccountName, @CustomerTitle, @CustomerFirstName, @CustomerLastName
END
CLOSE CustomersCursor
DEALLOCATE CustomersCursor

Note that you must adjust the script to suit your database schema before running it. Always back up your database before running a script for the first time and always test it on a non-production database first.

The SQL uses a CURSOR to temporarily store and iterate through the customer set. While iterating through the results the first, second and third words are parsed from the full name. The first word is then checked to see if it is a title. The individual name part variables are then set and the customer record is updated if required.

Note that the title check, of course, isn’t a perfect solution and depends on the titles being input in a consistent manner.

Script Results

Before running the script, I carried out some reverse engineering in the standard AdventureWorks database.

I added an ‘AccountName’ column and populated it based on a combination of the ‘Title’, ‘FirstName’ and ‘LastName’ columns.

After running the script and executing a select query against the ‘DimCustomer’ table, the sample output looks like the screenshot below.

Sample customer results from the AdventureWorks database
Sample customer results from the AdventureWorks database

As you can see, the script has accurately parsed the individual name parts from the ‘AccountName’ column.

In cases where there are more than three words to split, the additional words will be stored in the ‘LastName’ column.

Adapting the Script

It’s always useful to have a script like this available when the need for it arises.

Aside from column types and lengths, the initial SELECT and closing UPDATE statements are the main parts of the script that need to be changed on a case by case basis.

The script could also be extended to accommodate slightly more complex requirements.

For example, if a ‘MiddleNames’ column needed to be populated. In this case, the logic could be reworked to store any additional name parts in the ‘MiddleNames’ column instead of the ‘LastName’ column which would be reserved for the last word.

The script from this article is part of my SQL Scripts GitHub Repository where you can find other useful SQL scripts which will help you to solve other problems and save time!

Comments

This site uses Akismet to reduce spam. Learn how your comment data is processed.