Performance Issue with Left Outer Join on Large Tables Involving Character Fields
Overview
When doing SQL left outer joins on a particular field that needs to have some string operations performed may considerably affect performance especially when the table involved contains high volume of records.
SQL Server Version
Microsoft SQL Server 2008
Details
In SQL, the WITH statement can be used to create a temporary query table that can be used within a given procedure:
;WITH CARDS AS (
SELECT CAST(@CARDSEQ1 AS NUMERIC(20)) AS NUM
UNION ALL
SELECT CAST(NUM + 1 AS NUMERIC(20))
FROM CARDS
WHERE NUM < @CARDSEQ2
However, we should be mindful about some potential potholes when using this temporary table within the procedure. Sometimes, even if the SQL procedure has been correctly written such that it can be executed without error, and is capable of returning the correct output, it still cannot not guarantee the effectiveness especially if it involves going through a temporary table with considerably huge volume of records.
Such is the case when trying to use a REPLICATE statement on a field from the said temporary table, as part of the WHERE clause while doing a LEFT OUTER JOIN.
To illustrate, here is a SELECT statement that will be referring to the temporary table CARDS.
SELECT ISNULL(B.ALB_Sequence_Number, 0), ISNULL(B.ALB_CARD_NUMBER, ''), ISNULL(B.LOTNUMBR, ''), ISNULL(B.DATERECD, '1900-01-01'), ISNULL(B.DTSEQNUM, 0), 0, CASE C.ITEMNMBR WHEN NULL THEN 0 ELSE 1 END, @USERID, ISNULL(B.ALB_Card_Sequence, ''), ISNULL(B.ALB_DENOMINATION, 0)
FROM CARDS A
LEFT OUTER JOIN ALBIV10003 B ON
B.ITEMNMBR = @ITEMNMBR AND B.LOCNCODE = @LOC AND
ALB_CARD_SEQUENCE = REPLICATE('0', @LENGTH - LEN(CAST(NUM AS VARCHAR(20)))) + CAST(NUM AS VARCHAR(20)) AND
B.ALB_DENOMINATION = @DENOMINATION
LEFT OUTER JOIN ALBSOP10201 C ON
C.ITEMNMBR = @ITEMNMBR AND
C.ALB_CARD_SEQUENCE = REPLICATE('0', @LENGTH - LEN(CAST(NUM AS VARCHAR(20)))) + CAST(NUM AS VARCHAR(20)) AND
C.ALB_DENOMINATION = @DENOMINATION
Notice that in the highlighted statement “REPLICATE('0', @LENGTH - LEN(CAST(NUM AS VARCHAR(20)))) + CAST(NUM AS VARCHAR(20))”, it is referencing a field NUM from the temporary table CARDS.
This script when executed where the temporary table CARDS returned approximately 500,000 records, has been running for more than ten (10) minutes or so before we decided to just terminate the process.
We then attempted to tweak the script by transferring the highlighted REPLICATE statement to the select statement on the temporary table CARDS instead of doing it on the LEFT OUTER JOIN:
;WITH CARDS AS (
SELECT CAST(@CARDSEQ1 AS NUMERIC(20)) AS NUM, CAST(REPLICATE('0', @LENGTH - LEN(CAST(@CARDSEQ1 AS VARCHAR(20)))) + CAST(@CARDSEQ1 AS VARCHAR(20)) AS VARCHAR(20)) AS TEMPCARDSEQ
UNION ALL
SELECT CAST(NUM + 1 AS NUMERIC(20)), CAST(REPLICATE('0', 12 - LEN(CAST(NUM + 1 AS VARCHAR(20)))) + CAST(CAST(NUM + 1 AS NUMERIC(20)) AS VARCHAR(20)) AS VARCHAR(20)) AS TEMPCARDSEQ
FROM CARDS
WHERE NUM < @CARDSEQ2
)
The output of this REPLICATE statement was then assigned to the field TEMPCARDSEQ. Now, in the previous select statement, we just have to replace the REPLICATE statement with the new temporary field TEMPCARDSEQ.
SELECT ISNULL(B.ALB_Sequence_Number, 0), ISNULL(B.ALB_CARD_NUMBER, ''), ISNULL(B.LOTNUMBR, ''), ISNULL(B.DATERECD, '1900-01-01'), ISNULL(B.DTSEQNUM, 0), 0, CASE C.ITEMNMBR WHEN NULL THEN 0 ELSE 1 END, @USERID, ISNULL(B.ALB_Card_Sequence, ''), ISNULL(B.ALB_DENOMINATION, 0)
FROM CARDS A
LEFT OUTER JOIN ALBIV10003 B ON B.ITEMNMBR = @ITEMNMBR AND
B.LOCNCODE = @LOC AND
ALB_CARD_SEQUENCE = A.TEMPCARDSEQ AND
B.ALB_DENOMINATION = @DENOMINATION
LEFT OUTER JOIN ALBSOP10201 C ON C.ITEMNMBR = @ITEMNMBR AND
C.ALB_CARD_SEQUENCE = A.TEMPCARDSEQ AND
C.ALB_DENOMINATION = @DENOMINATION OPTION (MAXRECURSION 0);
This modified script, when executed using the same temporary table CARDS with 500,000 records, took just 1 minute and 15 seconds to finish.
In conclusion, there was no major rewrite involved to optimize the script. There wasn’t even a change in the way the conditions nor the statements where written. What we did was just to move certain parts of the script, essentially allowing bulk of the REPLICATE operation to execute at the same time as when the temporary table CARDS is being created. This resulted to a more optimized script, ultimately improving the performance of the script.