Saturday, August 15, 2015

Recursive SQL

I have been inactive on this forum for a long time now, tackled quite a bit of challenging projects and did not get to shell out time for blog updates. It is time to make up for that now. 
As a thank you note to all my followers, I am planning to explain and explore "Recursive SQL" which most candidates I have interviewed so far don't have a clear understanding of. So I presume this topic is an ideal candidate to discuss here.

What is Recursive query and when do you typically need to use it?
Recursion is a process of querying iteratively, and to perform a function multiple times on the data of interest. It comes in handy when you need to access hierarchies of data. For example, when you need to access multi-categorical data (with multiple levels), such as Home Furnishings -> Dining -> Tables. OR looking to navigate organizational structure : CEO -> Senior VP -> VP -> Senior Director etc. You get the idea?

Knowing when to make use of Recursion is nearly as important as understanding the processes of recursion itself. So, think about the result you need and list down the options you have to achieve that result. There could be other straight forward approaches to solving a problem than recursion, so choose carefully. Note that for illustration purposes, I have used a simple dataset, whose result may be obtained in different ways. I have used recursion merely to show a working example.

Components of a recursive SQL:
1. Seed 
2. Recursion
3. Termination and final output

Seed:
This is the portion of the SQL that bring the subset of data, on which recursion is to be performed.

Recursion:
Operation to be performed multiple times on the seed

Termination:
Final output after the recursion completes

Keywords:
WITH RECURSIVE (parameters that need to be passed to recursion) - to invoke recursion
UNION ALL - to combine the intermediate results together

Syntax:


WITH RECURSIVE <REC_NAME>(<COLUMNS TO BE PASSED TO RECURSION SEPARATED BY COMMA>)
AS (
--SEED
(<SEED SQL>
UNION ALL
--RECURSION
<RECURSIVE SQL CREATING INTERMEDIATE RESULTS>
)
--FINAL OUTPUT
SEL <OUTPUT COLUMNNAMES>

FROM <REC_NAME>;

Example:
Illustrating an example of going through a hierarchy from top to bottom. Home & Garden has 2 subcategories: Dining and Outdoor, who in turn have their own subcategory: table, chair, hammock. We want to navigate through each of this category from top to the lowest available level.

   Input:


   Output:


Working SQL emulating above:
--SAMPLE DATA TO TEST RECURSION
CREATE VOLATILE TABLE CATEG
(ITEM_ID INTEGER,
CATEGORY VARCHAR(100),
SUBCATEGORY VARCHAR(100)
)PRIMARY INDEX(ITEM_ID)
ON COMMIT PRESERVE ROWS;

INSERT INTO CATEG
SEL 100,'HOME & GARDEN','DINING';

INSERT INTO CATEG
SEL 101,'DINING','TABLE';

INSERT INTO CATEG
SEL 100,'HOME & GARDEN','OUTDOOR';

INSERT INTO CATEG
SEL 102,'OUTDOOR','HAMMOCK';

INSERT INTO CATEG
SEL 103,'DINING','CHAIR';

--EXECUTE TO SEE THE RESULT OF RECURSIVE SQL
WITH RECURSIVE REC_NM(CATEGORY,SUBCATEGORY, LVL) AS
--SEED
(SELECT CATEGORY,SUBCATEGORY, 1
    FROM
    CATEG
UNION ALL
--RECURSION
SELECT A.CATEGORY,A.CATEGORY||':'||A.SUBCATEGORY||':'||B.SUBCATEGORY, LVL+1
FROM
REC_NM A, CATEG B
WHERE A.SUBCATEGORY = B.CATEGORY
)
--FINAL OUTPUT
SELECT CATEGORY,SUBCATEGORY
FROM REC_NM

QUALIFY RANK() OVER(ORDER BY LVL DESC) = 1;

No comments:

Post a Comment