Nested Sets
Just another Hierarchical Model
Learn how to create a hierarchical structure optimized for fetching all child nodes in a highly efficient manner. In this tutorial, we’ll look into the pros and contras of using the Nested Set model to store our hierarchical data structure.
Storing Hierarchical Structures with the Nested Set Model
Creating a model that can store hierarchical data is not a simple task. What do I mean by hierarchical data? Imagine that you are trying to organize products into meaningful categories. For instance, you would have a few main categories: food, appliances, and electronics. You then want to sub-divide those categories into more specific groups and then divide those categories into more specific groups. This is fairly simple to map out on paper, but not so easy to place into a data table. Such is the case when attempting to represent hierarchical data, like products and categories, in a database. One such method for modeling this type of data is called Nested Sets. This tutorial will cover the concepts behind the nested set model and when it should be implemented. I'll begin by explaining the far simpler parent child model and progressing into the nested set model.
Parent-Child Model
Traditionally, hierarchical structures are modeled with a strict parent to child relationship. In the case of a tree, each node will have a unique ID and a parent ID. ID numbers have no meaningful relationship to the data that is contained by the node[1] other than serving as a unique identifier. The root[2] node will have a special value as its parent ID so as to signify that you have reached the root. Consider the following table and diagram:
|
ID |
parentID |
Value |
|
0 |
-1 |
Root |
|
1 |
0 |
A |
|
2 |
0 |
B |
|
3 |
1 |
C |
|
4 |
1 |
D |
|
5 |
3 |
E |
|
6 |
2 |
F |
Each arrow represents a pointer to the parent node as defined by the ID, parentID, Value table. This parent-child relationship model is one of the most widely used and understood models for representing hierarchical structures. Each node indicates what its ID is and who it is a child of (I.e. who its parent is).
The Problem:
Unfortunately, this model becomes cumbersome when querying for ALL children of a given node. Let’s say, for instance, that we want to know the ID of every child under node with ID 1. By looking at our diagram, we can quickly report that nodes 3, 4, and 5 are all children of node 1. Using our table, however, the operation isn’t nearly as easy.
We begin by querying for all nodes whose parent ID is 1. In this case, we find that nodes 3 and 4 list node 1 as their parent (the value in the red arrows signifies the query number that we are currently performing). We then must query for all nodes that list 3 or 4 as their parent. As you can see, this would get out of hand quickly on larger datasets, resulting in hundreds if not thousands of database calls so that we can retrieve all children of a single node. Note how the number of queries necessary directly corresponds to the depth (height) of the tree. The number of queries will always be 1+ the height of the tree that you are querying.
Nested Sets: our knight in shining armor!
This is where the concept of nested sets comes in. Bear with me as it’s not as simple as the parent child relationship we discussed earlier.
Rather than have a parent ID, we will be implementing our table with left and right values. Take a look at the table and the resulting data structure:
|
ID |
Left |
Right |
Value |
|
0 |
0 |
13 |
Root |
|
1 |
1 |
8 |
A |
|
2 |
9 |
12 |
B |
|
3 |
2 |
5 |
C |
|
4 |
6 |
7 |
D |
|
5 |
3 |
4 |
E |
|
6 |
10 |
11 |
F |
Whoa, what just happened?
Alright, let’s take a step back and look at where the next set of numbers came from.
The left and right values in a node represent a set of child nodes. In turn, each child node has a left and right value that represent another set of children, hence the term Nested Sets. For instance, if I wanted to retrieve all children of node A (ID 1), I would run a query similar to this:
This would return C, D and E, in a single query. Notice how no recursive step was applied. In a single query, we were able to retrieve all children of A.
Well that’s great, but how do we figure out the left and right values for an existing tree?
Surprisingly enough, this isn’t too tricky to accomplish. Here’s a diagram showing how I computed the left and right values of each node:
Determining Left and Right values
Begin at the top left arrow, starting your count from 0. When you pass a node, increment by one and list that number next to the node. If you are to the left of the node, the number represents the left value. If you’re on the right, it represents the right value.
Notice how any given leaf node’s left and right values only differ by 1 while nodes with children differ by at least 3 and then in increments of two. This can tell us a lot about a given node without performing additional queries.
Alternative Representation
Let’s look at this data structure in a way that’s a little more meaningful with respect to left and right values:
With the nodes represented in this fashion, we can very easily see how the sets are nested and the depth of a given node (distance from the root). The number line on the bottom can be thought of as the one dimensional space in which all nodes are placed. Each node consumes a distance of 1 and has a buffer of at least one slot before another node is encountered. If we were to query again for all children of node A, we can think of the query operating like so:
Let’s try inserting a new node with value G just beneath the Root node. Take a look at the resulting table and diagram below:
|
ID |
Left |
Right |
Value |
|
0 |
0 |
15 |
Root |
|
1 |
1 |
8 |
A |
|
2 |
9 |
12 |
B |
|
3 |
2 |
5 |
C |
|
4 |
6 |
7 |
D |
|
5 |
3 |
4 |
E |
|
6 |
10 |
11 |
F |
|
7 |
13 |
14 |
G |
To insert the node as a direct child of another node, you need to allocate some space in the tree. To do so, retrieve the right value of the node you wish to insert under. In this case, the right value of the root was 13 (see previous diagram). To allocate the space, add 2 to all left and right values that are greater than or equal to 13.
Because we are inserting a new child directly below the root, our update of left values won’t affect any nodes. (Side note, we are inserting G as the rightmost child of the Root so as to minimize the number of updates the database must perform. Try inserting as the leftmost node instead and look at how many nodes must have their Left and Right values incremented). After allocating the necessary space, insert your new node, setting the left value to the root’s previous right value, and your new node’s right value to 1 + the root’s previous right value.
What about deleting nodes?
Deleting nodes is an incredibly simple operation; just beware of exactly what is happening. Let’s take a look at the different cases for a delete:
Deleting a leaf
You will remove the leaf node and the rest of the tree will remain untouched
- Decrement all left values greater than the node to delete’s left value by 2
- Decrement all right values greater than the node to delete’s right value by 2
- Remove node
Deleting a node with children
You will remove the node and promote all immediate children to be direct descendants of the parent node of the node you are removing
- Decrement all left and right values by 1 if left value is greater than node to delete’s left value and right value is less than node to delete’s right
- Decrement all left values by 2 if left value is greater than node to delete’s right value.
- Decrement all right values by 2 if right value is greater than node to delete’s right value.
- Remove node
Deleting the Root Node
Same as deleting a Parent Node EXCEPT for the fact that it will now be possible to have multiple trees without a single root node. If the former Root node contained more than one child, multiple nodes will become root nodes. This is typically a bad event; however, it may be desired.
Sample Delete
Let’s look at what deleting a node with children (Node A index 1) looks like in our model:
|
ID |
Left |
Right |
Value |
|
0 |
0 |
13 |
Root |
|
2 |
7 |
10 |
B |
|
3 |
1 |
4 |
C |
|
4 |
5 |
6 |
D |
|
5 |
2 |
3 |
E |
|
6 |
8 |
9 |
F |
|
7 |
11 |
12 |
G |
Notice how C and D became children of the Root node while E remained a child of C. If we were to continue on and delete the Root node, we would end up with 4 root nodes and 2 children (typically considered a mistake when dealing with trees, but again, it depends on what you are attempting to accomplish.)
Efficiency:
Nested sets are incredibly efficient for retrieving all children of a given node. They are not efficient when deleting or inserting due to the sheer number of updates you must perform to your dataset. While it may only be 3 SQL commands to insert, the database is updating many, many records resulting in database thrashing.
When used in conjunction with a parent ID or a level number, nested sets can remain efficient for retrieving all immediate children of a given node; however, in true nested set form, multiple queries are required for retrieving only immediate children of a given node. Additionally, deletes of nodes with children requires an additional SQL query to update the parent pointers if implemented alongside the traditional parent child relationship.
Take away
Think of Nested Sets as another model for storing hierarchical data structures. It is highly efficient for retrieving all children of a given node and is simple to insert and remove nodes (although somewhat resource intensive). If you happen to come across a situation where nested sets’ benefits shine, try implementing and see what kind of performance gains you achieve over the standard parent/child relationship.









Comments (28)
Ryan
Evan Petersen
Josh
All that said, I love your writing style and approach - I just wish this article was about Closure Tables instead of Nested Sets.
Evan Petersen
Parent/Child relationships break down when you have to grab all children of a root node.
Glad you enjoyed the article! I'll have to take a closer look at closure tables and possibly post about them!
Piegus
Evan Petersen
In the mean time, does that analogy make any sense or am I talking like a crazy person?
Connie C. Khan
Brandon K
Keep the simple (and normalized) parent-child data structure and get the answers you need from recursive CTEs.
Evan Petersen
For most cases, a parent child relationship will be fine but will not scale. Recursive solutions will always be slower than their iterative counterpart.
Narasimhamurthy
ismail
Chris
As for the problem of insert/delete operations, I am wondering if efficiency might be improved by reserving node expansion space, possibly dependent on the node depth and how far away from the right the node starts at?
Evan Petersen
Thanks,
Evan
Gustavo Andrade Ferreira
- Build an adjacency list and then build the nested model from it
- Use the direct delete and insert algorithms
Evan Petersen
Regards,
Evan
ahmed
Brad McBride
Evan Petersen
Anyway, thanks for the feedback; it means a lot. If you come across another topic that you feel deserves a better explanation, I'd love to hear about it.
Adam
I am the Director of Web Development over at Snapshot Group in Jacksonville. I would love to meet you for a cup of coffee and discuss some coding opportunities if you're up for it.
Let me know!
Thanks,
Adam
50r
how can i hold these Nested Set if i had like multiple root nods.
Let say i have 10 category menus and they are the root nods how do i handle that.
in this article you are demonstrating with only one root nod.
Evan Petersen
Because the right value equals the left value plus 1, we can assume that the node has no children.
Having said that, I would not recommend using the nested set model for a menu system -- as mentioned in the article, nested sets are really only efficient if you are attempting to perform operations on sub-trees (I.e. does this tree contain the node x). It is not efficient for inserts or deletes (unless you decide to stray from the norm and leave holes for future inserts with a table recording where such holes exist). I would recommend that you look into the parent-child model a bit more. If your menu isn't going to be changing very often, look into caching mechanisms or alternative ways of storing your data.
Regards,
Evan
deepa ganu
Instead of using the adjacency list model or nested set model should I go for a graph database
Evan Petersen
If you are interacting with data that is a tree rather than a graph, you can take a look at an adjacency list model, nested set model, or any other tree based model. Graphs and trees are related, but are also distinctly different in their purpose and representation.
Regards,
Evan
Chris P
Brian Wang
Thanks for this post! This is the clearest explanation on nested set model that I found on the web.
I have 1 question though. Why can't you have gaps in the numbering? For example, during the delete, why can't you forget about updating the left and right value and accept that there is a gap in your numbering? Maybe this will cause a problem, but I fail to see it right now? This will be faster, and the next time, you might even be able to reuse the gap.
Evan Petersen
First off, I just want to say thanks. I'm so glad that this post has helped clear things up for you.
Secondly, you absolutely can have gaps! There will be some issues in that you can no longer rely on simple calculations to determine the total number of nodes contained within a super parent, however, it won't cause a problem in the actual process of traversing the tree. I would recommend that the tree be "re-indexed" at some point so as to "clean up" the spaces. Additionally, you could maintain a secondary table that contains information about all of the empty spaces in the number line. This way, you could take a look at the empty list first and determine whether or not you can take the place of a previously deleted node so that you don't have to go through the process of renumbering every other node. I'm not entirely convinced that this would actually help the efficiency of inserting or deleting a node, but without having run tests myself (or seen tests by others) I can't say for certain one way or the other.
If you do happen to time the operations given the two different models, I would certainly appreciate it if you would report back!
Regards,
Evan
Ajay Patel
Evan Petersen
node a: left=0, right=1
node b: left=2, right=3
node c: left=4, right=5
If you wanted to implement a child of node b, you would end up with the following values:
node a: left=0, right=1
node b: left=2, right=5
node b-child: left =3, right=4
node c: left=6, right=7
Regards,
Evan