# Nested Sets

Just another Hierarchical Model

Learn how to create a hierarchical structure optimized for fetching all child nodes in a highly efficient manner. In this tutorial, we’ll look into the pros and contras of using the Nested Set model to store our hierarchical data structure.

**Storing Hierarchical Structures with the Nested Set Model**

Creating a model that can store hierarchical data is not a simple task. What do I mean by hierarchical data? Imagine that you are trying to organize products into meaningful categories. For instance, you would have a few main categories: food, appliances, and electronics. You then want to sub-divide those categories into more specific groups and then divide those categories into more specific groups. This is fairly simple to map out on paper, but not so easy to place into a data table. Such is the case when attempting to represent hierarchical data, like products and categories, in a database. One such method for modeling this type of data is called Nested Sets. This tutorial will cover the concepts behind the nested set model and when it should be implemented. I'll begin by explaining the far simpler parent child model and progressing into the nested set model.

**Parent-Child Model**

Traditionally, hierarchical structures are modeled with a strict parent to child relationship. In the case of a tree, each node will have a unique ID and a parent ID. ID numbers have no meaningful relationship to the data that is contained by the node[1] other than serving as a unique identifier. The root[2] node will have a special value as its parent ID so as to signify that you have reached the root. Consider the following table and diagram:

ID |
parentID |
Value |

0 |
-1 |
Root |

1 |
0 |
A |

2 |
0 |
B |

3 |
1 |
C |

4 |
1 |
D |

5 |
3 |
E |

6 |
2 |
F |

Each arrow represents a pointer to the parent node as defined by the ID, parentID, Value table. This parent-child relationship model is one of the most widely used and understood models for representing hierarchical structures. Each node indicates what its ID is and who it is a child of (I.e. who its parent is).

**The Problem:**

Unfortunately, this model becomes cumbersome when querying for ALL children of a given node. Let’s say, for instance, that we want to know the ID of every child under node with ID 1. By looking at our diagram, we can quickly report that nodes 3, 4, and 5 are all children of node 1. Using our table, however, the operation isn’t nearly as easy.

We begin by querying for all nodes whose parent ID is 1. In this case, we find that nodes 3 and 4 list node 1 as their parent (the value in the red arrows signifies the query number that we are currently performing). We then must query for all nodes that list 3 or 4 as their parent. As you can see, this would get out of hand quickly on larger datasets, resulting in hundreds if not thousands of database calls so that we can retrieve all children of a single node. Note how the number of queries necessary directly corresponds to the depth (height) of the tree. The number of queries will always be 1+ the height of the tree that you are querying.

**Nested Sets: our knight in shining armor!**

This is where the concept of nested sets comes in. Bear with me as it’s not as simple as the parent child relationship we discussed earlier.

Rather than have a parent ID, we will be implementing our table with left and right values. Take a look at the table and the resulting data structure:

ID |
Left |
Right |
Value |

0 |
0 |
13 |
Root |

1 |
1 |
8 |
A |

2 |
9 |
12 |
B |

3 |
2 |
5 |
C |

4 |
6 |
7 |
D |

5 |
3 |
4 |
E |

6 |
10 |
11 |
F |

**Whoa, what just happened****?**

Alright, let’s take a step back and look at where the next set of numbers came from.

The left and right values in a node represent a set of child nodes. In turn, each child node has a left and right value that represent another set of children, hence the term Nested Sets. For instance, if I wanted to retrieve all children of node A (ID 1), I would run a query similar to this:

This would return C, D and E, in a single query. Notice how no recursive step was applied. In a single query, we were able to retrieve all children of A.

** **

**Well that’s great, but how do we figure out the left and right values for an existing tree?**

Surprisingly enough, this isn’t too tricky to accomplish. Here’s a diagram showing how I computed the left and right values of each node:

**Determining Left and Right values**

Begin at the top left arrow, starting your count from 0. When you pass a node, increment by one and list that number next to the node. If you are to the left of the node, the number represents the left value. If you’re on the right, it represents the right value.

Notice how any given leaf node’s left and right values only differ by 1 while nodes with children differ by at least 3 and then in increments of two. This can tell us a lot about a given node without performing additional queries.

**Alternative Representation**

Let’s look at this data structure in a way that’s a little more meaningful with respect to left and right values:

With the nodes represented in this fashion, we can very easily see how the sets are nested and the depth of a given node (distance from the root). The number line on the bottom can be thought of as the one dimensional space in which all nodes are placed. Each node consumes a distance of 1 and has a buffer of at least one slot before another node is encountered. If we were to query again for all children of node A, we can think of the query operating like so:

**Inserting Nodes**

Let’s try inserting a new node with value G just beneath the Root node. Take a look at the resulting table and diagram below:

ID |
Left |
Right |
Value |

0 |
0 |
15 |
Root |

1 |
1 |
8 |
A |

2 |
9 |
12 |
B |

3 |
2 |
5 |
C |

4 |
6 |
7 |
D |

5 |
3 |
4 |
E |

6 |
10 |
11 |
F |

7 |
13 |
14 |
G |

To insert the node as a direct child of another node, you need to allocate some space in the tree. To do so, retrieve the right value of the node you wish to insert under. In this case, the right value of the root was 13 (see previous diagram). To allocate the space, add 2 to all left and right values that are greater than or equal to 13.

Because we are inserting a new child directly below the root, our update of left values won’t affect any nodes. (Side note, we are inserting G as the rightmost child of the Root so as to minimize the number of updates the database must perform. Try inserting as the leftmost node instead and look at how many nodes must have their Left and Right values incremented). After allocating the necessary space, insert your new node, setting the left value to the root’s previous right value, and your new node’s right value to 1 + the root’s previous right value.

** **

**What about deleting nodes?**

Deleting nodes is an incredibly simple operation; just beware of exactly what is happening. Let’s take a look at the different cases for a delete:

**Deleting a leaf**

You will remove the leaf node and the rest of the tree will remain untouched

- Decrement all left values greater than the node to delete’s left value by 2
- Decrement all right values greater than the node to delete’s right value by 2
- Remove node

**Deleting a node with children**

You will remove the node and promote all immediate children to be direct descendants of the parent node of the node you are removing

- Decrement all left and right values by 1 if left value is greater than node to delete’s left value and right value is less than node to delete’s right
- Decrement all left values by 2 if left value is greater than node to delete’s right value.
- Decrement all right values by 2 if right value is greater than node to delete’s right value.
- Remove node

** **

**Deleting the Root Node**

Same as deleting a Parent Node EXCEPT for the fact that it will now be possible to have multiple trees without a single root node. If the former Root node contained more than one child, multiple nodes will become root nodes. This is typically a bad event; however, it may be desired.

**Sample Delete**

Let’s look at what deleting a node with children (Node A index 1) looks like in our model:

ID |
Left |
Right |
Value |

0 |
0 |
13 |
Root |

2 |
7 |
10 |
B |

3 |
1 |
4 |
C |

4 |
5 |
6 |
D |

5 |
2 |
3 |
E |

6 |
8 |
9 |
F |

7 |
11 |
12 |
G |

Notice how C and D became children of the Root node while E remained a child of C. If we were to continue on and delete the Root node, we would end up with 4 root nodes and 2 children (typically considered a mistake when dealing with trees, but again, it depends on what you are attempting to accomplish.)

** **

**Efficiency:**

Nested sets are incredibly efficient for retrieving all children of a given node. They are not efficient when deleting or inserting due to the sheer number of updates you must perform to your dataset. While it may only be 3 SQL commands to insert, the database is updating many, many records resulting in database thrashing.

When used in conjunction with a parent ID or a level number, nested sets can remain efficient for retrieving all immediate children of a given node; however, in true nested set form, multiple queries are required for retrieving only immediate children of a given node. Additionally, deletes of nodes with children requires an additional SQL query to update the parent pointers if implemented alongside the traditional parent child relationship.

** **

**Take away**

Think of Nested Sets as another model for storing hierarchical data structures. It is highly efficient for retrieving all children of a given node and is simple to insert and remove nodes (although somewhat resource intensive). If you happen to come across a situation where nested sets’ benefits shine, try implementing and see what kind of performance gains you achieve over the standard parent/child relationship.

## Comments (35)

## Chase J

## clemoni

What do you think about hierarchyid datatype in SQL server; would this answer both performance and ease of maintenance requirements?

## Sean

## Tom Ewald

## Evan Petersen

In order to provide you with advice that pertains to your current situation, would you mind offering up some more information? More specifically, how are the nodes being generated? Do you currently have the tree(s) set up in another structure (E.g. Red/Black, Parent/Child etc.) or are you generating and storing the nodes on the fly. If you're simply looking to insert new nodes and build the tree from scratch, just follow the concepts outlined above.

Step one is to determine where you want the node (if the node is going to have siblings, I would recommend adding the node as the rightmost sibling so as to minimize the number of left/right values you'll have to touch). E.g. you wish to insert under a node with left and right values of 4 and 7 respectively and therefore must set the new node's left and right values to 6 and 7.

The next step is to allocate space for the new node by increasing all left and right values by two IF they are greater than or equal to the new nodes left/right value. (for instance, if the new node's left value is 6 and right value is 7, UPDATE table SET left = left+2 WHERE left>=6; UPDATE table SET right = right+2 WHERE right>=7;)

Last step is to insert the new node!

Let me know if my assumptions about your situation are correct and or whether or not this comment is helpful!

## Karl

What would you recommend for a website that has 2000-3000 users daily and when they log in they get a huge tree from which they can delete and update the hierarchy? (imagine a ToDo-application with a lot of tasks with multiple sub-tasks and so on..)

Nested Set Model OR Parent_id ?

Regards

Karl

## Evan Petersen

Sounds to me like there might be a lot of add/remove operations. Nested Sets really only shines when you are working with complete sub-sets. When you want to grab all the nodes that are a child of a given node, nested sets can do it with a single simple query (keep in mind that these operations grab all nodes under a given node, not just the immediate children).

Nested Sets require a bit of work when it comes to adding and removing nodes. On average, I would expect that it's something like a O(n) operation for a given delete/insert. Parent/child hierarchies are O(1) for an insert or delete (slightly more if you need to update the children of the node to be deleted to reference that nodes parents so that you don't end up with invalid parent pointers).

So, like most problems in Computer Science, I would say it really depends upon the usage of your application. If you just need to find all the subtasks of a given task, use nested sets. If you need to know how they're nested, use parent_id.

Also, I just want to say that using a tree hierarchy for the purpose of task lists is brilliant. I started writing out tasks in tree format so that I could quickly see bottlenecks in large projects.

Regards,

Evan

## Ajay Patel

## Evan Petersen

node a: left=0, right=1

node b: left=2, right=3

node c: left=4, right=5

If you wanted to implement a child of node b, you would end up with the following values:

node a: left=0, right=1

node b: left=2, right=5

node b-child: left =3, right=4

node c: left=6, right=7

Regards,

Evan

## Brian Wang

Thanks for this post! This is the clearest explanation on nested set model that I found on the web.

I have 1 question though. Why can't you have gaps in the numbering? For example, during the delete, why can't you forget about updating the left and right value and accept that there is a gap in your numbering? Maybe this will cause a problem, but I fail to see it right now? This will be faster, and the next time, you might even be able to reuse the gap.

## Evan Petersen

First off, I just want to say thanks. I'm so glad that this post has helped clear things up for you.

Secondly, you absolutely can have gaps! There will be some issues in that you can no longer rely on simple calculations to determine the total number of nodes contained within a super parent, however, it won't cause a problem in the actual process of traversing the tree. I would recommend that the tree be "re-indexed" at some point so as to "clean up" the spaces. Additionally, you could maintain a secondary table that contains information about all of the empty spaces in the number line. This way, you could take a look at the empty list first and determine whether or not you can take the place of a previously deleted node so that you don't have to go through the process of renumbering every other node. I'm not entirely convinced that this would actually help the efficiency of inserting or deleting a node, but without having run tests myself (or seen tests by others) I can't say for certain one way or the other.

If you do happen to time the operations given the two different models, I would certainly appreciate it if you would report back!

Regards,

Evan

## Chris P

## deepa ganu

Instead of using the adjacency list model or nested set model should I go for a graph database

## Evan Petersen

If you are interacting with data that is a tree rather than a graph, you can take a look at an adjacency list model, nested set model, or any other tree based model. Graphs and trees are related, but are also distinctly different in their purpose and representation.

Regards,

Evan

## 50r

how can i hold these Nested Set if i had like multiple root nods.

Let say i have 10 category menus and they are the root nods how do i handle that.

in this article you are demonstrating with only one root nod.

## Evan Petersen

Because the right value equals the left value plus 1, we can assume that the node has no children.

Having said that, I would not recommend using the nested set model for a menu system -- as mentioned in the article, nested sets are really only efficient if you are attempting to perform operations on sub-trees (I.e. does this tree contain the node x). It is not efficient for inserts or deletes (unless you decide to stray from the norm and leave holes for future inserts with a table recording where such holes exist). I would recommend that you look into the parent-child model a bit more. If your menu isn't going to be changing very often, look into caching mechanisms or alternative ways of storing your data.

Regards,

Evan

## Adam

I am the Director of Web Development over at Snapshot Group in Jacksonville. I would love to meet you for a cup of coffee and discuss some coding opportunities if you're up for it.

Let me know!

Thanks,

Adam

## Brad McBride

## Evan Petersen

Anyway, thanks for the feedback; it means a lot. If you come across another topic that you feel deserves a better explanation, I'd love to hear about it.

## ahmed

## Gustavo Andrade Ferreira

- Build an adjacency list and then build the nested model from it

- Use the direct delete and insert algorithms

## Evan Petersen

Regards,

Evan

## Chris

As for the problem of insert/delete operations, I am wondering if efficiency might be improved by reserving node expansion space, possibly dependent on the node depth and how far away from the right the node starts at?

## Evan Petersen

Thanks,

Evan

## ismail

## Narasimhamurthy

## Brandon K

Keep the simple (and normalized) parent-child data structure and get the answers you need from recursive CTEs.

## Evan Petersen

For most cases, a parent child relationship will be fine but will not scale. Recursive solutions will always be slower than their iterative counterpart.

## Connie C. Khan

## Piegus

## Evan Petersen

In the mean time, does that analogy make any sense or am I talking like a crazy person?

## Josh

All that said, I love your writing style and approach - I just wish this article was about Closure Tables instead of Nested Sets.

## Evan Petersen

Parent/Child relationships break down when you have to grab all children of a root node.

Glad you enjoyed the article! I'll have to take a closer look at closure tables and possibly post about them!

## Ryan

## Evan Petersen