Nested Sets

Written by Evan Petersen on Thursday, 12 April 2012. Posted in Programming, Random

Just another Hierarchical Model

Learn how to create a hierarchical structure optimized for fetching all child nodes in a highly efficient manner. In this tutorial, we’ll look into the pros and contras of using the Nested Set model to store our hierarchical data structure.

Storing Hierarchical Structures with the Nested Set Model

Creating a model that can store hierarchical data is not a simple task.  What do I mean by hierarchical data?  Imagine that you are trying to organize products into meaningful categories.  For instance, you would have a few main categories: food, appliances, and electronics.  You then want to sub-divide those categories into more specific groups and then divide those categories into more specific groups.  This is fairly simple to map out on paper, but not so easy to place into a data table.  Such is the case when attempting to represent hierarchical data, like products and categories, in a database.  One such method for modeling this type of data is called Nested Sets.  This tutorial will cover the concepts behind the nested set model and when it should be implemented.  I'll begin by explaining the far simpler parent child model and progressing into the nested set model.


Parent-Child Model

Traditionally, hierarchical structures are modeled with a strict parent to child relationship.  In the case of a tree, each node will have a unique ID and a parent ID.  ID numbers have no meaningful relationship to the data that is contained by the node[1] other than serving as a unique identifier.  The root[2] node will have a special value as its parent ID so as to signify that you have reached the root.  Consider the following table and diagram:

nestedSetsIntro

ID

parentID

Value

0

-1

Root

1

0

A

2

0

B

3

1

C

4

1

D

5

3

E

6

2

F

 

Each arrow represents a pointer to the parent node as defined by the ID, parentID, Value table.  This parent-child relationship model is one of the most widely used and understood models for representing hierarchical structures.  Each node indicates what its ID is and who it is a child of (I.e. who its parent is). 

 

The Problem:

Unfortunately, this model becomes cumbersome when querying for ALL children of a given node.  Let’s say, for instance, that we want to know the ID of every child under node with ID 1.  By looking at our diagram, we can quickly report that nodes 3, 4, and 5 are all children of node 1.  Using our table, however, the operation isn’t nearly as easy.


nestedSetsSubQueryQuerying for Children

We begin by querying for all nodes whose parent ID is 1.  In this case, we find that nodes 3 and 4 list node 1 as their parent (the value in the red arrows signifies the query number that we are currently performing).  We then must query for all nodes that list 3 or 4 as their parent.  As you can see, this would get out of hand quickly on larger datasets, resulting in hundreds if not thousands of database calls so that we can retrieve all children of a single node. Note how the number of queries necessary directly corresponds to the depth (height) of the tree. The number of queries will always be 1+ the height of the tree that you are querying.

  

Nested Sets: our knight in shining armor!

This is where the concept of nested sets comes in.  Bear with me as it’s not as simple as the parent child relationship we discussed earlier.

Rather than have a parent ID, we will be implementing our table with left and right values.  Take a look at the table and the resulting data structure:

nestedSetsFirstNested

ID

Left

Right

Value

0

0

13

Root

1

1

8

A

2

9

12

B

3

2

5

C

4

6

7

D

5

3

4

E

6

10

11

F

 

 

Whoa, what just happened?

Alright, let’s take a step back and look at where the next set of numbers came from.

The left and right values in a node represent a set of child nodes.  In turn, each child node has a left and right value that represent another set of children, hence the term Nested Sets.  For instance, if I wanted to retrieve all children of node A (ID 1), I would run a query similar to this:

           

SELECT * FROM `nodes` WHERE Left > '1' AND Right<'8';

This would return C, D and E, in a single query.  Notice how no recursive step was applied.  In a single query, we were able to retrieve all children of A.

 

Well that’s great, but how do we figure out the left and right values for an existing tree?

Surprisingly enough, this isn’t too tricky to accomplish.  Here’s a diagram showing how I computed the left and right values of each node:

nestedSetsAssignNumbers

Determining Left and Right values

Begin at the top left arrow, starting your count from 0.  When you pass a node, increment by one and list that number next to the node.  If you are to the left of the node, the number represents the left value.  If you’re on the right, it represents the right value. 

 

 

 

 

 

 

Notice how any given leaf node’s left and right values only differ by 1 while nodes with children differ by at least 3 and then in increments of two.  This can tell us a lot about a given node without performing additional queries. 

   

Alternative Representation

Let’s look at this data structure in a way that’s a little more meaningful with respect to left and right values:

nestedSetsAlternativeRepresentation

With the nodes represented in this fashion, we can very easily see how the sets are nested and the depth of a given node (distance from the root).  The number line on the bottom can be thought of as the one dimensional space in which all nodes are placed.  Each node consumes a distance of 1 and has a buffer of at least one slot before another node is encountered.  If we were to query again for all children of node A, we can think of the query operating like so:

           

SELECT * FROM `nodes` WHERE Left > '1' AND Right<'8';

nestedSetsSubset

Inserting Nodes

Let’s try inserting a new node with value G just beneath the Root node.  Take a look at the resulting table and diagram below:

nestedSetsInsertNode

ID

Left

Right

Value

0

0

15

Root

1

1

8

A

2

9

12

B

3

2

5

C

4

6

7

D

5

3

4

E

6

10

11

F

7

13

14

G

 

To insert the node as a direct child of another node, you need to allocate some space in the tree.  To do so, retrieve the right value of the node you wish to insert under.  In this case, the right value of the root was 13 (see previous diagram).  To allocate the space, add 2 to all left and right values that are greater than or equal to 13.

           

UPDATE `nodes` SET Left = Left + '2' WHERE Left >= '13'
UPDATE `nodes` SET Right = Right + '2' WHERE Right >= '13'

Because we are inserting a new child directly below the root, our update of left values won’t affect any nodes.  (Side note, we are inserting G as the rightmost child of the Root so as to minimize the number of updates the database must perform.  Try inserting as the leftmost node instead and look at how many nodes must have their Left and Right values incremented).  After allocating the necessary space, insert your new node, setting the left value to the root’s previous right value, and your new node’s right value to 1 + the root’s previous right value. 

 

What about deleting nodes?

Deleting nodes is an incredibly simple operation; just beware of exactly what is happening.  Let’s take a look at the different cases for a delete:

 

Deleting a leaf

You will remove the leaf node and the rest of the tree will remain untouched

  1. Decrement all left values greater than the node to delete’s left value by 2
  2. Decrement all right values greater than the node to delete’s right value by 2
  3. Remove node

Deleting a node with children

You will remove the node and promote all immediate children to be direct descendants of the parent node of the node you are removing

  1. Decrement all left and right values by 1 if left value is greater than node to delete’s left value and right value is less than node to delete’s right
  2. Decrement all left values by 2 if left value is greater than node to delete’s right value.
  3. Decrement all right values by 2 if right value is greater than node to delete’s right value.
  4. Remove node

 

Deleting the Root Node

Same as deleting a Parent Node EXCEPT for the fact that it will now be possible to have multiple trees without a single root node.   If the former Root node contained more than one child, multiple nodes will become root nodes.  This is typically a bad event; however, it may be desired.

  

Sample Delete

Let’s look at what deleting a node with children (Node A index 1) looks like in our model:

nestedSetsDeleteNode

ID

Left

Right

Value

0

0

13

Root

2

7

10

B

3

1

4

C

4

5

6

D

5

2

3

E

6

8

9

F

7

11

12

G

 

Notice how C and D became children of the Root node while E remained a child of C.  If we were to continue on and delete the Root node, we would end up with 4 root nodes and 2 children (typically considered a mistake when dealing with trees, but again, it depends on what you are attempting to accomplish.)

 

Efficiency:

Nested sets are incredibly efficient for retrieving all children of a given node.  They are not efficient when deleting or inserting due to the sheer number of updates you must perform to your dataset.  While it may only be 3 SQL commands to insert, the database is updating many, many records resulting in database thrashing. 

When used in conjunction with a parent ID or a level number, nested sets can remain efficient for retrieving all immediate children of a given node; however, in true nested set form, multiple queries are required for retrieving only immediate children of a given node.  Additionally, deletes of nodes with children requires an additional SQL query to update the parent pointers if implemented alongside the traditional parent child relationship.

 

Take away

Think of Nested Sets as another model for storing hierarchical data structures.  It is highly efficient for retrieving all children of a given node and is simple to insert and remove nodes (although somewhat resource intensive).  If you happen to come across a situation where nested sets’ benefits shine, try implementing and see what kind of performance gains you achieve over the standard parent/child relationship.



[1] Node: An element that contains information pertaining to its location and contents.  E.g. parent value, identification number and human readable information.

[2] The top most node of a tree structure under which child nodes are placed.

4.7/5 rating (15 votes)

About the Author

Evan Petersen

My name is Evan Petersen and I work as Chief Technology Officer at Dotcomjungle in Ashland, Oregon. You can visit the home page of my blog at: www.EvanPetersen.com.

I enjoy reserarching new methods to solve age old problems. Hopefully you'll find something of use!

Follow me on G+

Comments (35)

  • Chase J

    Chase J

    13 December 2013 at 13:01 |
    Great article Evan. It solidified my understanding of Nested Sets , thank you very much
  • clemoni

    clemoni

    10 September 2013 at 02:26 |
    Hello Evan,

    What do you think about hierarchyid datatype in SQL server; would this answer both performance and ease of maintenance requirements?
  • Sean

    Sean

    25 August 2013 at 11:47 |
    Thanks Evan, explained this concept really clearly! Much appreciated!
  • Tom Ewald

    Tom Ewald

    08 August 2013 at 11:01 |
    Thank you for this article; you explained some aspects of nested sets better than Joe Celko does in "SQL for Smarties". One question, though: You explained the meaning of left and right values, including how to assign them, but what if you're dealing with thousands of nodes - could you give some hints at an algorithm for assigning those values? I develop in Access 2007, using SQL and VBA, in case that helps. Thanks.
    • Evan Petersen

      Evan Petersen

      08 August 2013 at 20:04 |
      Thanks for the kind words Tom - I'm glad you found the article helpful.

      In order to provide you with advice that pertains to your current situation, would you mind offering up some more information? More specifically, how are the nodes being generated? Do you currently have the tree(s) set up in another structure (E.g. Red/Black, Parent/Child etc.) or are you generating and storing the nodes on the fly. If you're simply looking to insert new nodes and build the tree from scratch, just follow the concepts outlined above.

      Step one is to determine where you want the node (if the node is going to have siblings, I would recommend adding the node as the rightmost sibling so as to minimize the number of left/right values you'll have to touch). E.g. you wish to insert under a node with left and right values of 4 and 7 respectively and therefore must set the new node's left and right values to 6 and 7.

      The next step is to allocate space for the new node by increasing all left and right values by two IF they are greater than or equal to the new nodes left/right value. (for instance, if the new node's left value is 6 and right value is 7, UPDATE table SET left = left+2 WHERE left>=6; UPDATE table SET right = right+2 WHERE right>=7;)

      Last step is to insert the new node!

      Let me know if my assumptions about your situation are correct and or whether or not this comment is helpful!
  • Karl

    Karl

    28 June 2013 at 05:25 |
    Hi Evan. Thanks for this article.

    What would you recommend for a website that has 2000-3000 users daily and when they log in they get a huge tree from which they can delete and update the hierarchy? (imagine a ToDo-application with a lot of tasks with multiple sub-tasks and so on..)

    Nested Set Model OR Parent_id ?

    Regards
    Karl
    • Evan Petersen

      Evan Petersen

      29 June 2013 at 10:23 |
      Hey Karl,

      Sounds to me like there might be a lot of add/remove operations. Nested Sets really only shines when you are working with complete sub-sets. When you want to grab all the nodes that are a child of a given node, nested sets can do it with a single simple query (keep in mind that these operations grab all nodes under a given node, not just the immediate children).

      Nested Sets require a bit of work when it comes to adding and removing nodes. On average, I would expect that it's something like a O(n) operation for a given delete/insert. Parent/child hierarchies are O(1) for an insert or delete (slightly more if you need to update the children of the node to be deleted to reference that nodes parents so that you don't end up with invalid parent pointers).

      So, like most problems in Computer Science, I would say it really depends upon the usage of your application. If you just need to find all the subtasks of a given task, use nested sets. If you need to know how they're nested, use parent_id.

      Also, I just want to say that using a tree hierarchy for the purpose of task lists is brilliant. I started writing out tasks in tree format so that I could quickly see bottlenecks in large projects.

      Regards,
      Evan
  • Ajay Patel

    Ajay Patel

    11 May 2013 at 13:16 |
    is it possible to achive more than one ROOT Node ?
    • Evan Petersen

      Evan Petersen

      13 May 2013 at 16:46 |
      It certainly is. When you're allocating space on the number line (for left and right purposes), you would assign the following left and right values accordingly:

      node a: left=0, right=1
      node b: left=2, right=3
      node c: left=4, right=5

      If you wanted to implement a child of node b, you would end up with the following values:

      node a: left=0, right=1
      node b: left=2, right=5
      node b-child: left =3, right=4
      node c: left=6, right=7

      Regards,
      Evan
  • Brian Wang

    Brian Wang

    28 March 2013 at 11:09 |
    Hi, Evan

    Thanks for this post! This is the clearest explanation on nested set model that I found on the web.

    I have 1 question though. Why can't you have gaps in the numbering? For example, during the delete, why can't you forget about updating the left and right value and accept that there is a gap in your numbering? Maybe this will cause a problem, but I fail to see it right now? This will be faster, and the next time, you might even be able to reuse the gap.
    • Evan Petersen

      Evan Petersen

      03 April 2013 at 21:58 |
      Hey Brian,

      First off, I just want to say thanks. I'm so glad that this post has helped clear things up for you.

      Secondly, you absolutely can have gaps! There will be some issues in that you can no longer rely on simple calculations to determine the total number of nodes contained within a super parent, however, it won't cause a problem in the actual process of traversing the tree. I would recommend that the tree be "re-indexed" at some point so as to "clean up" the spaces. Additionally, you could maintain a secondary table that contains information about all of the empty spaces in the number line. This way, you could take a look at the empty list first and determine whether or not you can take the place of a previously deleted node so that you don't have to go through the process of renumbering every other node. I'm not entirely convinced that this would actually help the efficiency of inserting or deleting a node, but without having run tests myself (or seen tests by others) I can't say for certain one way or the other.

      If you do happen to time the operations given the two different models, I would certainly appreciate it if you would report back!

      Regards,
      Evan
  • Chris P

    Chris P

    13 March 2013 at 23:39 |
    Amazing! just Amazing. I wish all of my cs professors can be this clear at teaching new concepts.
  • deepa ganu

    deepa ganu

    05 March 2013 at 02:02 |
    Hi Evan ,
    Instead of using the adjacency list model or nested set model should I go for a graph database
    • Evan Petersen

      Evan Petersen

      05 March 2013 at 12:57 |
      It all depends on what kind of data you are interacting with. Graphs are meant for situations in which you could potentially draw a line from point to point and end up at the point that you started on. Here is a link to Graph Theory on Wikipedia: http://en.wikipedia.org/wiki/Graph_theory

      If you are interacting with data that is a tree rather than a graph, you can take a look at an adjacency list model, nested set model, or any other tree based model. Graphs and trees are related, but are also distinctly different in their purpose and representation.

      Regards,
      Evan
  • 50r

    50r

    12 December 2012 at 09:13 |
    This is such a great article infarct i go it all from you thanks. but i have one big question and i hope it finds you well to give us a light up on this.
    how can i hold these Nested Set if i had like multiple root nods.

    Let say i have 10 category menus and they are the root nods how do i handle that.
    in this article you are demonstrating with only one root nod.
    • Evan Petersen

      Evan Petersen

      13 December 2012 at 13:47 |
      Using the nested set model, you would have a few nodes with an effective width of 1. Image for a moment that you have 3 root nodes and no child nodes. Their left and right values would be as follows: 1:(0,1) 2:(2,3) 3:(3,4)

      Because the right value equals the left value plus 1, we can assume that the node has no children.

      Having said that, I would not recommend using the nested set model for a menu system -- as mentioned in the article, nested sets are really only efficient if you are attempting to perform operations on sub-trees (I.e. does this tree contain the node x). It is not efficient for inserts or deletes (unless you decide to stray from the norm and leave holes for future inserts with a table recording where such holes exist). I would recommend that you look into the parent-child model a bit more. If your menu isn't going to be changing very often, look into caching mechanisms or alternative ways of storing your data.

      Regards,
      Evan
  • Adam

    Adam

    30 November 2012 at 09:13 |
    Evan,

    I am the Director of Web Development over at Snapshot Group in Jacksonville. I would love to meet you for a cup of coffee and discuss some coding opportunities if you're up for it.

    Let me know!

    Thanks,
    Adam
  • Brad McBride

    Brad McBride

    15 October 2012 at 06:34 |
    Thank you for this clear and concise overview. I recently started working on a project using hierarchical data and quickly realized that using the traditional model was going to cause a lot of problems. I started to investigate other ways of modeling the data and learned about Nested Sets. This article is, by far, the best introduction to this topic that I have read. Thanks to your explanations, I will be able to make the application much easier to work with both from a programmer's and user's perspective.
    • Evan Petersen

      Evan Petersen

      19 October 2012 at 17:41 |
      I'm so happy to hear that this article was helpful! I've often found tutorials/explanations of complicated topics (like that of nested sets) poorly written and/or difficult to understand -- often requiring a moderately high level of previous knowledge on the topic. When I come across a concept that I feel is lacking a well written tutorial, I take it upon myself to set the standard and lower the barrier to entry. It's comments like yours that keep me writing.

      Anyway, thanks for the feedback; it means a lot. If you come across another topic that you feel deserves a better explanation, I'd love to hear about it.
  • ahmed

    ahmed

    11 September 2012 at 11:24 |
    great explanation. wish you would post an article on sorting nested arrays.
  • Gustavo Andrade Ferreira

    Gustavo Andrade Ferreira

    21 August 2012 at 12:52 |
    What would be faster:

    - Build an adjacency list and then build the nested model from it
    - Use the direct delete and insert algorithms
    • Evan Petersen

      Evan Petersen

      21 August 2012 at 18:32 |
      At the very end I discussed the potential for including the typical parent child relationship alongside the nested set structure. The only issue with adding more reference elements would be that you have to continue to maintain them. You could after each operation completely recreate the nested set model from it as a caching mechanism but I have a feeling that it would only work well on heavily selected but not updated data. Have you implemented your proposed solution and had success? I'd be very interested in seeing some hard performance data.

      Regards,
      Evan
  • Chris

    Chris

    30 July 2012 at 01:08 |
    Very nice article. I found it to be concise and very well explained.

    As for the problem of insert/delete operations, I am wondering if efficiency might be improved by reserving node expansion space, possibly dependent on the node depth and how far away from the right the node starts at?
    • Evan Petersen

      Evan Petersen

      02 August 2012 at 11:58 |
      It's definitely a possibility, but you would probably need to keep track of where your void spaces are and how large they are. You may find that the additional overhead associated with maintaining information about voids kills any gains created by having voids. If you happen to test that theory out, would you mind reporting back here?

      Thanks,
      Evan
  • ismail

    ismail

    04 May 2012 at 11:23 |
    hey, i have a test on circuit ansyials on thrusday and im having difficulties in complex situations, for instints. can i select a node as a reference node if it is connected to a voltage source? If possible does that mean the nodal voltage of the node of the other side of the voltage source will be that of the voltage source.thanks brendon
  • Narasimhamurthy

    Narasimhamurthy

    04 May 2012 at 09:23 |
    dear dr. yaz z. lii am about to start teaching ntwroek analysis to ug students of electrical and electronics engineering. thanks for explanation and your illustrations are too good that i want to follow. kindly let me know which software is used to draw the ntwroeks and highlighting relevant portions of the ntwroek. thanks and regards.
  • Brandon K

    Brandon K

    18 April 2012 at 11:05 |
    Why not use recursive CTEs against the parent-child model?

    Keep the simple (and normalized) parent-child data structure and get the answers you need from recursive CTEs.
    • Evan Petersen

      Evan Petersen

      18 April 2012 at 16:34 |
      The parent-child data structure works wonderfully for small sets. When you start getting into high traffic websites with deep nesting (think on the order of 1000s of levels) you will run into serious performance issues especially when querying for all children of a given node.

      For most cases, a parent child relationship will be fine but will not scale. Recursive solutions will always be slower than their iterative counterpart.
  • Connie C. Khan

    Connie C. Khan

    15 April 2012 at 14:49 |
    Interesting content on the other hand I would like to explain to you that I think there is trouble with your RSS feeds when they appear to not be working for me. Might be just me but I was thinking I would suggest it.
  • Piegus

    Piegus

    14 April 2012 at 03:55 |
    Hey. What about updating sets. or moving it from one hierachy to another?
    • Evan Petersen

      Evan Petersen

      17 April 2012 at 19:53 |
      I like to think of these operations as moving items on a number line. To move one hierarchy to another location, you first need to annex some space by increasing the left and right values of every node that currently occupies that space. You are really just shifting everything over, moving the set, and then shifting it all back. I'll have to post an addendum to this article that better covers the moving of sets.

      In the mean time, does that analogy make any sense or am I talking like a crazy person?
  • Josh

    Josh

    12 April 2012 at 22:05 |
    Great article. The only thing is that I walked away from it still wondering why would I want Nested Sets? I understand that they are superior to Adjacency Lists but IMHO Nested Sets are good for querying subtrees and not much else. Querying direct children, deleting nodes, inserting nodes, and moving subtrees are all relatively difficult with Nested Sets. Not to mention that Nested Sets do not maintain referential integrity. Even Path Enumeration is easier despite having some of the same problems.

    All that said, I love your writing style and approach - I just wish this article was about Closure Tables instead of Nested Sets.
    • Evan Petersen

      Evan Petersen

      17 April 2012 at 19:56 |
      Hey, thanks for the comment! Nested sets are really good at retrieving all children of a given node. With extra information, like depth, you can make nested sets almost as efficient as parent/child relationships for retrieving information, just not inserting.

      Parent/Child relationships break down when you have to grab all children of a root node.

      Glad you enjoyed the article! I'll have to take a closer look at closure tables and possibly post about them!
  • Ryan

    Ryan

    12 April 2012 at 21:34 |
    Hey, thanks for taking the time to write this out. I've been trying to wrap my head around nested sets for some time but I never fully grasped it. Cheers!
    • Evan Petersen

      Evan Petersen

      17 April 2012 at 19:56 |
      My pleasure! Glad it was useful.

Leave a comment

You are commenting as guest. Optional login below.