You’ll need to be able to work with groups or collections of items. A game that only has one character with one quest and with one shop that sells one item isn’t going to be very fun. You have choices and this episode continues more than a week long exploration of collection types available. You’ll learn when to use each type of collection and why. Up today is the binary tree. I’ll explain what binary trees are and then give you some guidance on when to use them.
If we add the numbers 3, 5, 1, 2, and 4 to a binary tree, it’ll look something like this. I have to break up left and right child pointer on their own line and some of the lines repeat.
root -> node(3)
node(3).left <-> node(1)
node(3).right <-> node(5)
node(1).right <-> node(2)
node(5).left <-> node(4)
What this shows is the the root of the tree is node(3) and node(3) does not point to any parent. node(3) has two children and the left and right pointers each point to a child. Each child also points back to node(3).
It also shows that node(1) has a child of its own on the right which is node(2) and that node(2) points back to node(1) for a parent. And then node(5) has a child on its left which is node(4) and node(4) points back to node(5) for its parent.
The height of a tree that is completely full is equal to the log base 2 of the number of items in the tree. We want to keep our trees as full as possible because if there are too many paths that extend farther than others, then searches down these long paths take longer than Big-O (lg N)
Listen to the full episode or read the full transcript below.
There are a lot of different kinds of trees and we’ll have future episodes that explore in detail various types of trees. I’ll explain today binary trees.
All the tree collection types get their name because the structure resembles an upside down tree. A tree starts out with a main trunk that splits into various branches. The number of branches gets larger as the branches themselves get smaller. And at the end of the branches are leaves.
Now imagine turning this upside down and replacing the trunk with a single instance of one of your classes. This is called the root. I know, that never made sense to me either. Why would there only be one root and why would it be sticking up in the air? For that matter, why is the tree upside down in the first place? All I can say is you’ll get used to it.
The root is a node just like you have nodes in a list. And like a list, this node class is designed specifically for the type of tree it belongs to. Binary trees have nodes that contain three pointers. Instead of binary tree nodes pointing to previous and next nodes like you’ll find in a double linked list, a binary tree node is arranged in a hierarchy. It has a parent pointer and two children pointers. The children pointers are called the left child and the right child.
There are two very special properties that makes a tree a binary tree. Binary means two. That’s why there’re two child nodes. If you have three, or four, or some other number of child nodes, then you don’t have a binary tree. The other special property is how the left and right child nodes relate to their parent.
The primary requirement of being able to use a binary tree is that you must be able to compare your items to see if one item is less than, or equal to, or greater than another item. If you can’t compare your items, then a binary tree just won’t work to hold those items.
If you’ve ever been in a group photo, then you know that putting shorter people behind taller people doesn’t work very well. If you’ve got enough room for two rows, then you really want to make sure that those in front are either shorter or sitting.
Binary trees get their structure by arranging new items in the left child position or the right child position based on how the new items compare with the existing items. Just like a photographer uses height to determine if a person should be in the front row or in the back row. Items in the left child position are always less than or equal to their parent. and items in the right child position are always greater than or equal to their parent.
Let’s take an example of adding the numbers one through five to a binary tree. We’ll start by mixing up the numbers so they get added in a random order and let’s say that our numbers to be added are then 3, 5, 1, 2, and 4. The first added is 3 and forms the root of the tree. Then we add 5. Since 5 is greater than 3, it goes into the tree as the right child of 3. The we add 1 which goes in as the left child of 3. At this point, the tree has three nodes with the number 3 at the root, the number 1 as the left child of 3 and the number 5 as the right child of 3. Then we add the number 2. We do the same thing by comparing 2 with the root node and since 2 is less than three, it belongs to the left. The only problem is there’s already the value of 1 sitting in this spot. That’s okay and we just repeat the process by comparing the value 2 with the value 1. Since 2 is greater than 1, we put the 2 into the tree as the right child of the 1 node. Then the last number we’ll add is 4. Since 4 is greater than 3, it goes to the right or 3. But there’s already a node with the value 5 sitting in this spot so 4 gets compared with 5 and ends up as the left child of 5.
The tree that we ended up with is not quite full because there’s a couple open spots in the 1 node and the 5 node. That’s okay. Any node in a binary tree can have zero, one, or two children. The only thing that’s important is that if there are children, then they must obey the comparison rules.
This is also called a well balanced tree because there are no branch structures that extend much farther than others. In this case, we have a path consisting of 3, 1, 2 and another path with 3, 5, 4. By path, I just mean that I started at the root and followed some child all the way until there are no more children. You can also follow paths from 3, 1 and from 3, 5 and each of these paths end at the missing child. In total we have two paths that have a length of 3 and two other paths that have a length of 2. The only time you’ll have all the paths exactly the same length is when the tree is full all the way to some level. Because the longest path of length 3 is not a lot longer than the shortest path of length 2, the tree is balanced.
By the way, a tree with just a single node as the root is also balanced. But what would have happened if we had not added the values in a random order? Let’s try that. We start with 1 and add it as the root of the tree. Then 2 gets added as the right child of 1. Then 3 gets added as the right child of 2. Then 4 gets added as the right child of 3. And finally, 5 gets added as the right child of 4. In this case, the longest path is 1, 2, 3, 4, 5 and the shortest path is just 1 because the root node 1 has no left child so that path down the left ends right away at 1. This is not a well balanced tree. It’s about to fall over.
I’ll explain how to use binary trees and how our unbalanced tree can cause problems right after this message from our sponsor.
( Message from Sponsor )
In many ways, binary trees are a lot like double linked lists. Each node in the tree points to its parent. The root node is the only node without a parent. And a node becomes a parent node whenever other nodes get added to the tree that end up being placed below a node. This arrangement causes trees to branch into a possible two paths at each node. So they’re not linear. Or at least, they shouldn’t be linear.
The unbalanced tree example shows how a tree can effectively turn into a double linked list whenever there’s just a single path that contains nodes through the entire tree. In this case, the parent pointer becomes just like the previous pointer and one or the other child nodes becomes just like the next pointer. This is bad because it ruins the most effective use of binary trees.
Remember when we wanted to add an item to a double linked list how we first had to walk the list to find the right spot to perform the insertion? This was a Big-O of N operation because we were navigating item by item through the list.
Well balanced binary trees are much better at finding things including where to make insertions. Let’s take the earlier example of the well balanced tree with the numbers 1 through 5 and add a new number 6 and count just how many nodes need to be tested. We start out as usual comparing 6 with 3 and determine that 6 belongs somewhere to the right of 3. We go to the right and find the value 5. When we compare 6 with 5, we determine that 6 should be to the right of 5 and this spot is empty so this is where we place the value 6. We did this with just 2 comparisons. Now sure, we could have gotten lucky with a double linked list but luck doesn’t count in Big-O notation. Each time you do a comparison in a binary tree and selects a side, either left or right, you effectively get to skip over all the nodes that are on the other path not taken. That’s a winning solution and will be faster. But how much faster?
If we think about the structure of a binary tree as starting with a single root node at level one and try to keep the tree completely full so as to cause the most comparisons possible when inserting a new node, then the root node can only hold two children itself. Actually, any node can only hold two children. But the two children of the root node form a whole new second level of the tree. If each of the nodes in the second level are full, then we have a full third level. Let’s stop here for a moment to count the possible number of nodes.
There’s a single node at the first root level, then 2 nodes at level 2, and 4 nodes at level 3. If each of the four nodes at level three have two children, then that makes 8 nodes on level 4. There’s a pattern here. Each level doubles the number of nodes from the previous level. This makes sense because each node can have two children on the next level. Because each level multiplies by two the number of nodes on the previous level, we have a progression that advances from one level to the next by a multiplication factor of 2. The number of levels just counts how many times we multiply by 2. This is a logarithmic progression. Any time we need to figure out where to add a new node, all we have to do is make comparisons down through the levels of the tree. How many comparisons are needed is just a count of how many times we can multiply the nodes from the previous level by 2. This means that the height of a well balanced tree grows at a rate equal to the log base 2 of the number of nodes in the tree.
So if you have a well balanced tree with one thousand nodes, then you can find any node with just ten comparisons. That’s a lot better than searching through all thousand nodes.
You’ll use a binary tree whenever you want to be able to quickly add items to your collection and then later find them again just as quickly. Adding, removing, and finding items requires Big-O of lg N time to run.
It sounds like this type of collection solves all our problems. Well, it does solve the problem of quickly adding and removing items as well as quickly finding items. But it accomplishes this by organizing the items based on their relative values. In other words, you lose the ability to remember what order items were added to the tree in the first place. Because each item gets added to a location based on its value and not based on the order it was added.
A double linked list might be slower to find items, but it can hold whatever order you want. Maybe that random order that we used to add the items to the binary tree, 3, 5, 1, 2, and 4 is important. If so, then you can’t use a binary tree because binary trees have no concept of order beyond how the tree decides to order the items based on their value.