205: Trees: What Can You Do With Them?

What other things can trees help you to solve? When thinking about trees, I usually put them in two groups.

The first are trees with arbitrary structure.

The second group of trees have a well defined structure.

Listen to the full episode or read the full transcript below for more details and examples. And if you’re interested to see actual code for things like this, then consider supporting the Take Up Code podcast on Patreon. Once we reach the video goal, I’ll start making short videos that show code. A tree will make for a good video and will be included at some point. Just click on the link at the top to Become a Patron. For a dollar a month, you can get access to extra podcast episodes.

Transcript

When thinking about trees, I usually put them in two groups.

All trees are composed of nodes with a clear concept of parent nodes and child nodes. One node in the tree serves as the root and has no parent. The child nodes can have their own child nodes. Some trees might be limited in how many child nodes any particular parent node can have or where nodes can be placed. It’s these limitations that I use to group trees.

The first are trees with arbitrary structure. This is like the grocery store categorization from episode 203. You get to decide where things exist in the tree. Or that can also mean that somebody using your app gets to decide on the tree structure. I mean, an outline tool wouldn’t be very useful if it came with one and only one outline built-in. The user needs to be able to create their own outlines and you may want to keep track of the data as a tree inside your app.

The benefit of this type of tree is that things stay exactly where you put them. Have you ever had a messy room but still knew where everything was? Different types of things such as papers, books, clothing, money, and even your phone are exactly where you put them. There’s no need to compare things. They are where they are and that’s all. You can put things in this type of tree when the placement needs to be completely under your control. It’s the placement that determines the parent child relationship. By placing your phone on your desk, the phone becomes a child of the desk. There’s order. But it’s only understandable to you or to the person using your app.

Usually, this type of tree is creative and allows an unlimited number of items to be placed under each node. I say under because usually when drawing a diagram of what a tree looks like, child items are drawn under the parent.

Your code will need to have some way to figure out the type of each item but the tree itself will need to treat everything the same. How can you treat everything the same? How can you treat all the things in a messy room as if they were all the same?

Simple. Just find some concept that everything shares. For the messy room, and all the things I listed, about the only thing I can think of that all these things share is that they’re all physical objects. Money might be an abstract concept but a coin or a dollar bill is a physical thing. Everything in that messy room has a name, a color, a size, a weight, and materials that it’s made from. When you want to store a variety of items like this, then you can have a class representing physical objects and it’s this class that the tree knows about.

For an outline app, each item is just a block of text. Or if you want a more full-featured outlining app where the user can add links to web pages, audio notes, pictures, etc. then you’ll want to do something similar to the messy room and create a class to represent the general concept of an outline node. It’s up to you then if you want to use inheritance or some other mechanism to represent the specific types of content that can be added to the outline.

The point for all these is that the tree holds a structure that gets built with few or no rules at all.

When I was first learning about programming and even quite a bit beyond that, this was the type of tree I was most familiar with. If you had asked me to describe a tree then, I’d probably have described this type of tree.

It’s more like a file system where you can organize your files into folders however you want. Your ability to find files later will be determined by how organized you are when creating the files in the first place. If you want to save your budget document in a folder called vacation pictures, go ahead. Good luck finding your budget later though. That’s the same thing as stuffing your phone inside your pillow.

This type of tree is also like the tree structure formed by HTML. When you want to markup some text as a quote, then you do that by placing the text inside the quote tags. This makes the text a child node of the quote node. You get the structure you want by organizing your HTML into a tree.

The second group of trees have a well defined structure. Putting something into this kind of tree is like returning a book to the library. Does the library let you put the book back on the shelves anywhere you think is a good place? No. In fact, you don’t get to put the book on the shelves at all. The librarians do this for you. When you place an item into this type of tree, it’s your code, or the tree code, that does the work of figuring out where the item will eventually be located in the tree. The tree might even need to be reorganized to make room for the new item.

These are the types of trees you’ll learn about when studying computer science. They’re also favorites for interview questions.

The benefit of this type of tree is that the organization makes it fast to find things in the tree. But it also means that there needs to be some distinguishing aspect to the things you’re putting in the tree. What I mean is this. Imagine if books had no titles and completely empty covers. And what if they were all the same size? And all had the same number of pages? What if from the outside, all books were completely the same and the only thing that was different was the words written inside? Now imagine how hard it would be to organize all the books in a library. How would you ever find the book you wanted when the only way to describe it would be to start reciting the first page or so? And how would the librarians ever hope to make sense of things?

This is why books have identifying information such as a title and author. An even better identifier is a unique number called an ISBN that’s assigned to each book.

Normally in your code, you don’t explicitly state that you want to use a tree like this. These types of trees are usually used as the basic data structure inside another class that your programming language might call a map or a dictionary. There’s other classes that will use trees too. I’m not saying that you can’t write and use a tree directly. It’s just that most languages have already done this for you when using this type of structured tree.

You need to decide what identifying characteristic of your items will be used by the tree to figure out where they should be placed in the tree. And you can only use one identifier. You can’t organize books by both the author name and the number of pages. At least not in the same tree. You could either stick the two together so the single identifying key would be something like Hogan830, or you could just use Hogan and then instead of finding a single book where Hogan exists in the tree, you would find yet another tree just for all of James P. Hogan’s books. This additional tree would be organized by total page counts.

Now, I used that last example for a couple reasons. The first is that I just found a way to include my favorite author in this podcast. He writes science fiction. The real reason though was to demonstrate that you want to be careful what you use to identify items in a tree. Do you really expect people to know how many pages are in a book and then use that to organize them?

A tree can use a single piece of information about an item to figure out where the item belongs in the tree. This is called the key and can be something like a name, or an ISBN number, or even a combination as long as that combination results in a single piece of information. You might find a more elaborate tree that claims to offer support for multiple keys but it’s probably just using a tree for each key. Like how I described the author name leading to another tree based on the page count. Once you look up the author name, you don’t get to the book yet. You find another tree where you have to then use the page count to finally look up the book.

This type of multi-key structure can lead to some interesting multi-dimensional data structures.

The important part is that a tree needs to be able to compare one key with another. Not just to see if the keys are equal but to determine order. There needs to be some way to know if one key is less than or equal to another key. For a numeric type value, the order is clear. And a key based on text such as a name can use alphabetical ordering. More complicated objects such as a book will have many different aspects that can be used for a key. It doesn’t make sense to say that one book comes before another. You need to be clear about what aspect of the book, such as the author name, will be used for the ordering. The tree will then use this ordering to figure out the best place to put the items it contains.

This also means that whatever identifier you’re using should remain constant. Don’t use a person’s age as a key or you might find that you place the person in the tree at 30 years old and then can’t find the same person a few years later when you look for somebody who’s now 35. But even worse than not being able to find the same person is the effect this could have on other people added later. The whole structure of the tree will become unusable. Instead of age, maybe the person’s birthday would work instead. The age changes over time but the birthday remains the same.

If you have an app that allows users to search for people, then you can use a tree to organize the people by whatever identifier you want. Another thing to be careful about is the range and distribution this identifier allows. A tree organized by male vs. female would not be a good choice because you’re losing all the organization benefits a tree provides though its structure. This would result in a tree with just two branches and each branch holding lots of leaves. This won’t help much when you need to find a specific leaf, or person, in this case.

This is all good. But what do you do when you want to sometimes search for a person by last name and other times by date of birth? A tree can only use a single key. You can’t combine the name and date of birth in this case because you want each to be searchable. That means you need multiple trees. Have each tree organized by whatever you want to use later to find the items in the tree.

Now, there’s something you need to watch out for. What? Well you could have a class that represents a person and then store instances of the person class in each tree. Let me explain why this is not a good idea. Each tree would use a different property of the person class as its key. There’s nothing wrong with that. But what do you do when something about the person needs to change? Maybe the address or phone number?

Wait a minute. Didn’t I just say that things couldn’t change?

Well, actually, I said that you shouldn’t change anything being used as the key. Maybe you’re not using phone numbers as the key to one of your trees. If you are, then that’s okay as long as you remove the person from the phone number tree first, then change the phone number, and then place the person instance back into the tree with the new key value.

But the bigger problem with this solution is that you would end up with the same person data copied into multiple instances of the person class all so that each instance can be inserted into separate trees. The reason that’s not a good design is because when want to change the phone number of a person, you have to look up the same person in each tree and make the same phone number change in order to keep everything up-to-date.

A better design would base the trees around another class that could refer to a person. I’m calling this class an item. You can call almost anything an item so there’s nothing special about that. The point is that whenever you want to refer to the same thing from multiple places, make sure that’s what you do and don’t duplicate things. This way, all your various trees could refer to the same people. If you use the phone number tree to find a person, then remove that referring item from the phone number tree, change the phone number, and then add the referring item back to the phone number tree, then when you use the last name tree to find the same person, you’d see that the phone number was correct. It’s correct because each of the items in all the trees refer to the same people. And there’s only a single person class with all that person’s information for each person.

205: Trees: What Can You Do With Them?

Transcript

Tags

Leave a ReplyCancel reply

205: Trees: What Can You Do With Them?

Transcript

Share this:

Tags

Leave a ReplyCancel reply