135: Data Types: GUIDs Globally Unique Identifiers.

When you want to identify class instances or data records, you can’t use things like passports. Yet it’s just as important to keep track of object identities as it is for people.

One simple way to keep track of things is to just use a number. Even a simple 32-bit unsigned integer gives you over 4 billion numbers. Just start at 0 and increment each time you need a new one. You can even start at one in case you want to reserve zero to mean something special. There are lots of problems with this approach, especially when you need to coordinate unique identifiers between multiple threads or even multiple computers.

A GUID solves all these issues in a simple way that not only works now but can scale later when needed. GUID stands for globally unique identifier, and you might sometimes also see them referred to as UUIDs, which stands for universally unique identifiers.

GUIDs are huge. They use 128 bits, and an article on Wikipedia gives an idea of just how many values are possible. It says that for the probability of a single duplicate GUID value to reach 50%, every person on earth would need to own 600 million GUIDs. What this means for you is that you no longer have to worry about how to ensure that your identifiers are unique. Just generate a GUID value and use it. When you need another one, you don’t increment the first one. That’s not how GUIDs work. Instead, you generate a whole new GUID each time you need one.

The only things you can meaningfully do with GUIDs are generate them and test them for equality. What I mean is that there’s no concept of one GUID being less than or greater than another GUID. They’re either equal or not. They’re not incremented like a simple number. And they don’t look like a number either. When you see a GUID displayed, it’s usually formatted as 32 hexadecimal digits separated into 5 uneven groups with dashes between each group. Sometimes the formatted value will appear inside curly braces and sometimes inside quotation marks. Sometimes the hexadecimal characters will be in uppercase and sometimes in lowercase. Just remember that this is only for display purposes. A GUID is still just 128 binary bits. Here’s what a GUID looks like when displayed:

65EA9162-DF2A-43BC-B691-7DFE4EF3EC63

Listen to the full episode for more on GUIDs, or you can read the full transcript below.

Transcript

Plenty. What do you do if you want to create objects from different threads? You could have a synchronized and thread safe identity factory that hands out new numbers for any thread that needs one. But then it becomes a source of contention as threads have to wait their turn.
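As a rough illustration of the synchronized identity factory described above, here is a minimal Python sketch. The class name `IdentityFactory` and its `new_id` method are made up for this example and are not from the episode. The lock is what makes the factory thread safe, and it is also the point where threads end up waiting their turn.

```python
import threading


class IdentityFactory:
    """Hypothetical counter-based identity factory (illustrative names only)."""

    def __init__(self, first_id=1):
        self._next_id = first_id      # start at one so zero can stay reserved
        self._lock = threading.Lock()

    def new_id(self):
        # Only one thread at a time can hold the lock, so under heavy use
        # the other threads queue up here. This is the contention point.
        with self._lock:
            issued = self._next_id
            self._next_id += 1
            return issued


factory = IdentityFactory()
issued_ids = []


def worker():
    for _ in range(1000):
        issued_ids.append(factory.new_id())


threads = [threading.Thread(target=worker) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

# Every thread got distinct numbers: 4000 issued, 4000 unique.
print(len(issued_ids), len(set(issued_ids)))
```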

Maybe you let a thread ask for more than one identity number at a time. That could work. But let’s think about long term use. You’re going to want to save the state of your factory so that when the user exits the app and starts it again later, the factory doesn’t start over at an ID of one. It needs to remember which numbers have already been taken.
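Handing out numbers in batches and remembering where the factory left off might look something like the sketch below. Everything here is an assumption made for illustration: the class `BatchIdentityFactory`, its `reserve`, `save`, and `load` methods, and the choice of JSON for the saved state.

```python
import json
import threading


class BatchIdentityFactory:
    """Hypothetical factory that reserves ID ranges and can save its state."""

    def __init__(self, next_id=1):
        self._next_id = next_id
        self._lock = threading.Lock()

    def reserve(self, count):
        # One lock acquisition covers a whole batch of consecutive IDs.
        with self._lock:
            start = self._next_id
            self._next_id += count
        return range(start, start + count)

    def save(self):
        # Serialize the only state that matters: the next unused number.
        with self._lock:
            return json.dumps({"next_id": self._next_id})

    @classmethod
    def load(cls, saved):
        return cls(next_id=json.loads(saved)["next_id"])


factory = BatchIdentityFactory()
first_batch = factory.reserve(100)
saved_state = factory.save()

# Later, after the app restarts, the factory continues where it left off.
restored = BatchIdentityFactory.load(saved_state)
second_batch = restored.reserve(100)

print(first_batch[0], first_batch[-1], second_batch[0])  # 1 100 101
```

Note that the whole scheme depends on the saved state being current. If an older copy of that state is loaded, the factory will hand out numbers it has already issued.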

What happens when the user upgrades versions and the saved identity numbers get mixed up with some older copy? Actually, this can happen anytime. If the user has a backup service, then it’s always possible that some files could get restored while others remain. This can cause your application to start issuing numbers that have already been used to identify other objects and now you can’t tell them apart. If this is a large customer, the damage could be severe.

And even if you solve all these problems, you still have issues. What do you do when some customer’s use of your application becomes so big that you have to split tasks between computers? Many large scale applications rely on this ability for reasons of scale and for redundancy. Computers are just machines that can break. If a customer relies on your application, then that customer will want to have systems in place to make sure that a single point of failure doesn’t bring down the entire application.

This all means that now your simple identity factory needs to go beyond just thread-safe and be prepared to serve identities to multiple machines. The complexity just went way up.

GUIDs solve all these issues in a simple way that not only works now but can scale later when needed. I’ll explain how right after this message from our sponsor.

First of all, GUIDs are huge. They use 128 bits, and an article on Wikipedia gives an idea of just how many values are possible. It says that for the probability of a single duplicate GUID value to reach 50%, every person on earth would need to own 600 million GUIDs.
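To get a feel for where a number like that comes from, here is a back-of-the-envelope calculation using the standard birthday approximation. It rests on two assumptions that are not from the episode: that the GUIDs are the fully random kind with 122 random bits (the other 6 of the 128 bits are fixed), and a rough world population of 8 billion. It is not meant to reproduce the quoted figure exactly, only to show that the result lands in the same range of hundreds of millions per person.

```python
import math

RANDOM_BITS = 122                    # assumption: random (version 4) GUIDs
possible_values = 2 ** RANDOM_BITS

# Birthday approximation: about sqrt(2 * N * ln 2) randomly chosen values
# are needed before the chance of at least one duplicate reaches 50%.
needed_for_half = math.sqrt(2 * possible_values * math.log(2))

WORLD_POPULATION = 8_000_000_000     # assumption: rough figure
per_person = needed_for_half / WORLD_POPULATION

print(f"{needed_for_half:.2e} GUIDs in total")
print(f"{per_person:.2e} GUIDs per person")
```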

What this means for you is that you no longer have to worry about how to ensure that your identifiers are unique. Just generate a GUID value and use it. When you need another one, you don’t increment the first one. That’s not how GUIDs work. Instead, you generate a whole new GUID each time you need one.
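In Python, for example, generating a new GUID each time you need one could look like this. The standard `uuid` module is just one of many libraries that can do it, and it is used here only as an illustration.

```python
import uuid

# Each call produces a brand new value. Nothing is incremented.
first = uuid.uuid4()
second = uuid.uuid4()

print(first)
print(second)
print(first == second)  # expected to be False for any practical purpose
```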

Sometimes, instead of uniquely identifying each object, you want something that’s still unique to your application but that you can count on to be the same for any computer that uses your application. You can use GUIDs for this too. All you have to do is generate a GUID manually. There are even websites you can visit that will generate as many GUIDs as you need. You can then use these GUIDs in your application.

Usually, these manually generated GUIDs are used to identify a type of something instead of individual instances. Maybe you need to define and later identify messages that your app will use. You’ll need some way to tell what type of message your app received so you can look for a specific GUID.

Then if you want to store each of these messages and be able to keep them separate, well, this is getting away from the type and now needs new individual GUIDs for each message stored. These individual GUIDs will need to be generated by your application code.
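The difference between a type GUID and an instance GUID can be sketched like this. The names `CHAT_MESSAGE_TYPE` and `StoredMessage` are hypothetical, and the GUID value is an arbitrary one pasted in by hand, which is exactly how a manually generated GUID ends up in source code.

```python
import uuid
from dataclasses import dataclass, field

# Generated once by hand and pasted into the source, so it is the same
# on every computer that runs the application. Hypothetical message type.
CHAT_MESSAGE_TYPE = uuid.UUID("65EA9162-DF2A-43BC-B691-7DFE4EF3EC63")


@dataclass
class StoredMessage:
    type_id: uuid.UUID    # identifies what kind of message this is
    body: str
    # Generated by the application code, once per stored message.
    instance_id: uuid.UUID = field(default_factory=uuid.uuid4)


a = StoredMessage(CHAT_MESSAGE_TYPE, "hello")
b = StoredMessage(CHAT_MESSAGE_TYPE, "hello again")

print(a.type_id == b.type_id)          # True: same type of message
print(a.instance_id == b.instance_id)  # False: separate stored messages
```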

You’ll usually have an operating system call you can use to create GUIDs, or maybe a library. GUIDs are a popular solution, so support for generating them should be widely available. You don’t need to make a web service call to some website through your code to get new GUIDs. Those websites are just there so you can easily generate GUIDs manually.

There are different versions of GUIDs. You don’t normally need to know about this to be able to use them. So I’ll just mention that some GUIDs base many of their bit values on the MAC address of your computer’s network card. And other GUIDs are generated completely from random, or more likely pseudo-random, values. Check out episode 35 about random numbers for more information.
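If you are curious, Python’s `uuid` module exposes both of the kinds mentioned here. This is only a small sketch. Note that when the library cannot find a hardware address, it may fall back to a random node value instead.

```python
import uuid

# Version 1: built from a timestamp plus a node value,
# which is usually the MAC address of a network card.
mac_based = uuid.uuid1()

# Version 4: built from random bits.
random_based = uuid.uuid4()

print(mac_based.version, random_based.version)  # 1 4
print(f"node bits used by version 1: {mac_based.node:012x}")
```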

I should mention that the only things you can meaningfully do with GUIDs are generate them and test them for equality. What I mean is that there’s no concept of one GUID being less than or greater than another GUID. They’re either equal or not. They’re not incremented like a simple number.

And they don’t look like a number either. When you see a GUID displayed, it’s usually formatted as 32 hexadecimal digits separated into 5 uneven groups with dashes between each group. Sometimes the formatted value will appear inside curly braces and sometimes inside quotation marks. Sometimes the hexadecimal characters will be in uppercase and sometimes in lowercase. Just remember that this is only for display purposes. A GUID is still just 128 binary bits.
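The sketch below shows that the different display styles all describe the same 128 bits. It uses Python’s `uuid` module as one example of a library that parses and formats GUIDs, and the value is an arbitrary sample.

```python
import uuid

guid = uuid.UUID("65EA9162-DF2A-43BC-B691-7DFE4EF3EC63")

text = str(guid)
print(text)                      # 65ea9162-df2a-43bc-b691-7dfe4ef3ec63
print(text.upper())              # 65EA9162-DF2A-43BC-B691-7DFE4EF3EC63
print("{" + text.upper() + "}")  # the curly brace style

# Five uneven groups that add up to 32 hexadecimal digits.
group_sizes = [len(group) for group in text.split("-")]
print(group_sizes, sum(group_sizes))  # [8, 4, 4, 4, 12] 32

# Underneath the formatting there are 16 bytes, which is 128 bits.
print(len(guid.bytes) * 8)       # 128

# Braces and letter case are display details only, so these are equal.
same = uuid.UUID("{65ea9162-df2a-43bc-b691-7dfe4ef3ec63}")
print(guid == same)              # True
```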

You can also mark some text as a GUID by adding uuid: in front of the GUID. In the standard’s URN form, the full prefix is urn:uuid:. UUID stands for universally unique identifier. A GUID is actually just an implementation of the more general UUID standard. While GUID is pronounceable, you’ll have to say the letters UUID.
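As a small example of the prefixed form, Python’s `uuid` module can produce the URN style and also accepts a prefixed string when parsing. The value is again just an arbitrary sample.

```python
import uuid

guid = uuid.UUID("65EA9162-DF2A-43BC-B691-7DFE4EF3EC63")

# The URN form puts a prefix in front of the usual text.
print(guid.urn)  # urn:uuid:65ea9162-df2a-43bc-b691-7dfe4ef3ec63

# When parsing, the prefix is stripped, so the value is unchanged.
parsed = uuid.UUID("uuid:65EA9162-DF2A-43BC-B691-7DFE4EF3EC63")
print(parsed == guid)  # True
```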

Some programming languages will refer to GUIDs and some will refer to UUIDs.
