
It’s almost a tongue twister to say them all. Do you know what they all mean?

Let’s take URI one step at a time and at the end, you’ll understand how best to describe where information such as web pages can be found. The whole point, though, is that this applies beyond just web pages. You’ll have a solid understanding of how to describe where almost anything can be found.

The first part is Uniform. This means that you can describe completely different kinds of resources that might have very different mechanisms. Just take web pages vs. emails as an example. The URI document describes a uniform way to represent these and other types.

But types of what though? Here’s where things get really vague. We’re talking about resources and these can be almost anything. A web page is a resource. But so is an online service that provides high scores for a game. And you can even describe resources that are in the real world and have nothing to do with computers at all, such as a library book.

If you thought that was vague, the term identifier is even more so. This is whatever is needed to uniquely refer to one thing vs. something else. It can change depending on what type of thing you’re trying to identify. And it might not even be a single thing. Maybe it can identify a group of things where the group itself is important somehow. But probably the strangest part of an identifier is that there doesn’t actually have to be anything located or found at whatever is identified. It could just be an idea or a concept. All that really matters is that it has some kind of identity, whatever that means.

URIs are interpreted consistently no matter where you are, but that doesn’t mean that they provide the same result. The RFC 3986 document uses http://localhost as an example. No matter where you are in the world, this always means the same thing: it refers to the computer itself. Each computer will interpret this URI to mean itself. URIs are able to handle such variety because they start with what’s called a scheme. The “http” in http://localhost is the scheme. It says that what follows should adhere to the rules for valid http addresses.

If you wanted to represent a phone number as a URI, then it would begin with “tel:”. And an email address would begin with “mailto:”.

Make sure to listen to the entire episode to understand what URLs and URNs are, as well as when to use them. Or read the full transcript below.

Transcript

RFC 3986 says that:

◦ A Uniform Resource Identifier (URI) is a compact sequence of characters that identifies an abstract or physical resource.

You can read this document online by searching for RFC 3986. RFC stands for “Request for Comments” and 3986 is the specific number that the IETF has given this document. But what’s IETF? Well, if you visit www.ietf.org, you’ll find their goal is to make the internet work better. That’s a very big goal. IETF stands for the Internet Engineering Task Force and they help create standards documents that describe things like URIs.

Now you might wonder why we need such complicated documents to describe simple concepts. All I can say is that there really are a lot of small details that do need to be documented. And if small details are not documented, then they have a way of being interpreted differently each time a developer needs to work with them. In fact, even with such specific documentation, people still get confused.

Some people say that we shouldn’t worry so much and just call everything a URI. I know, I haven’t yet described what all these URLs, URNs, URIs are. I’ll get to that.

I wanted to first give you some perspective on what all this is about and where you can find out more. There’s no way that I can describe all the details in one of these official documents. At least not if I want you to stay awake. Just look at that one sentence I started out with about identifying an abstract or physical resource. Talk about some difficult reading. Instead, I’ll try my best to translate for you and explain the major points.

Let’s take this one step at a time and at the end, you’ll understand how best to describe where information such as web pages can be found. The whole point, though, is that this applies beyond just web pages. You’ll have a solid understanding of how to describe where almost anything can be found.

The first part is Uniform. This means that you can describe completely different kinds of resources that might have very different mechanisms. Just take web pages vs. emails as an example. The URI document describes a uniform way to represent these and other types.

But types of what though? Here’s where things get really vague. We’re talking about resources and these can be almost anything. A web page is a resource. But so is an online service that provides high scores for a game. And you can even describe resources that are in the real world and have nothing to do with computers at all, such as a library book.

If you thought that was vague, the term identifier is even more so. This is whatever is needed to uniquely refer to one thing vs. something else. It can change depending on what type of thing you’re trying to identify. And it might not even be a single thing. Maybe it can identify a group of things where the group itself is important somehow. But probably the strangest part of an identifier is that there doesn’t actually have to be anything located or found at whatever is identified. It could just be an idea or a concept. All that really matters is that it has some kind of identity, whatever that means.

URIs are interpreted consistently no matter where you are, but that doesn’t mean that they provide the same result. The RFC document uses http://localhost as an example. No matter where you are in the world, this always means the same thing: it refers to the computer itself. Each computer will interpret this URI to mean itself.

URIs are able to handle such variety because they start with what’s called a scheme. The “http” in http://localhost is the scheme. It says that what follows should adhere to the rules for valid http addresses.

If you wanted to represent a phone number as a URI, then it would begin with “tel:”. And an email address would begin with “mailto:”.
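If you'd like to see the scheme idea in code, here is a minimal sketch in Python using the standard library's urllib.parse module. The helper name scheme_of, the phone number, and the email address are made up for illustration and are not part of any standard.

```python
from urllib.parse import urlsplit


def scheme_of(uri: str) -> str:
    """Return the scheme of a URI, or an empty string if it has none."""
    return urlsplit(uri).scheme


# Hypothetical sample URIs, one per scheme mentioned above.
examples = [
    "http://localhost",
    "tel:+1-201-555-0123",
    "mailto:someone@example.com",
]

for uri in examples:
    # The scheme is the part before the first colon.
    print(scheme_of(uri), "<-", uri)
```

Each line printed starts with the scheme, which is the piece that tells a program which set of rules applies to the rest of the URI.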

You can think of anything following this standard as a URI. And there are more specific types of URIs that I’ll describe next. But first, let me give you an example. If you’re considering various things that can fly and let’s ignore bats for now, then you can call all of these things birds. Let’s also ignore birds that can’t actually fly such as chickens and penguins. An eagle is a bird. And so is a duck. In fact, all eagles are birds and all ducks are birds. But not all birds are eagles just like not all birds are ducks.

If you’re wondering what all these birds have to do with this episode, you can think of URIs like birds. The original way all this worked was that there were specific kinds of URIs just like there are specific kinds of birds. A URL is a type of URI just like a URN is a type of URI. Where it gets confusing though is that over the years, some of these types of URIs might actually look like and share properties from other types. Imagine a bird that looks just like an eagle and sometimes could really be an eagle but other times it could really be a duck. It’s no wonder that people get URLs mixed up with URIs nowadays.

Anyway, what makes a URL? Well, when a URI’s scheme contains enough information to specify how to locate some resource and the resource exists at that location, then that URI is a Uniform Resource Locator or URL.

What about URNs? This stands for Uniform Resource Name. The primary factor that determines if a URI is really a URN is whether the identifier will remain valid even after the resource no longer exists. And some things named by URNs may never exist at all. URNs don’t have to refer to anything in particular and the scheme doesn’t have to provide a means to do anything. The only important aspect is that whatever is identified remains constant.
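One concrete kind of URN you may run into is the “urn:uuid:” form. As a rough sketch, Python's standard uuid module can produce one. The helper name name_for and the input text are assumptions chosen only for this example. The result names something in a stable way but gives you no instructions for fetching anything.

```python
import uuid


def name_for(text: str) -> str:
    """Build a stable urn:uuid: name from some text (hypothetical helper).

    uuid5 is deterministic, so the same text always gives the same name.
    """
    return uuid.uuid5(uuid.NAMESPACE_URL, text).urn


first = name_for("a library book")
second = name_for("a library book")

print(first)
print(first == second)  # the name stays constant every time it is computed
```

Nothing needs to exist anywhere for this name to be valid. It just has to keep meaning the same thing.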

These concepts have changed over the years which has added to the confusion. There used to be an idea of separate URL schemes, URN schemes and URC schemes just like eagles, ducks, and sparrows are all separate types of birds. Under this old system, http was originally a URL scheme and anything that began with http must have been a URL. This is no longer the case and these separate schemes are now considered to all be URI schemes. You can’t just look at a URI anymore and say that just because it begins with http, then there must be some resource that it locates. It could be just a name instead. Or maybe it could be both a name and an actual location of a resource. All of these are URIs.

Remember that what makes something a URL is if the scheme specifies how to locate a particular resource and that there is a resource to be located. The scheme “tel” identifies a phone number but doesn’t provide enough meaning to be able to make a phone call. At least not on a normal computer. Maybe a smart phone could consider this to be a URL. I’m not sure about that though and the answer is going to be buried deep in the RFC if at all.

The last two concepts I’ll explain are URCs and Data URIs.

URC stands for Uniform Resource Citation and originally defined schemes to retrieve metadata about some other resource. So instead of identifying a web page, you could use a URC to get the source code for that web page. Or maybe you could get author and date information about a document instead of the document itself. This type of URI is not common and I mention it mainly to be complete.

And the last concept is a Data URI. Instead of identifying where some resource can be found, a Data URI just includes the data or the information directly. There’s no need to locate or retrieve anything. Everything you need is already included right in the URI itself.
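To make that concrete, here is a small sketch in Python that packs some text into a Data URI and then reads it back out without contacting anything. The function names make_data_uri and read_data_uri are hypothetical, and the sketch only handles the base64 form of a Data URI.

```python
import base64


def make_data_uri(text: str, media_type: str = "text/plain;charset=utf-8") -> str:
    """Embed text directly inside a base64 Data URI."""
    payload = base64.b64encode(text.encode("utf-8")).decode("ascii")
    return f"data:{media_type};base64,{payload}"


def read_data_uri(uri: str) -> str:
    """Recover the text from a base64 Data URI built by make_data_uri."""
    header, payload = uri.split(",", 1)
    if not header.startswith("data:") or not header.endswith(";base64"):
        raise ValueError("this sketch only understands base64 data URIs")
    return base64.b64decode(payload).decode("utf-8")


uri = make_data_uri("Hello")
print(uri)                 # data:text/plain;charset=utf-8;base64,SGVsbG8=
print(read_data_uri(uri))  # Hello
```

Notice that read_data_uri never opens a file or a network connection. Everything it needs is in the string it was given.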

Mostly, you’ll find that this episode helps you better understand the differences between URIs and URLs. You might mention to somebody that the URL is www.takeupcode.com. Hopefully, you now understand that this is not really a URI or a URL because it lacks the scheme. Now if you tell somebody that the URL is https://www.takeupcode.com then you really do have a URL because the scheme provides the information needed to know how to interpret the resource identified and there really is a web page to be found at that location. That is assuming of course that my web server is up and running. But even if my web server is down, it’s still a URL because there’s an expectation that there should be something at that location.
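You can see the effect of the missing scheme with a quick sketch in Python, again using the standard urllib.parse module. The helper name describe is just something made up for this example. Without a scheme, the parser has no rules to apply, so it treats the whole text as a plain path instead of a host name.

```python
from urllib.parse import urlsplit


def describe(text: str) -> tuple:
    """Return (scheme, host, path) as urlsplit sees them."""
    parts = urlsplit(text)
    return parts.scheme, parts.netloc, parts.path


# No scheme: nothing is recognized as a host.
print(describe("www.takeupcode.com"))
# ('', '', 'www.takeupcode.com')

# With the scheme, the host is recognized.
print(describe("https://www.takeupcode.com"))
# ('https', 'www.takeupcode.com', '')
```

A web browser will usually guess the scheme for you when you type an address, which is probably why so many people leave it off.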

Now when you tell somebody that https://www.takeupcode.com is a URL, you might sometimes get a comment from the other person that, no, it’s a URI. And that we shouldn’t refer to URLs anymore. And that everything is now just URIs. Well, that’s true that it’s a URI. But it would be like pointing to a duck waddling by and telling another person, “That’s a duck.” Only to have the other person try to tell you, “No, that’s a bird.” It seems ridiculous if the other person tries to tell you that there are no more ducks in the world and that we should now refer to them all as just birds.

I say, if it’s a URL because it locates something, then call it a URL.