XML was designed to serve two main purposes: to allow information to be stored and transported, and to allow both humans and computers to read and modify that information.

Listen to the episode or read the full transcript below to learn more about how information was described and transferred before XML.

As for XML itself, let’s say you want to record a phone number in XML. You could start out with an opening tag called phone. The tag name is readable and clearly identifies the intent. It’s also surrounded by opening and closing angle brackets. You know, the less than and greater than symbols. After this opening tag and the brackets comes the actual phone number. The phone number can be formatted however you want. You could even include enough numbers to add a calling card if you want. There’s no need to worry about running out of a fixed and defined number of characters. Unless of course, that’s how you want it. It’s up to you. At the end of the phone number comes a closing tag that matches the opening tag’s name. This makes it really easy to spot where a phone number begins and where it ends.

Inside the opening tag, you can add what are called attributes. These are really nothing more than a name and a value. Maybe you could have an attribute called type and set its value to the string mobile.

Need another phone number? No problem, just start a new opening tag called phone, put any attributes that describe the phone number inside the opening tag, then put the phone number, and close it with a matching tag at the end.

Any program that consumes XML doesn’t read fields located at specific starting and ending locations. Instead, it looks for these tags. This means you can add new tags or even leave some out if they’re optional. You get to decide how to format your information, and that format becomes part of the overall XML document.

Transcript

I remember working for a bank once, a long time ago, where I helped develop an electronic banking application. There was a lot of information that had to be sent back and forth between the bank’s computers and the customer’s computer running the banking application.

It wasn’t much different from an ordinary online banking website that you can log into today with any bank. But back then, it was new. I want to focus the episode today on how data can be formatted and described and compare one way to do this, XML, with how I remember organizing data back then.

This is useful for distributed computing. After all, if you’re going to process information on multiple computers, you’re going to need some way to get that information from one computer to another.

This is also useful, though, anytime you need to organize information into a format that you can later make sense of.

Back when I was working on the banking application, there were documents that described the format of information. But let’s go even further back in time. How did businesses operate before there were any computers at all? This might seem simple but I’ll tie it all together in just a moment.

Forms on paper were used. Actually, they’re still used these days and probably will be for a long time. I’m talking about papers with boxes and labels inside the boxes that describe where to write your name, address, phone number, and if you’re buying something, then the inventory name, description, cost per item, how many you want, total cost, taxes, shipping, and grand total.

A company would send a purchase order with information like this filled out to some supplier. The supplier would need to read this and make sense of what the customer wants. Remember that the customer is sending a form designed by the customer and the supplier may be seeing this form for the very first time. The supplier might decide to go ahead and ship the purchased items along with an invoice or a bill requesting payment. This invoice would be printed on its own form.

Computers have problems understanding information printed on forms. Even we have problems sometimes. How often have you searched a bill mailed to you looking for a phone number to call to ask questions? And then, have you ever looked all over trying to find an account number or reference number?

When sending information like this, it’s important for both sides to agree on common formats. So when I was working at the bank, we used formats describing how information should be organized. This was called electronic data interchange or EDI for short. There were many standards in place and agreed on by major banks.

What I remember the most, though, was defining specific fields to hold information. These fields were really nothing more than a reserved number of bytes to hold text. Think of it like a paper form stretched out into one long single line. And just like how a paper form uses space for a field of information regardless of whether there’s actually anything useful written there, the electronic fields also had to use up all the allowed space.

This meant, for example, that if you wanted to send a name, then there needed to be a name field of a certain maximum length. If the definition was super precise, then maybe there’d be separate first name and last name fields.

It was always a balance trying to figure out how long each field should be. Too short, and information would get cut off. Too long, and there’d be a lot of wasted space. Sure, some fields could be fixed length, but even these could sometimes cause problems. In an effort to save a couple bytes, sometimes developers would only allow two digits for the year. That still amazes me. To think that an entire industry failed to plan for the world existing in the year 2000 and beyond.

What if you wanted to add a new field? Where should it go? If you inserted it in the middle of where other fields were already specified, then it would mess up existing communications. And putting it at the end meant that it would be far away from related information.

And there was no way to add on extra descriptive information without adding a whole new field. Let’s say you already had a phone number field. That’s good. Until different kinds of phones became common. It would be good to describe a phone number so it could be marked as a work number vs. a mobile number.

And what about multiple items? The best you could do was add a field for phone1, another field for phone2, etc. This was just another balance between adding enough fields for different phone numbers vs. wasting space.

But on top of all this, a message composed of these fields was almost impossible to read manually, let alone change. If you were to manually edit the name field and add or remove just a single letter, then that could change the length of the field and throw everything off from that point on.
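
To make the fixed-length idea concrete, here is a minimal sketch in Python. The layout, field names, and widths are all hypothetical and are not taken from any real EDI standard; the sketch only illustrates how position alone decides which field a character belongs to, and how a one-letter hand edit shifts everything after it.

```python
# Hypothetical fixed-width layout: (field name, width in characters).
# These names and widths are invented for illustration only.
LAYOUT = [("first_name", 10), ("last_name", 8), ("phone1", 12), ("phone2", 12)]


def pack(record):
    """Pad (or truncate) each value to its reserved width and join into one line."""
    return "".join(record.get(name, "").ljust(width)[:width] for name, width in LAYOUT)


def unpack(line):
    """Slice the line at fixed offsets; position alone says which field it is."""
    fields, offset = {}, 0
    for name, width in LAYOUT:
        fields[name] = line[offset:offset + width].strip()
        offset += width
    return fields


message = pack({"first_name": "Ada", "last_name": "Lovelace", "phone1": "555-0100"})
print(repr(message))    # one long line, padded with spaces, including the empty phone2
print(unpack(message))  # fields come back as expected

# Hand-edit the first name by adding one letter, without re-padding the field.
edited = message.replace("Ada", "Adah", 1)
print(unpack(edited))   # every field after the edit is now read one character off
```

In the edited message, the last name loses its final letter and that letter ends up at the front of the first phone number, which is the kind of damage a single extra character could cause.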

This is where XML comes in. Instead of defining field lengths and their order, XML lets you add tags around your information that give extra meaning.

Let’s say you want to record a phone number in XML. You could start out with an opening tag called phone. The tag name is readable and clearly identifies the intent. It’s also surrounded by opening and closing angle brackets. You know, the less than and greater than symbols. After this opening tag and the brackets comes the actual phone number. The phone number can be formatted however you want. You could even include enough numbers to add a calling card if you want. There’s no need to worry about running out of a fixed and defined number of characters. Unless of course, that’s how you want it. It’s up to you. At the end of the phone number comes a closing tag that matches the opening tag’s name. This makes it really easy to spot where a phone number begins and where it ends.

Inside the opening tag, you can add what are called attributes. These are really nothing more than a name and a value. Maybe you could have an attribute called type and set its value to the string mobile.

Need another phone number? No problem, just start a new opening tag called phone, put any attributes that describe the phone number inside the opening tag, then put the phone number, and close it with a matching tag at the end.
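
Written out, that description might look like the output of the following sketch, which uses Python's standard `xml.etree.ElementTree` module. The `contact` wrapper element and the sample numbers are assumptions added for illustration; the episode itself only describes the `phone` tag and its `type` attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper element; only <phone> and its type attribute
# come from the description above.
contact = ET.Element("contact")

# First phone number: an opening tag with a type attribute, then the number.
mobile = ET.SubElement(contact, "phone", {"type": "mobile"})
mobile.text = "555-0100"

# Need another one? Add another <phone> element. The number can be as long
# or as oddly formatted as you like; no field width is reserved.
work = ET.SubElement(contact, "phone", {"type": "work"})
work.text = "+1 (555) 0199 ext. 42"

xml_text = ET.tostring(contact, encoding="unicode")
print(xml_text)
```

Each number sits between a matching opening and closing `phone` tag, so there is no need for separate phone1 and phone2 fields.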

Any program that consumes XML doesn’t read fields located at specific starting and ending locations. Instead, it looks for these tags. This means you can add new tags or even leave some out if they’re optional. You get to decide how to format your information, and that format becomes part of the overall XML document.
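
As a rough sketch of what looking for tags means in practice, the Python snippet below reads every `phone` element from a small document and ignores everything else. The document itself, including the `contact`, `name`, and `email` tags and the `read_phones` helper, is made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical document: <contact>, <name>, and <email> are invented here.
DOCUMENT = """
<contact>
  <name>Ada Lovelace</name>
  <phone type="mobile">555-0100</phone>
  <email>ada@example.com</email>
  <phone>555-0199</phone>
</contact>
""".strip()


def read_phones(xml_text):
    """Return (type, number) pairs for every <phone> tag, wherever it appears."""
    root = ET.fromstring(xml_text)
    return [
        # The type attribute is optional, so fall back to a placeholder label.
        (phone.get("type", "unspecified"), (phone.text or "").strip())
        for phone in root.iter("phone")
    ]


phones = read_phones(DOCUMENT)
print(phones)
```

The reader does not care that an `email` tag sits between the two phone numbers, or that the second phone has no `type` attribute. Adding a new tag or leaving out an optional one does not shift anything the way a new fixed-length field would.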