How do you correct errors once you identify them?

I’ve explained a lot already about error detection. Make sure to listen to the last several episodes for more information. Detecting errors is the first step and the most important. I mean, if you don’t know that something went wrong, then you’re not even going to try fixing it, right?

If you’re sending a file over a network and you detect an error, usually, the easiest way to correct the error is to throw away what you received and ask the sender to retransmit the part that was bad. You shouldn’t need to start over from the beginning unless the data you’re sending is so small that it already fits in a single packet of data.

What if you can’t retransmit the data? Maybe it’s gone, or it’s impractical to transmit it again. Or maybe there’s a single source of data that’s being sent to hundreds or thousands of recipients in real time. You can’t expect the sender to stop and retransmit information each time a recipient complains that it didn’t receive the correct information.

One thing to consider is what effect this would have on the recipient. Is it okay to just ignore the problem? If you’re sending audio, then maybe you can ignore some data without the listener even noticing. You can do this as long as you have some data before and after the error. That’s because even though you may not know the exact missing value or values, you can probably guess what the data would likely have been.

Another technique you can use to correct errors is to send only values that have enough Hamming distance between them that the receiver can work out which value was most likely sent.

Make sure to listen to the full episode for examples and more explanation. There’s even an example of how our brains do error correction every time we look at something. Or you can read the full transcript below.

Transcript

I’ve explained a lot already about error detection. Make sure to listen to the last several episodes for more information. Detecting errors is the first step and the most important. I mean, if you don’t know that something went wrong, then you’re not even going to try fixing it, right?

If you’re sending a file over a network and you detect an error, usually, the easiest way to correct the error is to throw away what you received and ask the sender to retransmit the part that was bad. You shouldn’t need to start over from the beginning unless the data you’re sending is so small that it already fits in a single packet of data.

What if you can’t retransmit the data? Maybe it’s gone, or it’s impractical to transmit it again. Or maybe there’s a single source of data that’s being sent to hundreds or thousands of recipients in real time. You can’t expect the sender to stop and retransmit information each time a recipient complains that it didn’t receive the correct information.

One thing to consider is what effect this would have on the recipient. Is it okay to just ignore the problem? If you’re sending audio, then maybe you can ignore some data without the listener even noticing. You can do this as long as you have some data before and after the error. That’s because even though you may not know the exact missing value or values, you can probably guess what the data would likely have been.

Let me give you an example. Let’s say that I send you the values 1, 2, 3, then an error causes you to throw away the next value. After that, you get the values 5 and 6. This is an easy example because all the values you did get form a straight line, so it’s likely that the missing value was 4.

Just imagine listening to somebody counting and you sneeze right when they say 4. You could fill in the missing 4 with no problem. Now, it’s always possible that the other person made a mistake and counted 1, 2, 3, 8, 5, 6. This shows that correcting errors like this can sometimes lead to mistakes. No error correction is guaranteed to be perfect unless you can ask for the original data to be sent again. But the results will usually be good enough.
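To make the counting example concrete, here is a minimal Python sketch of filling in one missing value by linear interpolation, which means taking the midpoint of a straight line through its neighbors. The function name `fill_single_gaps` and the use of `None` to mark a discarded value are illustrative assumptions, not part of any real audio format, and real audio concealment is usually more involved than this.

```python
def fill_single_gaps(samples):
    """Replace each isolated missing value (None) with the average of its neighbors.

    Assumption for this sketch: a gap is a single value with known values on
    both sides. Longer gaps, or gaps at either end, are left as None.
    """
    filled = list(samples)
    for i in range(1, len(samples) - 1):
        prev_val, next_val = samples[i - 1], samples[i + 1]
        if samples[i] is None and prev_val is not None and next_val is not None:
            # Midpoint of the straight line through the two neighbors.
            filled[i] = (prev_val + next_val) / 2
    return filled


print(fill_single_gaps([1, 2, 3, None, 5, 6]))  # [1, 2, 3, 4.0, 5, 6]
```

Notice that this sketch would fill in 4 even if the speaker had really said 8, which is exactly the kind of mistake described above.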

For another example of error correction that happens all the time for each of us, just think about our blind spots. Each of our eyes has a spot where the optic nerve leaves the eye. There are no light-detecting cells in this spot. Yet our brains are constantly correcting these errors by filling in the missing spots based on everything around them.

One thing that makes error correction easier is that computers work with binary. At the smallest level, everything is either a zero or a one. When I was describing figuring out values by looking at other values around the error, I was talking about multiple bytes of data that together take on a different meaning, such as an integer. I was not talking about individual bits. It’s not likely that you’ll get a long run of all zeros or all ones where you could just fill in a bad bit with the same value as its neighbors. At the bit level, the ones and zeros could end up looking random. But even so, there are some things you can do to help correct errors.

This type of error correction involves sending more data along with the actual data. In episode 170, I described how you can send an extra parity bit that detects a single-bit error. That parity bit will tell you if there’s an error somewhere in the byte. But you won’t know which of the eight bits is wrong. It could even be the parity bit that’s wrong. Now imagine stacking up eight bytes on top of one another, just like you stack numbers that you want to add together. Now take each column and form an imaginary byte. You can then calculate the parity for each column and send that as another byte. This is a whole extra byte of information that’s not part of the actual data. But what does it do?
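Here is a rough Python sketch of the stacking idea. It assumes even parity (the parity bit is chosen so the total number of ones is even) and a block of eight data bytes; the function names are hypothetical. XOR-ing the stacked bytes together produces the column parity byte in one step, because each bit of the result is 1 exactly when its column holds an odd number of ones.

```python
def parity_bit(byte):
    """Even parity: 1 if the byte has an odd number of 1 bits, so the total becomes even."""
    return bin(byte).count("1") % 2


def column_parity_byte(block):
    """Parity of each bit column when the bytes are stacked on top of one another.

    XOR accumulates one parity bit per column: bit k of the result is 1
    when an odd number of bytes have a 1 in position k.
    """
    result = 0
    for byte in block:
        result ^= byte
    return result


block = list(b"HAMMING!")  # eight example data bytes
row_parity = [parity_bit(b) for b in block]
col_parity = column_parity_byte(block)

print(row_parity)            # [0, 0, 0, 0, 1, 0, 0, 0]
print(f"{col_parity:08b}")   # 01101000
```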

It’s like cloth. You have threads woven in cloth going left and right as well as up and down. I’m sure there are more complicated ways to weave, but all we need is a simple weave. If there’s an error in one direction, then there’s also going to be an error in the other direction, just like how a hole in cloth affects the threads in both directions. By including parity going in the other direction, whenever you detect a parity error in a byte, all you have to do is go through the imaginary columns and check the parity of each one too. The byte with the bad parity tells you which row the error is in, and the column with the bad parity tells you which bit position. Where that row and that column cross is the bit that’s wrong.

And since we’re working with binary data, all you have to do to fix the problem is flip the bit. If it was a one, then make it a zero. And if it was a zero, then make it a one.
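Putting the row parity, the column parity byte, and the bit flip together, a minimal Python sketch might look like the following. It assumes even parity, assumes that exactly one data bit was flipped, and ignores errors in the parity bits themselves; `encode` and `correct_single_bit` are hypothetical names chosen for illustration.

```python
def parity_bit(byte):
    """Even parity: 1 if the byte has an odd number of 1 bits."""
    return bin(byte).count("1") % 2


def encode(block):
    """Return one parity bit per byte (rows) and the column parity byte."""
    rows = [parity_bit(b) for b in block]
    cols = 0
    for b in block:
        cols ^= b
    return rows, cols


def correct_single_bit(block, rows, cols):
    """Fix one flipped data bit using the row and column parity.

    Assumption: at most one data bit is wrong. Multiple errors, or errors in
    the parity bits, are left uncorrected by this sketch.
    """
    fixed = list(block)
    # Which byte (row) no longer matches its parity bit?
    bad_rows = [i for i, b in enumerate(fixed) if parity_bit(b) != rows[i]]
    # Recompute the column parity; any 1 bit left over marks a bad column.
    bad_cols = cols
    for b in fixed:
        bad_cols ^= b
    # Exactly one bad row and exactly one bad column: they cross at the bad bit.
    if len(bad_rows) == 1 and bad_cols and (bad_cols & (bad_cols - 1)) == 0:
        fixed[bad_rows[0]] ^= bad_cols  # flip that bit: 1 becomes 0, 0 becomes 1
    return fixed


data = list(b"HAMMING!")
rows, cols = encode(data)
received = list(data)
received[3] ^= 0b00000100  # simulate one flipped bit in the fourth byte
print(correct_single_bit(received, rows, cols) == data)  # True
```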

There are quite a few different ways to correct errors, but they all involve sending extra information, which takes more time to transmit. The more extra information you send, the more errors you’ll generally be able to correct.
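One simple way to see this trade-off is a repetition code, which sends every bit several times and lets the receiver take a majority vote. This is a swapped-in illustration rather than a method described in the episode, and it assumes an odd number of copies `n`. With `n` copies per bit, the vote still comes out right when up to `(n - 1) // 2` copies in a group are flipped, so sending more copies lets you correct more errors. The function names are hypothetical.

```python
def repeat_encode(bits, n):
    """Send every bit n times in a row (n is assumed to be odd)."""
    return [b for b in bits for _ in range(n)]


def repeat_decode(received, n):
    """Majority vote over each group of n copies."""
    decoded = []
    for i in range(0, len(received), n):
        group = received[i:i + n]
        # More than half the copies say 1, so decide 1; otherwise decide 0.
        decoded.append(1 if sum(group) > n // 2 else 0)
    return decoded


sent = repeat_encode([1, 0, 1], 5)
sent[0] ^= 1
sent[1] ^= 1  # two copies of the first bit flipped in transit
print(repeat_decode(sent, 5))  # [1, 0, 1]
```

With only three copies per bit, the same two flips would outvote the one good copy and the first bit would decode as 0.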

Before I end this episode, I want to mention something called the Hamming distance. This is named after Richard Hamming, and it’s a way of figuring out which values are far enough apart so that error correction can be applied. Think of it like this. Let’s say that you want to send a message to somebody by pointing to letters written on a piece of paper. Sometimes we may not point right at a letter, especially if the letters are written too close to each other. It’s easier to point to things that are far enough from each other that the other person can tell what’s being pointed at. That way, if our pointing is just a little off, the other person can still correct the mistake by assuming that the intended letter is the one closest to where we’re pointing.

You can achieve this result in one of two ways. Either increase the size of the paper that you’re using to write the letters on, or eliminate some of the letters so there aren’t as many crowding the page.

The Hamming distance works by counting how many bits are different between two values. Let’s say you have the values 000 and 111. How many bits are different between these? Well, all three bits are different. So the Hamming distance between 000 and 111 is 3.
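Counting the differing bits is short to express in code. This Python sketch assumes both values are written as bit strings of the same length; `hamming_distance` is just an illustrative name.

```python
def hamming_distance(a, b):
    """Count the positions where two equal-length bit strings differ."""
    if len(a) != len(b):
        raise ValueError("values must have the same number of bits")
    return sum(x != y for x, y in zip(a, b))


print(hamming_distance("000", "111"))  # 3
```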

In order to use this system, you pick values that are far enough apart from each other to suit your needs. If you want a Hamming distance of 3 and only have 3 bits to send, then you need to choose 000 and 111 as the only two possible values. All the other values, like 001, then become the empty space between the letters on your paper. This means that if you send 000 but the receiver reads 001 instead, then the receiver already knows that an error occurred, because the received value is not one of the allowed values. But which of the two values should the receiver choose? Well, whichever one is closest. Just calculate the Hamming distance between the value received, 001, and the two allowed values, 000 and 111. The Hamming distance between 001 and 000 is 1. And the Hamming distance between 001 and 111 is 2. Because 001 is closer to 000 than it is to 111, the receiver can correct the error by choosing 000 as the correct value.
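The receiver’s choice can be sketched as a search for the nearest allowed value. In this Python sketch the allowed values are the two from the example, 000 and 111, and `decode` is a hypothetical name; it simply picks whichever allowed value has the smallest Hamming distance to what arrived.

```python
def hamming_distance(a, b):
    """Count the positions where two equal-length bit strings differ."""
    return sum(x != y for x, y in zip(a, b))


def decode(received, allowed):
    """Return the allowed value closest to the received one."""
    if received in allowed:
        return received  # an allowed value arrived, so no error was detected
    # Not an allowed value, so an error happened: pick the nearest allowed value.
    return min(allowed, key=lambda value: hamming_distance(received, value))


ALLOWED = ["000", "111"]
print(decode("001", ALLOWED))  # 000
```

With only these two allowed values, a single flipped bit is always corrected. But if two bits flip, so that 000 is read as 011, the receiver will "correct" it to 111, which is wrong. It’s the same kind of risk as guessing the missing 4 in the counting example.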