145: Distributed Computing: Four Reasons.

Dividing work between multiple computers is sometimes the best way to solve a problem.

We do this all the time in real life, so often that we don’t even think about it. When you get a job or when you need to hire somebody, you’re distributing the work. When a store opens in a new location, it’s distributing the work. And even if you buy a spare part for your car that’s not needed yet but will be ready if the time comes, you’re distributing the work.

This concept can be seen in designs of products too.

Notice how large trucks have multiple wheels to spread the weight? And how they’ll have two wheels side-by-side in case one goes flat?

Houses usually have a front door and a back door. This is more for convenience than anything else but it still helps distribute the work.

Large passenger jets usually have multiple engines and the plane can continue to fly even if it loses one. If all the engines stop working at the same time, sometimes a good pilot in the right conditions can still land the plane.

All of these examples can be found in computer software. That’s because the same problems and desires we face in the real world exist in software. The only difference is that software needs more abstract thought and planning because it’s harder to imagine something that you can’t touch directly.

There are several common reasons for distributed computing.

One is convenience.
Two is speed.
Three is reliability.
Four is feasibility.

Listen to the episode as I describe these four common reasons and give an example of each. Or you can read the full transcript below.

Transcript

There are several common reasons for distributed computing.

◦ One is convenience.
◦ Two is speed.
◦ Three is reliability.
◦ Four is feasibility.

Let’s take these one at a time. This episode will be more of an overview of each of these reasons because some of these are large enough for their own episodes or even multiple episodes.

Convenience is usually for the benefit of the user. We don’t normally think of making something easy for a computer. But that can be helpful. Other easily overlooked aspects of this are conveniences for system administrators or developers or testers. Sometimes one design might be easier to explain than another, and that makes it more convenient. There are a lot of things that can help make something convenient, and for different consumers.

Medium and large size companies often need a directory to manage the identity of employees logging onto company computers. This directory can be distributed across multiple computers called domain controllers. Sometimes, changes need to be made to the directory information and it’s a lot more convenient to be able to connect to a domain controller in the same office instead of halfway around the world.

Grouping similar work on computers can also make it convenient to service those computers, such as backing up their data or making configuration changes. In a way, I see this as similar to batching. If you want to make ten chairs, then instead of making one chair at a time completely, you can make things a lot easier and faster by making all the legs for all ten chairs at once, then making all the cushions, and so on. By choosing where you distribute each part of your application, you can group similar jobs together so the computer running them can be more specialized.
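The grouping idea can be sketched in a few lines of Python. This is only an illustration: the job kinds and job names are made up, and a real system would decide which machine gets which group in its own way.

```python
from collections import defaultdict


def group_jobs_by_kind(jobs):
    """Collect job names under the kind of work they represent."""
    groups = defaultdict(list)
    for kind, name in jobs:
        groups[kind].append(name)
    return dict(groups)


# Hypothetical jobs, each tagged with the kind of work it does.
jobs = [
    ("backup", "payroll-db"),
    ("report", "weekly-sales"),
    ("backup", "orders-db"),
    ("report", "inventory"),
]

# Each group could then be handed to a computer specialized for that kind of work.
assignments = group_jobs_by_kind(jobs)
print(assignments)
```

Just like with the chairs, all the backup jobs end up together and all the report jobs end up together.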

I’ll explain the other common reasons right after this message from our sponsor.

Let’s say you want to download an open source application from a website. A common practice is to use mirrors. These are not the shiny kind that we use to look at ourselves. A mirror is an alternate download site that you can use to get your application. Usually, there will be several mirror sites and each one will have a physical location listed. This is done so you can select the closest location. When you want to go to the grocery store, do you drive to a store three towns away or the one nearby? Just like how it takes time to transport groceries, it also takes time to transport files over the internet. You benefit by selecting the shortest distance and this is accomplished by distributing information to multiple locations. This helps increase the speed of your application.
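Here is a minimal sketch of picking the closest mirror, written in Python. It assumes you have already measured a round-trip time to each mirror. The mirror names and the timings are invented for the example, and the function name is just something I picked.

```python
def pick_closest_mirror(latencies_ms):
    """Return the name of the mirror with the lowest round-trip time."""
    if not latencies_ms:
        raise ValueError("no mirrors to choose from")
    return min(latencies_ms, key=latencies_ms.get)


# Hypothetical mirrors with measured round-trip times in milliseconds.
mirrors = {
    "mirror-tokyo": 180.0,
    "mirror-frankfurt": 95.0,
    "mirror-chicago": 22.0,
}

closest = pick_closest_mirror(mirrors)
print(closest)
```

Measured time is used here instead of physical distance because that is what actually affects the download, but choosing by the listed location works on the same principle.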

It’s not only time either. The chances that errors will happen increase as the distance increases. If you drop your eggs on the way home, then you need to go back to the store to get more. And a file going over the network can also drop information. The good thing is that you usually don’t have to go back to get the entire file but just the portion that had a problem. I’ll get to reliability in a moment.
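One common way to fetch only the damaged portion is to split the file into chunks and compare a checksum for each chunk. The Python sketch below assumes that approach. The chunk size, the sample data, and the function names are all made up for illustration, and the "download" is simulated in memory.

```python
import hashlib

CHUNK_SIZE = 4  # tiny on purpose so the example is easy to follow


def split_chunks(data, size=CHUNK_SIZE):
    """Cut the data into fixed-size pieces (the last one may be shorter)."""
    return [data[i:i + size] for i in range(0, len(data), size)]


def checksums(chunks):
    """Compute one SHA-256 checksum per chunk."""
    return [hashlib.sha256(chunk).hexdigest() for chunk in chunks]


def damaged_chunks(received, expected_sums):
    """Return the indexes of chunks whose checksum does not match."""
    return [
        i for i, chunk in enumerate(received)
        if hashlib.sha256(chunk).hexdigest() != expected_sums[i]
    ]


original = split_chunks(b"a dozen fresh eggs")
expected = checksums(original)

received = list(original)
received[2] = b"XXXX"  # simulate one chunk getting damaged along the way

bad = damaged_chunks(received, expected)
for i in bad:
    received[i] = original[i]  # stand-in for requesting chunk i again

repaired = b"".join(received)
print(bad, repaired)
```

Only the one bad chunk gets requested again, which is the same as going back to the store for eggs and nothing else.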

Another aspect of speed has to do with getting extra help. If you have a big job, then extra people helping you will speed things up. The same thing applies to computers. By enlisting the help of other computers, each one can perform part of the work and together, they can get the job done faster. In a way, this is a lot like multithreaded programming. Listen to episodes 92 through 106 for more information about multithreading. While multithreading makes use of multiple processing cores in the same computer, distributed computing makes use of completely separate computers. Some of the same problems and solutions apply to both but distributed computing has its own set of challenges that I’ll describe in future episodes.
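The idea of splitting a job among helpers can be sketched like this in Python. To keep the example runnable on one machine, threads stand in for the separate computers. That is an assumption of the sketch. In a real distributed system, each share would be sent over the network to a different computer.

```python
from concurrent.futures import ThreadPoolExecutor


def split_work(items, workers):
    """Deal the items out into one share per worker."""
    return [items[i::workers] for i in range(workers)]


def count_words(lines):
    """The job each worker performs on its own share."""
    return sum(len(line.split()) for line in lines)


lines = [
    "one is convenience",
    "two is speed",
    "three is reliability",
    "four is feasibility",
]

shares = split_work(lines, workers=2)

# Each thread plays the part of one helper computer.
with ThreadPoolExecutor(max_workers=2) as pool:
    partial_counts = list(pool.map(count_words, shares))

# Combine the partial answers into the final result.
total = sum(partial_counts)
print(partial_counts, total)
```

The pattern is the same either way: divide the work, let each helper do its part, then combine the partial answers.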

I mentioned reliability just now and how shortening the distance between computers can help. That’s true but it’s not the first thing that comes to mind about distributed computing reliability. Computers can break, lose power, overheat, or crash at any time. Some other process could start running that consumes all the extra processor cycles and effectively brings a computer down. When something like this happens, it doesn’t matter how well designed your software is, or how much testing your team has done to make sure your software runs properly. When the entire computer fails, then everything on it fails too. But what if you planned for this by designing your software to run on multiple computers? Now if one of them fails, then only that part of your application has problems. And if you also designed your software to be able to use either redundant or backup machines, then when one fails, another one is ready to take over. Think of this like calling in sick for work and others on the team are ready to cover for you. That’s reliability.
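Here is a small Python sketch of a backup machine taking over. The two "machines" are ordinary functions, and the failure is simulated by raising an error. All of the names are hypothetical and only there to show the shape of the idea.

```python
def call_with_failover(machines, request):
    """Try each machine in order and return the first successful answer."""
    errors = []
    for name, handler in machines:
        try:
            return name, handler(request)
        except ConnectionError as exc:
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all machines failed: " + "; ".join(errors))


def primary(request):
    # Simulate a computer that has crashed or lost power.
    raise ConnectionError("primary is down")


def backup(request):
    # The backup does the same work as the primary would have done.
    return request.upper()


served_by, answer = call_with_failover(
    [("primary", primary), ("backup", backup)],
    "hello",
)
print(served_by, answer)
```

The caller never needs to know which machine answered. If every machine fails, the error lists what went wrong with each one.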

The last reason I’ll explain is feasibility. This is something not often talked about. Sometimes, designing your system to use multiple distributed machines is the only way to go, or the most cost-effective way. What do I mean?

Let’s say you have a really hard problem to solve. Something that’s going to require a bigger and faster computer than you have available. Maybe you consider buying a bigger and faster computer with more memory, more storage space, and faster clock speeds. One top-of-the-line server computer like this might cost ten thousand dollars or more. What if even that’s not enough? You could spend millions on a supercomputer. But maybe there’s a better way.

Have you ever watched ants work together to cross a gap or a stream of running water? How about how ancient humans used to band together to hunt large prey? Instead of developing into a single super-sized ant or humans bigger and stronger than mammoths, a group of smaller, normal sized individuals working as a team can accomplish the task.

It’s possible to build a supercomputer with more power than the biggest and fastest single computer by joining enough ordinary desktop computers together. Sometimes, the best way to increase the capability of your application is not to run it on a bigger computer but to design it so it can run on multiple computers.
